Article

Comparison and Assessment of Different Land Cover Datasets on the Cropland in Northeast China

1
Key Laboratory of Watershed Geographic Sciences, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China
2
School of Geography, Geomatics and Planning, Jiangsu Normal University, 101 Shanghai Road, Tongshan District, Xuzhou 221116, China
3
UCASNJ, University of Chinese Academy of Sciences, Nanjing 211135, China
4
University of Chinese Academy of Sciences, Beijing 100049, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(21), 5134; https://doi.org/10.3390/rs15215134
Submission received: 14 August 2023 / Revised: 20 October 2023 / Accepted: 25 October 2023 / Published: 27 October 2023
(This article belongs to the Special Issue State-of-the-Art in Land Cover Classification and Mapping)
Figure 1. Geographical location and topography map of Northeast China.
Figure 2. The workflow of this research. OA = overall accuracy; PA = producer’s accuracy; UA = user’s accuracy; MCC = the Matthews correlation coefficient. The pink font shows the national-scale land cover datasets; all others are at global scale.
Figure 3. Distribution of validation samples across Northeast China in Phases 2000, 2010, 2015, and 2020. The graduated red points represent true non-cropland, and the graduated green points indicate true cropland.
Figure 4. Comparisons of the spatial accuracy indexes of the datasets in 2000, 2010, 2015, and 2020.
Figure 5. The commission and omission error distribution of nine datasets in 2015.
Figure 6. The commission and omission error distribution of six datasets in 2020. The disagreements relate to the distribution of verified points and the corresponding position in each dataset. In Figure 5 and Figure 6, the ginger-pink dots indicate true non-cropland pixels classified as cropland (commission error), and the spruce-green dots indicate true cropland pixels classified as non-cropland (omission error).
Figure 7. Magnified views of cropland in four regions in different datasets. Dark green indicates cropland and white indicates non-cropland. The positions of (A–D) are shown in Figure 8b.
Figure 8. Illustration of the spatial agreement level and the meridional and zonal cropland area curves of nine datasets in 2015. Charts (a,c) are the meridional and zonal area curves of the datasets, respectively. The overlapping result of the datasets is shown in subplot (b), representing the evaluated data resampled to a resolution of 30 m; the digital numbers indicate different consistency levels. The stacked bar chart is the cropland area proportion (%) at the agreement levels.
Figure 9. Illustration of the spatial agreement level and the meridional and zonal cropland area curves of six datasets in 2020. Charts (a,c) are the meridional and zonal area curves of the datasets, respectively. The overlapping result of the datasets is shown in subplot (b), representing the evaluated data resampled to a resolution of 30 m; the digital numbers indicate different consistency levels. The stacked bar chart is the cropland area proportion (%) at the agreement levels.
Figure 10. Scatterplots between the CLCD and CGLS-LC100, CLUDs, Esri, GLC_FCS30, and GlobeLand30. The axes represent the cropland area of the six datasets aggregated within grid cells of 8.438 km × 9.537 km across Northeast China; only cropland area is compared. The blue dots represent the aggregated cropland area within a grid cell. The black dotted line represents the 1:1 auxiliary line, while the red solid line depicts the data fitting curve. The unit of RMSE is km².
Figure 11. Comparison of the cropland area of all examined datasets with the statistical results by Yu et al. [56].
Figure 12. Scatterplots of the prefecture-level city reconstructed cropland area vs. the aggregated area of cropland from each dataset. The blue + symbols represent the aggregated cropland area within a prefecture-level city. The black dotted line represents the 1:1 auxiliary line, while the red dashed line depicts the data fitting curve. The unit of RMSE is km².
Figure 13. Comparison of the overall accuracy of different datasets grouped by data resolution, production time, sensor, and classification algorithm, with individual color markers for the different land cover datasets.
Figure 14. Study case showing the comparison of cropland identification southeast of Chagan Lake with five datasets in Phase-2020. Dark green indicates cropland, and white indicates non-cropland.
Figure 15. The proportion of cropland and non-cropland in mosaic cropland in GlobCover-2005, GlobCover-2009, CCI-LC-2000, and CCI-LC-2010.

Abstract

Precise and dependable information on the extent and distribution of cropland is imperative for evaluating food security, agricultural planning, and resource management. Cropland is an important land cover type and is included in multiple existing global/regional land cover products. However, global-scale accuracy evaluations may not be representative of class-specific or local-area accuracy, such as in Northeast China, which is an important grain-producing region of China with various types of cultivated land (e.g., wheat, rice) and diverse terrains. These conditions pose a great challenge for generating precise cropland classifications by automated mapping. Thus, it is indispensable to evaluate the accuracy and reliability of these land cover datasets before using them. In this study, we collected thirteen global or national-scale land cover datasets. Through the visual interpretation of high-resolution images, ground “truth” samples were collected to evaluate the data accuracy across Northeast China. The overall accuracy (OA) results for cropland classification in Phase-2020 show that CLCD has the highest value (0.914), followed by GlobeLand30 (0.906), GLC_FCS30 (0.902), and Esri (0.896); CGLS-LC100 has the lowest OA (0.710). Regarding the commission and omission errors of the six Phase-2020 datasets, CGLS-LC100 clearly overestimates cropland (larger commission error), while the two national-scale datasets (CLCD and CLUDs) perform relatively better. In terms of spatial consistency, high spatial agreement among the nine Phase-2015 datasets and among the six Phase-2020 datasets is found in traditional agricultural regions like the Sanjiang–Songnen–Liaohe Plain, and low agreement is found in the transition areas between mountains (hills) and plains with a mixed landscape of forest (grassland) and farmland. In pairwise comparisons, CLCD is in good agreement with GLC_FCS30, GlobeLand30, and Esri, while CGLS-LC100 has the poorest agreement with every other dataset. The comparison and evaluation results are expected to provide a reference on the aspects in which, and the extent to which, these land cover products are consistent, and to guide cropland data product selection for Northeast China.

1. Introduction

Agricultural land meets global demands for human food, livestock feed, and biofuel. These demands are growing rapidly due to increasing population and consumption [1,2]. Therefore, understanding the extent, distribution, and dynamics of cropland is required to meet food demands and ensure environmental sustainability, especially for agricultural planning, cropland monitoring, and food security assessment. With the rapid development of remote sensing and new technologies (e.g., cloud computing, big data, and machine learning), an increasing number of global and regional land use/cover (LULC) datasets [3] that include a farmland category have been produced and made freely available to the public. Coarse-resolution images (i.e., pixels of 5 km or 1 km) were generally used in the early period [4,5,6]; however, their application at national or regional levels is limited by low precision and image quality. The spatial resolution of the products was later enhanced to 500 m or 300 m [7,8]. Nowadays, land cover mapping using high-resolution images, such as 30 m or 10 m resolution satellite data [9,10], has received increasing attention from the scientific community. All these datasets offer valuable cropland information for multidisciplinary research and applications. However, there are considerable inconsistencies and uncertainties among these LULC datasets, especially in areas with complex transition landscapes, since they have been generated using different satellites/sensors, classification features, classifiers, and methods [11,12]. Previous studies have also shown that discrepancies and uncertainties in cropland arise from differing cropland definitions and classification schemes [13,14,15]. Therefore, accuracy assessments and comparative studies of these LULC datasets are necessary before using them in application research [16,17].
Many researchers have performed accuracy evaluations of various LULC datasets. For example, multiple prior studies have compared and evaluated the spatial (in)consistency of global land cover products from various perspectives, e.g., at the same spatial resolution (i.e., 1 km or 30 m) [18,19,20,21], for certain types of satellite sensors [22], on a global scale [20,21,23], for local regions (like China or one basin) [24,25,26,27,28], or at global and continental scales by elevation and climate partitions [29]. Others have compared global land cover datasets and summarized their uncertainty using an error budget approach [14]. In addition, a few researchers compared a certain element or land cover theme (such as grassland or wetland) across land cover datasets to investigate their comparative advantages [30,31,32]. This prior evaluation work provides important references for choosing land cover products and understanding the spatial characteristics of LULC on global and national scales. However, findings from global/national evaluations of a product’s accuracy across all LULC classes may not substitute for local-area or class-specific accuracy. For example, a total of 159,874 validation samples were selected by the producers for the assessment of GlobeLand30-2010, and the product’s overall accuracy was estimated to be 80.3% [9]. Regional or country-wide accuracy evaluations of GlobeLand30 have also been carried out by many researchers, with differing results. For instance, Meng et al. [33] used 1467 sample points to verify the performance of GlobeLand30-2010 in Shaanxi Province and found an overall accuracy of 80.0%. The overall accuracy of GlobeLand30-2010 reported by Ma et al. is 81.5% for a study area in Henan Province [34]. Another assessment based on 8400 verification sample pixels by Wang et al. [35] indicates that the overall accuracy of GlobeLand30-2010 data in China is 84.2%. Regional accuracy evaluations of GlobeLand30 have also been carried out in other countries, again with differing results. Kussul et al. [36] estimated an overall accuracy of about 89.7% for Ukraine, while assessments in Iran by Jokar Arsanjani et al. [37] and in Thessaly by Manakos et al. [38] indicated overall accuracies of 77.9% and 86.0%, respectively. A summary comparing the regionally achieved accuracies of different land cover datasets with the global accuracies claimed by their producers can be found in Supplementary Table S1.
Northeast China is an important production base for grain and other agricultural products and is one of the three main grain-producing regions in China, providing the highest grain production and the highest grain commodity rate in the country. Thus, it plays a critical supporting role in ensuring national food security. Moreover, in Northeast China, the different types of cropland and the diversity of terrains and landscapes have led to an obvious transitional zone of cropland distribution. Under these complex conditions, the remote sensing extraction of cropland is very challenging. At present, cropland information can be obtained from various remote sensing datasets. However, because previous evaluations differ in purpose, study area, and evaluation method, their results on the accuracy of global/national land cover datasets are inconsistent. Thus, it is essential to investigate the accuracy of different land cover datasets over Northeast China to better depict its cropland distribution.
The purpose of this study is to examine and contrast the farmland category of distinct land cover datasets with varying resolutions and satellite data sources over Northeast China. The evaluation proceeds in three parts: (1) Approximately 2000 “ground-truth” samples for each epoch (the years around 2000, 2010, 2015, and 2020) were manually interpreted across Northeast China to verify the dataset accuracies. (2) The spatial agreement among these datasets was compared and analyzed at different scales, including provincial, longitudinal, and latitudinal dimensions. (3) The possible causes of the uncertainties and discrepancies in accuracy among the evaluated datasets are discussed. With these aims, this paper conducts a relatively comprehensive comparison and accuracy evaluation of the datasets collected across Northeast China. We hope it will improve understanding of how accurately these datasets represent the cropland area in Northeast China, provide an important reference for data users, and support improved cropland mapping and advanced applications in the future.

2. Materials and Methods

2.1. Study Area

Northeast China is a major agricultural region (115°52′–135°09′E, 38°72′–53°55′N), including the provinces of Heilongjiang, Jilin, and Liaoning, and several prefecture-level divisions in the east of the Inner Mongolia Autonomous Region (Hulunbuir, Xing’an League, Tongliao, and Chifeng). With an altitude ranging from 0 to 2665 m, Northeast China has various types of landforms, including the Greater Khingan Mountains, the Lesser Khingan Mountains, and the Changbai Mountains; eastern and western Liaoning; the Horqin Sandy Land; the Sanjiang–Songnen–Liaohe Plains; the Songhua River and the Nenjiang River; wetlands of various sizes; and the transition zones between these mega-landforms, which together create complicated natural landscape conditions. Figure 1 shows its geographic location and topographic distribution.
From east to west, the territory is divided into temperate, warm temperate, and cold temperate zones, with monsoon and continental climate types. The elevation over the study area increases significantly from the hinterland outward, and the terrain changes are mainly shaped by the surrounding mountains and drainage network. All these factors contribute to the complex and diversified landforms of the region. Cropland is concentrated in the Sanjiang–Songnen–Liaohe Plain and forms a mosaic with forest, grassland, wetland, and other land cover types. Therefore, Northeast China is a representative region for evaluating cropland classification.

2.2. Datasets

2.2.1. Global/Regional Land Cover Datasets

Global/regional land cover datasets are important information sources for understanding the complex interactions between human activities and global change. In this study, we focus on comparing and assessing the cropland classification performance of global-scale land cover data products. Additionally, to examine whether classification with more local-scale training samples can enhance data accuracy, we added two national-scale datasets for China. In total, this study collected eleven global-scale datasets, among which the GLC2000, FAO-GLCshare-2014, FROM-GLC-2017, and Esri-2020 products each contain a single-year data layer, while GLASS-GLC, GLCNMO, CCI-LC, GlobCover, CGLS-LC100, GlobeLand30, and GLC_FCS30 contain multi-year data layers. The two national-scale data products are CLUDs and CLCD. CLUDs has a 5-year interval from 1990 to 2020, while CLCD has annual data layers from 1990 to 2019. The investigated land cover datasets are briefly described below. We also summarize the satellites used in data production, the spatial resolution, and the classification techniques and algorithms of each dataset. Key information about the thirteen datasets is shown in Table 1.
(1) The Global Land Cover 2000 (GLC2000) dataset was developed by the European Commission’s Joint Research Center with a spatial resolution of 1 km. The GLC2000 legend is classified into 22 classes using unsupervised clustering, and its overall accuracy is 68.6% [39].
(2) FAO-GLCshare (the Global Land Cover-share created by the Food and Agriculture Organization (FAO)) was produced by the United Nations’ (UN) FAO in 2014 [40]. It uses a data fusion method to integrate the available national, regional, and global datasets at a resolution of 1 km. The product encompasses 11 classes and has an accuracy of 80%.
(3) The Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC) was produced using data from the Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) [10]. These data represent the land cover around 2017 with a spatial resolution of 30 m and encompass ten land cover classes.
(4) Another global land cover map, which we call Esri in this paper, is a 10 m resolution product derived from ESA Sentinel-2 imagery. It was generated using a deep learning model trained on more than 5 billion Sentinel-2 pixels sampled from more than 20,000 sites. These sites are found in all major biomes in the world [41,42].
(5) The Global Land Surface Satellite-Global Land Cover (GLASS-GLC) is an annual dynamic record of global land cover products from 1982 to 2015. It was generated on the Google Earth Engine (GEE) platform with the latest version of the GLASS (Global Land Surface Satellite) CDRs (Climate Data Records) [43]. It has a resolution of 5 km and an average overall accuracy of 82.81%.
(6) The Global Land Cover by National Mapping Organizations (GLCNMO) was generated by the International Steering Committee for Global Mapping. Version I (2003) has a spatial resolution of 1 km, and the version II (2008) and version III (2013) datasets [44,45] have an improved spatial resolution of 500 m [46].
(7) The Land Cover (LC) project of the Climate Change Initiative (CCI) by the European Space Agency (ESA) provides a series of annual datasets with 300 m resolution from 1992 to 2015, termed the CCI-LC dataset [47]. The product uses unsupervised spatio-temporal clustering and machine learning classification methods, with a total of 22 land cover classes.
(8) The GlobCover global land cover map was developed by the ESA in collaboration with an international network of partners [8,48,49]. This dataset contains 22 classes defined by the UN LCCS.
(9) The Copernicus Global Land Service Land Cover at 100 m resolution (CGLS-LC100), collection 3, was released by the Copernicus Global Land Service. It was derived from high-quality land cover training sites based on PROBA-V satellite observations and multiple auxiliary datasets [50]. This global LULC map contains 23 classes.
(10) GlobeLand30 is a 30 m global land cover product generated by the National Geomatics Center of China using the Pixel-Object-Knowledge operational method [9,51]. It has an overall classification accuracy of more than 80% worldwide [52] and comprises 10 land cover classes.
(11) The global 30 m land cover dataset with a fine classification system (GLC_FCS30) for 2015 was produced using Landsat time series imagery and high-quality training data from the Global Spatial Temporal Spectra Library (GSPECLib) on the GEE platform. GLC_FCS30-2020 was then generated with the prior knowledge of experts and multi-source auxiliary datasets. Both contain 30 land cover classes [21].
(12) China’s land-use/cover datasets (CLUDs) were provided by the Resource and Environment Science and Data Center. The resolution is 1 km, and the datasets document China’s land cover in detail for the 1980s, 1990, 1995, 2000, 2005, 2010, 2015, and 2020. They include 6 level-1 classes (cropland, grassland, forest, built-up area, water, and barren) and 25 level-2 classes [53,54].
(13) The annual China Land Cover Dataset (CLCD) was derived from Landsat imagery on the GEE platform by Yang et al. [55] and contains annual land cover data layers at 30 m spatial resolution in China from 1990 to 2019. It includes 9 land cover types (cropland, forests, shrubs, grasslands, water, snow/ice, barren, impervious, and wetlands), and its overall accuracy is reported at 79.31%.
According to the acquisition time of the mapping data sources, we grouped all the data into four phases: 2000, 2010, 2015, and 2020. Phase-2000 and Phase-2010 each comprise eight datasets, Phase-2015 comprises nine datasets, and Phase-2020 comprises six datasets. Table 2 lists the data contained in these four phases.

2.2.2. Other Auxiliary Dataset

We also collected the auxiliary data on cropland distribution in Northeast China reconstructed by Yu et al. [56] as statistical data. This dataset was reconstructed by a multi-source data fusion method. By coordinating the data accuracy, time, and spatial resolution of diverse data sources, the continuous annual cropland distribution dataset of China was reconstructed using the spatialized cropland distribution approach. Four statistical datasets were used in this process, including the Land and Resources Statistical Yearbook, the National Land and Resources Bulletin, the Chinese Agriculture Yearbook, and the national crop production from the Chinese Statistical Yearbook, as well as multiple remote sensing products, including global cropland data provided by the Global Food Security Analysis-Support Data, GlobeLand30, the China Land Use and Cover Change, FROM-GLC, and so on. Overall, Yu et al. [56] assimilated the trend information and spatial pattern of various satellite datasets to produce a series of cropland percentage maps that are spatially explicit and span the time period from 1900 to 2016. Through intensive comparisons, the authors are confident that these data provide reliable sources of cropland maps [56].

2.3. Data Processing Procedures

The flowchart of this study is shown in Figure 2. Three key steps are involved in the accuracy assessment and product comparison. First, a random sampling approach was employed to obtain validation sample data. This process resulted in four phases of validation samples (about 2000 samples each) for 2000, 2010, 2015, and 2020. These validation sample points were loaded into the Google Earth and World Imagery Wayback platforms and interpreted by referring to the high-resolution images. Here, the images of Phase-2000 and Phase-2010 are provided by Google Earth Pro, and the images of Phase-2015 and Phase-2020 are from 20 September 2015 and 2020 on the World Imagery Wayback platform. The sample points were labeled according to the actual ground conditions (1 represents cropland, 0 represents non-cropland). Second, the “Extract Multi Values to Points” function in ArcGIS was used to assign the cropland/non-cropland value of each dataset to the attributes of the sample points. Third, we constructed the confusion matrices of all datasets in the four phases; that is, we counted the sample points for which the product and the interpreted reference were, respectively, cropland/cropland, cropland/non-cropland, non-cropland/cropland, and non-cropland/non-cropland (the confusion matrices are given in the Supplementary Data). The accuracy evaluation indexes were then calculated from these counts, and a comparison chart of the indexes of each dataset was drawn.
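The third step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' ArcGIS workflow: it assumes the reference labels and the values extracted from one dataset are already available as equal-length sequences of 0/1, and the function name `confusion_counts` and the toy labels are hypothetical.

```python
import numpy as np

def confusion_counts(reference, predicted):
    """Count the four confusion-matrix cells for binary labels.

    Both inputs use 1 = cropland and 0 = non-cropland.
    `reference` holds the visually interpreted labels and
    `predicted` holds the values extracted from one land cover dataset.
    """
    reference = np.asarray(reference).astype(bool)
    predicted = np.asarray(predicted).astype(bool)
    if reference.shape != predicted.shape:
        raise ValueError("reference and predicted must have the same shape")
    return {
        "TP": int(np.sum(reference & predicted)),    # cropland mapped as cropland
        "FP": int(np.sum(~reference & predicted)),   # non-cropland mapped as cropland
        "FN": int(np.sum(reference & ~predicted)),   # cropland mapped as non-cropland
        "TN": int(np.sum(~reference & ~predicted)),  # non-cropland mapped as non-cropland
    }

# Hypothetical toy sample of eight validation points.
reference = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(reference, predicted)
print(counts)
```

Here FP corresponds to a commission error and FN to an omission error for the cropland class.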
For the spatial agreement comparison, the land cover products differ in coordinate system and resolution, so we first clipped the global/national land cover datasets to the study area using the boundary of Northeast China. Second, we carried out reprojection, resampling, and reclassification successively in ArcGIS. The datasets were projected using the Albers Equal-Area projection. Using the nearest neighbor method, all data were resampled to 30 m. We combined all cropland subclasses of each dataset into one class to integrate the cropland information. Here, the mosaic cropland classes (cropland >50%, mixed with other elements) in some datasets (e.g., GLCNMO, CCI-LC, and GlobCover) are classified as cropland; the classification of cropland subcategories is shown in Supplementary Table S5. After this processing, each dataset contained only two classes, cropland and non-cropland. Finally, the “Raster Calculator” was used to obtain the levels of agreement and disagreement on cropland at the same spatial position across the land cover datasets, and the area proportion of each agreement level was calculated. In addition to evaluating data accuracy with statistical metrics and spatial agreement, we also grouped the data products by spatial resolution, production period, sensor source, and classification algorithm (Figure 2). This grouping is intended to explore the factors that may affect the accuracy of the datasets in cropland mapping.
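As a rough illustration of the final overlay step, the sketch below sums co-registered binary cropland masks with NumPy rather than the ArcGIS Raster Calculator. It assumes the masks are already on the same 30 m grid; the function names and toy masks are hypothetical, and computing the proportions over pixels that at least one dataset maps as cropland is an assumption, since the text does not state the denominator.

```python
import numpy as np

def agreement_levels(masks):
    """Per-pixel count of datasets that map the pixel as cropland.

    `masks` is a sequence of equally shaped 0/1 arrays
    (1 = cropland, 0 = non-cropland) on a common grid.
    """
    stack = np.stack([np.asarray(m, dtype=np.int64) for m in masks])
    return stack.sum(axis=0)

def level_proportions(levels, n_datasets):
    """Share (%) of each agreement level 1..n_datasets.

    Assumption: the denominator is the number of pixels mapped as
    cropland by at least one dataset.
    """
    counts = np.bincount(levels.ravel(), minlength=n_datasets + 1)
    cropland_pixels = counts[1:].sum()
    if cropland_pixels == 0:
        return {level: 0.0 for level in range(1, n_datasets + 1)}
    return {
        level: float(100.0 * counts[level] / cropland_pixels)
        for level in range(1, n_datasets + 1)
    }

# Hypothetical 2 x 3 masks from three datasets.
a = [[1, 1, 0], [0, 1, 0]]
b = [[1, 0, 0], [0, 1, 1]]
c = [[1, 1, 0], [0, 0, 0]]
levels = agreement_levels([a, b, c])
proportions = level_proportions(levels, n_datasets=3)
print(levels)
print(proportions)
```

A level equal to the number of datasets means full agreement on cropland, and level 1 means only a single dataset maps the pixel as cropland.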
In the comparison of cropland area in Section 3.4, we used pixel counting to calculate the cropland area of Northeast China under the Albers projection; that is, the number of cropland pixels multiplied by the area of a single pixel.
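The pixel-counting calculation is simple arithmetic, sketched below under the assumption of square pixels in an equal-area projection, so that every pixel covers the same ground area. The function name and the toy raster are hypothetical.

```python
import numpy as np

def cropland_area_km2(mask, pixel_size_m=30.0):
    """Cropland area from a binary mask (1 = cropland) by pixel counting.

    Assumes square pixels of side `pixel_size_m` in an equal-area projection.
    """
    n_cropland = int(np.count_nonzero(mask))
    return n_cropland * pixel_size_m ** 2 / 1e6  # m^2 -> km^2

# Hypothetical 100 x 100 raster with 1000 cropland pixels.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[:10, :] = 1
area = cropland_area_km2(mask)
print(area)
```

With 30 m pixels, each pixel covers 900 m², so 1000 cropland pixels correspond to 0.9 km².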

2.4. Methodology

2.4.1. Accuracy Assessment Metrics

The accuracy assessment is based on the error matrix (confusion matrix), which is a cross-tabulation of a land cover product against the ground truth sample points. It directly reflects the accuracy of the mapping [57,58,59]. From the confusion matrix, four classical evaluation parameters are calculated: the user’s accuracy (UA, related to commission error), the producer’s accuracy (PA, related to omission error), the overall accuracy (OA), and the Matthews correlation coefficient (MCC). UA is the percentage of pixels assigned to a class by the product that are correct; PA is the percentage of validation points of a class that the product classifies correctly; and OA is the percentage of all validation points that are classified correctly. The Matthews correlation coefficient generates reliable results independently of the ratio of positive to negative elements in binary classification predictions [60].
These four evaluation metrics are among the most commonly used in the accuracy assessment of land cover datasets [11,61,62,63]. They are defined as follows:
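$$\text{User's accuracy} = \frac{X_{ii}}{X_{i+}} \times 100\% \tag{1}$$
$$\text{Producer's accuracy} = \frac{X_{ii}}{X_{+i}} \times 100\% \tag{2}$$
$$\text{Overall accuracy} = \frac{\sum_{i=1}^{2} X_{ii}}{N} \times 100\% \tag{3}$$
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{4}$$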
where $X_{ii}$ is the number of class $i$ pixels that were correctly classified; in this study, it refers to the number of cropland and non-cropland pixels that are correctly classified. $N$ refers to the total number of examined pixels; $X_{i+}$ represents the number of class $i$ pixels in the classification result, which is the total number of cropland pixels in a given land cover dataset; and $X_{+i}$ is the number of class $i$ pixels in the reference data, which is the total number of cropland pixels in the validation sample. TP (true positive) indicates a reference positive correctly classified as positive; TN (true negative) indicates a reference negative correctly classified as negative; FP (false positive) indicates a reference negative wrongly classified as positive; and FN (false negative) indicates a reference positive wrongly classified as negative.
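For the binary cropland/non-cropland case, the four metrics defined above reduce to simple expressions in TP, FP, FN, and TN. The sketch below is an illustration with hypothetical counts, not values from the study. It takes cropland as the positive class and returns fractions rather than percentages, matching how the results are reported (e.g., an OA of 0.914).

```python
import math

def binary_accuracy_metrics(tp, fp, fn, tn):
    """UA, PA (for the cropland class), OA, and MCC from confusion counts."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    # UA: share of pixels mapped as cropland that are truly cropland.
    ua = tp / (tp + fp) if (tp + fp) else float("nan")
    # PA: share of reference cropland points that are mapped as cropland.
    pa = tp / (tp + fn) if (tp + fn) else float("nan")
    # OA: share of all validation points classified correctly.
    oa = (tp + tn) / n
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, MCC is set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"UA": ua, "PA": pa, "OA": oa, "MCC": mcc}

# Hypothetical counts for 200 validation points.
metrics = binary_accuracy_metrics(tp=80, fp=20, fn=10, tn=90)
print(metrics)
```

In this toy case the commission error (1 − UA = 0.20) exceeds the omission error (1 − PA ≈ 0.11), which is the pattern of a product that overestimates cropland.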

2.4.2. Inter-Comparison Method

The comparative assessments of various land cover datasets include the location and quantity of each class [64]. In this study, 13 sets of LULC datasets were compared and analyzed in Northeast China from the two aspects of spatial location and cropland quantity. The comparison of locations is usually a matter of spatial (dis)agreement, pixel by pixel. The purpose of spatial agreement analysis is to explore the characteristics of the spatial distribution of cropland in various datasets. Spatial agreements include visual comparison and pixel-by-pixel comparison. Visual comparison can show the spatial variation of the datasets among regions subjectively. Per-pixel comparison is to calculate the proportion of pixels with the same attributes in the total pixels of various datasets in the same spatial location. The comparison of quantity mainly refers to the area comparison of cropland. We calculate the cropland area of Northeast China before projection and resampling.

2.4.3. Pairwise Data and Prefecture-Level City Cropland Area Validation

Correlation analysis can verify the relationship between the pairwise data and the ability to represent the prefecture-level cropland area. The following Formulas (5)–(7) are used for correlation analysis to detect the goodness of fit between datasets:
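$$\Delta x_i = x_i - y_i \tag{5}$$
$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \Delta x_i^{2}}{n}} \tag{6}$$
$$r^2 = \frac{\left( \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{7}$$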
where $\Delta x_i$ is the difference between the cropland area of one dataset ($x_i$) and that of another dataset ($y_i$) in region $i$, and $n$ is the total number of aggregation regions. $\bar{x}$ and $\bar{y}$ are the average cropland areas of the two datasets.
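A minimal NumPy sketch of the RMSE and r² calculations follows. The function name and the toy area values are hypothetical, and the inputs are assumed to be cropland areas (km²) of two datasets aggregated over the same regions in the same order.

```python
import numpy as np

def rmse_and_r2(x, y):
    """RMSE and squared Pearson correlation between paired area values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size == 0:
        raise ValueError("x and y must be non-empty and equally shaped")
    diff = x - y                              # per-region area difference
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    xc = x - x.mean()
    yc = y - y.mean()
    # Undefined (division by zero) if either input is constant.
    r2 = float(np.sum(xc * yc) ** 2 / (np.sum(xc ** 2) * np.sum(yc ** 2)))
    return rmse, r2

# Hypothetical cropland areas (km^2) in four aggregation regions.
x = [10.0, 20.0, 30.0, 40.0]
y = [12.0, 18.0, 33.0, 41.0]
rmse, r2 = rmse_and_r2(x, y)
print(rmse, r2)
```

The two statistics answer different questions: r² measures how well the two datasets co-vary, while RMSE also penalizes a systematic offset from the 1:1 line, so a dataset that consistently overestimates cropland can have a high r² together with a large RMSE.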

3. Results

3.1. Accuracy Evaluation of the Thirteen Datasets in the Four Phases

Figure 3 shows the distribution of sample points across Northeast China in the four phases. The distribution of ground-truth points is roughly similar across the four phases, and the majority of cropland is distributed in the Sanjiang–Songnen–Liaohe Plain, with the main types of non-cropland being mountains, hills, lakes and rivers, grasslands, sandy land, and bare land, such as the Greater Khingan Mountains and the Changbai Mountains. Scattered cropland is found in the transition areas between plains and hills, beside lakes and rivers, or in areas such as grassland or bare land. The differences between cropland and non-cropland across the four phases are mainly concentrated in these areas.
Based on the ground truth validation samples, we constructed the confusion matrices of all datasets in four phases and obtained the spatial accuracy measures (i.e., producer’s accuracy, user’s accuracy, overall accuracy, and the Matthews correlation coefficient), which were computed from the confusion matrices (Figure 4). In Phase-2020, the average overall accuracy of six datasets is the highest at 0.86, followed by Phase-2000, Phase-2010, and Phase-2015, all around 0.83. CLUDs and CLCD data are available in all four phases, while GLASS and CCI-LC data are available in three phases: 2000, 2010, and 2015. The following is a comparison of the data evaluation metrics by phase.
The overall accuracies of the eight datasets in Phase-2000 all exceed 0.73. GlobeLand30 has the highest OA of 0.90, followed by CLCD (0.89) and CCI-LC (0.87). In terms of PA, GlobCover-2005 ranks the lowest with an accuracy of 0.57, followed by GLC2000 (0.66). As for UA, GlobCover-2005 also has the lowest value of 0.62, followed by GLCNMO-2003 (0.63). Overall, GlobeLand30 has a balanced performance, with higher OA, UA, and PA in 2000, followed by CLCD; the MCC of both is over 0.75.
In Phase-2010, GlobCover-2005 has the lowest UA, PA, and OA, followed by GlobCover-2009. CLCD has the highest UA, PA, and OA, followed by GlobeLand30. The other datasets have medium accuracy.
The overall accuracies of the nine datasets in Phase-2015 are all over 0.76; CLCD has the highest OA value of 0.90, followed by CCI-LC (0.87) and GLC_FCS30 (0.85). In terms of PA, GLASS ranks the lowest with 0.68. As for UA, GLCNMO-2013 has the lowest value of 0.63, followed by CGLS-LC100 (0.64) and FAO-GLCshare (0.68). In general, CLCD has the highest OA, UA, and PA in Phase-2015, and its MCC is over 0.77. CCI-LC and GLC_FCS30 follow, also performing well, with MCC greater than 0.70.
The overall accuracies of the six datasets in Phase-2020 are all more than 0.70; the OA of CLCD is 0.914, followed by GlobeLand30 (0.906), GLC_FCS30 (0.902), and Esri (0.896). In terms of PA, CLUDs ranks the lowest with 0.69. This corresponds to a higher omission error, indicating that CLUDs underestimates the cropland area across Northeast China (Figure 5 and Figure 6). As for UA, CGLS-LC100 has the lowest value of 0.57, and its cropland area is clearly overestimated compared to the other data (Figure 5 and Figure 6). CLCD has the best performance in Phase-2020, followed by GlobeLand30, both with MCC greater than 0.79 (Figure 4, Figure 5 and Figure 6). GLC_FCS30 and Esri follow, both with MCC greater than 0.77.
For detailed numerical information, please see Supplementary Table S2.
In summary, CLCD has the highest accuracy in all four phases, followed by GlobeLand30, GLC_FCS30, and Esri. The low PA values of GlobCover, GLC2000, CLUDs, GLASS, and GLCNMO reflect an underestimation of cropland area in different regions of Northeast China. The UAs of CGLS-LC100, GlobCover, GLCNMO, and FAO-GLCshare are all relatively low; these datasets have different degrees of cropland commission error in different regions of the Northeast, and their cropland area is overestimated relative to the other datasets.
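To make the link between the accuracy measures and the two error types concrete, the sketch below derives OA, PA, UA, and MCC from a binary cropland/non-cropland confusion matrix using the standard textbook definitions. It is an illustration only: the function name and the counts are hypothetical and do not reproduce any matrix from Supplementary Table S6.

```python
import math


def cropland_metrics(tp, fp, fn, tn):
    """Accuracy measures for the cropland class from a 2 x 2 confusion matrix.

    tp: reference cropland mapped as cropland
    fp: reference non-cropland mapped as cropland (commission error)
    fn: reference cropland mapped as non-cropland (omission error)
    tn: reference non-cropland mapped as non-cropland
    """
    total = tp + fp + fn + tn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("the matrix must contain reference and mapped cropland samples")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "OA": (tp + tn) / total,   # overall accuracy
        "PA": tp / (tp + fn),      # producer's accuracy = 1 - omission error
        "UA": tp / (tp + fp),      # user's accuracy = 1 - commission error
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts of validation samples.
metrics = cropland_metrics(tp=40, fp=10, fn=10, tn=40)
```

A low PA means that many reference cropland samples were missed (omission), which goes with an underestimated cropland area; a low UA means that many mapped cropland samples are not cropland on the ground (commission), which goes with an overestimated area.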

3.2. Commission and Omission Error Analysis on Cropland in Northeast China

Phase-2015 and Phase-2020 have higher accuracy and more products, covering almost all data products except for GLC2000 and GlobCover; thus, we chose these two phases to further analyze the omission and commission errors of the datasets. Based on the ground verification points, the distribution of omission and commission errors across Northeast China in 2015 and 2020 is illustrated in Figure 5 and Figure 6. The ginger-pink points indicate commission errors, and the spruce-green points indicate omission errors.
Many ginger-pink points exist for CGLS-LC100 in Figure 5 and Figure 6, indicating serious commission issues and illustrating the overestimation of cropland area in both the meridional and zonal dimensions (Figures 8 and 9). GLCNMO and FAO-GLCshare follow, with commission issues in different regions. GLCNMO-2013 has serious commission errors in western Inner Mongolia, especially in the western part of Hulunbuir City, where the overestimation is more serious, followed by Heihe City in Heilongjiang Province. Chifeng City in Inner Mongolia shows a low level of omission. FAO-GLCshare has high misclassification in the north of Heihe City in Heilongjiang, the central part of Hulunbuir City, and the junction of mountains and plains (the Greater Khingan Mountains, the Changbai Mountains, etc.), while it shows a low level of omission in the Greater Khingan Mountains. GLASS has many spruce-green points and less cropland area than the other datasets. CLUDs has omission issues in the Sanjiang–Songnen–Liaohe Plain and in Chifeng, Inner Mongolia, with many spruce-green points. CGLS-LC100 and GLCNMO-2013 have lower omission errors, with fewer spruce-green points, but somewhat more commission errors, whereas CLUDs has higher omission errors, with more spruce-green points (Figure 5 and Figure 6). Furthermore, CLCD has the fewest spruce-green and ginger-pink points, indicating few omission and commission errors, which corresponds to its highest accuracy in Figure 4. The numbers of commission and omission errors for each product in Phase-2015 and Phase-2020 are shown in Supplementary Table S3.
To examine the spatial accuracies of these datasets in more detail, we selected four regions in Phase-2015 (the phase with the most data products) for zoomed-in comparison, as shown in Figure 7. Except for GLASS, the other eight datasets capture the distribution of cropland in the Sanjiang and Songnen Plains. GLASS not only has low resolution but also suffers from omission and commission issues. In the southeast of Hulunbuir City in Inner Mongolia (Figure 7B), FROM-GLC and GLASS miss a large amount of cropland, followed by FAO-GLCshare and CLUDs. CLCD and GLC_FCS30 describe the mountains and farmland well, while CGLS-LC100 significantly overestimates the cropland area. In the Liaohe Plain (Figure 7D), five datasets perform well; the exceptions are FAO-GLCshare, GLASS, GLCNMO, and CLUDs, which suffer from commission issues.

3.3. Spatial Agreement and Discrepancies

The spatial agreement is shown in Figure 8 and Figure 9, which illustrate the spatial similarities and differences of cropland. In terms of the longitudinal cropland distribution curve, most croplands in all datasets are clustered between 125°E and 130°E. The regions with agreement level nine (Phase-2015) or six (Phase-2020), i.e., the highest spatial consistency, are mainly distributed in the traditional agricultural regions of Northeast China, such as the Sanjiang–Songnen–Liaohe Plain, which includes Harbin, Hegang, Jixi, Jiamusi, Qiqihar, and Suihua in Heilongjiang Province; Liaoyuan City, Siping City, Songyuan City, and Changchun City in Jilin Province; and Fuxin City, Jinzhou City, Shenyang City, and Tieling City in Liaoning Province. All of these zones have homogeneous, clustered croplands that are easily captured by remote sensing imagery. The regions of lower spatial consistency (agreement level ≤ 2) are distributed in mountainous and hilly zones and the transition areas between them in the east of Inner Mongolia, Jilin, and Liaoning, and in the north of Heilongjiang.
The overlapping results of the nine datasets in Phase-2015 are shown in Figure 8a–c; the area identified as cropland by all nine datasets accounts for 19.7% of the total cropland area. From the meridional area curve, CGLS-LC100 has the highest area curve between 121°E and 128°E, and FAO-GLCshare has the highest between 128°E and 135°E. FROM-GLC-2017 at around 121–125°E, CLUDs at about 126–129°E, and GLASS at 130–132°E are the lowest. From the perspective of zonal area distribution, FAO-GLCshare, CGLS-LC100, and GLCNMO-2013 have the highest area curves in the ranges of about 40°N to 42°N, 42°N to 48°N, and 48°N to 51°N, respectively.
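The meridional and zonal area curves can be approximated by summing cropland pixels along the columns and rows of a mask. The sketch below is a simplification under stated assumptions: the raster is assumed to be north-up, so that columns follow longitude and rows follow latitude, and every pixel is assumed to cover the same area. The function name and the toy mask are hypothetical.

```python
import numpy as np


def area_profiles(mask, pixel_area_km2):
    """Cropland area per column (longitude) and per row (latitude) of a binary mask."""
    m = np.asarray(mask, dtype=bool)
    by_longitude = m.sum(axis=0) * pixel_area_km2  # one value per column
    by_latitude = m.sum(axis=1) * pixel_area_km2   # one value per row
    return by_longitude, by_latitude


# Hypothetical 2 x 3 mask with 0.25 km2 pixels (1 = cropland).
toy_mask = np.array([[1, 1, 0], [1, 0, 0]])
lon_curve, lat_curve = area_profiles(toy_mask, pixel_area_km2=0.25)
```

Plotting `lon_curve` for several datasets on one axis gives curves of the kind compared above. On a geographic (degree-based) grid, the pixel area varies with latitude and would have to be computed per row instead of being passed as a constant.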
Figure 9a–c shows the overlapping results of six datasets in Phase-2020. The stacked bar shows that 28.5% are simultaneously labeled as cropland in the total identified cropland regions, and 42.2% of the data are identified as cropland at agreement level five or higher. According to the distribution of meridional cropland area, the area curve of CGLS-LC100 is the highest in all longitudes; this indicates that CGLS-LC100 has a high estimation value for cropland area. CLUDs rank lowest between 123°E and 129°E, revealing low area estimation within this longitude range. On the zonal distribution curve of the cropland area, CGLS-LC100 also shows the highest regional distribution among the six curves. Esri has the lowest area, around 45°N, corresponding to the omission phenomenon in Songyuan and Changchun cities of Jilin Province. Supplementary Table S4 of the Supplementary Data shows the area proportion of different consistency levels.
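The agreement levels and their area shares can be sketched as a simple overlay of binary cropland masks. This is an illustration under stated assumptions: all masks are assumed to be co-registered on one equal-area grid, so that pixel counts stand in for area, and the function names and toy masks are hypothetical.

```python
import numpy as np


def agreement_levels(masks):
    """Per-pixel count of datasets that label the pixel as cropland."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0)


def level_shares(levels, n_datasets):
    """Share of each agreement level (1..n) among pixels mapped as cropland by any dataset."""
    mapped = levels[levels > 0]
    if mapped.size == 0:
        raise ValueError("no pixel is mapped as cropland by any dataset")
    counts = np.bincount(mapped, minlength=n_datasets + 1)[1:]
    return counts / counts.sum()


# Three hypothetical 2 x 2 masks (1 = cropland).
masks = [
    np.array([[1, 1], [0, 0]]),
    np.array([[1, 0], [0, 1]]),
    np.array([[1, 1], [0, 0]]),
]
levels = agreement_levels(masks)
shares = level_shares(levels, n_datasets=len(masks))
```

The last entry of `shares` is the fraction of the identified cropland on which all datasets agree, which is the quantity reported as the highest agreement level in the stacked bars.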
Because the overall accuracy of the six datasets in Phase-2020 is higher, we further explored their agreement with each other. We compared the six datasets pairwise, and the scatterplots are shown in Figure 10. CLCD has the highest agreement with GLC_FCS30 (r² = 0.96), followed by GlobeLand30 with CLUDs (r² = 0.89) and GlobeLand30 with CLCD (r² = 0.88); all their RMSEs are less than 12 km². CLUDs has the lowest agreement with CGLS-LC100, with the lowest r² of 0.38 and the highest RMSE of 36.15 km². CGLS-LC100 is in poor agreement with every other dataset, and almost all of its points lie off the 1:1 line. This is because the cropland area in the CGLS-LC100 data is overestimated relative to the other data.

3.4. Comparative Analysis by Referring to the Statistical Data

Figure 11 compares the cropland area from the collected datasets with statistical data for Northeast China. CLUDs-1980s slightly underestimates cropland area, whereas CGLS-LC100 and GLCNMO-2013 significantly overestimate it. Compared with the reconstructed cropland data, the cropland areas of the CLUDs, GlobCover, and GlobeLand30 products are relatively close to the statistics. The areas of GLASS, FAO-GLCshare, CCI-LC, CLCD, GLC_FCS30, FROM-GLC, and Esri are concentrated around 6 × 10⁵ km². The areas of the three GLCNMO years (2003, 2008, and 2013) differ significantly, and the cropland area in all products increases with time. On the whole, the cropland areas based on remote sensing are higher than the statistical data. The differences may be caused by the inconsistent spatial resolution of remote sensing products and official statistics. Remote sensing mapping tends to introduce bias in mixed pixels, while the official yearbook statistics were generated by field surveying, multi-level estimates, and censuses.
Based on the Phase-2015 datasets, we explored the correlation between the datasets and the reconstructed data for cropland area aggregated at the prefecture level (Figure 12). CLUDs has the highest r² (0.97) and the lowest RMSE (1.63 km²), followed by CLCD (0.95, 4.04 km²) and CCI-LC (0.96, 4.10 km²). CGLS-LC100 has the lowest r² of 0.53 and the highest RMSE of 15.96 km². Compared with the other datasets, CLUDs performs best in regional (prefecture-level) accuracy. FROM-GLC-2017 and GLASS also fit well, with most of the “plus” points concentrated around the 1:1 line. In contrast, CGLS-LC100 and GLCNMO perform poorly, with an obvious overestimation of prefecture-level cropland area.

3.5. Potential Influencing Factors in Data Accuracy

Multiple factors might have contributed to the differences and uncertainties in the cropland extent among these datasets. We grouped and compared the overall accuracy of the datasets along four dimensions: data resolution, production time, satellite sensor, and classification algorithm. The results are illustrated in Figure 13. In general, the overall accuracy rises with finer resolution and more recent mapping time. For example, the GLASS data at 5 km resolution has the lowest overall accuracy, and the Esri data at 10 m resolution is among the datasets with the highest overall accuracy. The overall accuracy of products derived from Sentinel and Landsat observations is generally better than that of data produced from other satellite sensors, while the data produced from the PROBA satellite are the worst. In addition, the different classification strategies and mapping methods adopted by these datasets may lead to differences in classification accuracy. The average overall accuracies of the data produced by random forest classification, pixel–object–knowledge classification, and deep learning are all over 0.89, followed by the operational SPECLib-based approach and random forest models (0.88), unsupervised spatio-temporal clustering and machine learning (0.87), and extraction of remote sensing information (0.84). A comparison of GlobeLand30, GLC_FCS30, and CLCD at 30 m resolution in Phase-2020, all derived from Landsat imagery, reveals that the overall accuracy of CLCD, generated by the random forest algorithm, surpasses that of GlobeLand30, generated by the pixel–object–knowledge classification approach. Both outperform GLC_FCS30, which is generated using an operational SPECLib-based approach.

4. Discussion

4.1. Other Possible Influencing Factors of Dataset Performance

The diverse definitions and classifications of cropland in the evaluated datasets can also introduce differences in classification performance. For example, in GLC2000, CGLS-LC100, and Esri, cropland is “pure” cropland, including rainfed croplands and flooded crops such as cereal, corn, wheat, and rice. Cropland in GLASS, GlobeLand30, CLCD, GLC_FCS30, and FROM-GLC mainly refers to rainfed/irrigated cropland, orchards, temporarily bare farmland, etc. In GLCNMO, however, cropland includes herbaceous crops, paddy fields of graminoid and non-graminoid crops, and the mosaic of cropland and other vegetation. In CCI-LC, cropland incorporates rainfed/irrigated cropland, herbaceous vegetation, tree or shrub cover, and the mosaic of cropland and natural vegetation (Supplementary Table S5). In general, mixing herbaceous or shrub cover with cropland inevitably results in some overestimation of cropland area, while a narrow definition results in slight underestimation. As shown in Figure 14, GLC2000, CGLS-LC100, and Esri define cropland as “pure” cropland, and thus their cropland area should be somewhat underestimated. However, we found that the cropland area of CGLS-LC100 is very high, while GLC2000 and Esri are relatively normal. This could be attributed to other reasons (we suspect it may be because the cropland in CGLS-LC100 is an independent thematic cropland file rather than a single land cover layer). In addition, the cropland of GLASS, GLC2000, CLUDs, GlobeLand30, CLCD, and GLC_FCS30 contains several subcategories, while FAO-GLCshare and CGLS-LC100 are thematic cropland files that only distinguish cropland from non-cropland. We found that both FAO-GLCshare and CGLS-LC100 have high cropland areas.
Additionally, there are mountains, hills, plains, rivers, wetlands, and transition zones in Northeast China. This complicated landscape condition is also one of the reasons leading to the differences and uncertainties among these datasets. It should be stated that, in this work, our validation samples simply evaluated the Northeast China cropland, and our conclusions on the accuracy evaluation of these datasets may not be representative of other land cover classes or other regions’ accuracy assessments. More representative truth points need to be collected for quantitative implementation in other regions in the future.

4.2. Uncertainties Existing in Our Evaluation Work

One potential source of error in our accuracy assessment is the use of validation samples from a single phase year to evaluate datasets from different years, particularly when land cover changes occur in specific regions. In this study, validation samples for the four phase years were generated through the visual interpretation of high-resolution images. For example, samples collected for the phase year 2010 were used for the datasets dated 2010 as well as for GLCNMO-2008, GlobCover-2009, and GlobCover-2005. Using validation samples from one phase year (2010) to evaluate datasets from other years (2005/2008/2009) may be problematic if land cover changes occur in some regions, reducing the reliability of the commission/omission errors. However, this situation arose in only a few datasets: GLCNMO-2003 and GlobCover-2005 in Phase-2000; GLCNMO-2008, GlobCover-2005, and GlobCover-2009 in Phase-2010; FAO-GLCshare-2014 and FROM-GLC-2017 in Phase-2015; and CLCD-2019 in Phase-2020. While the land cover type in some traditional cropland regions will not change in the short term, some changes will occur at the boundaries between land cover types, such as between cropland and grassland, shrubs, or bare land. The mismatch between the dates of the verification samples and the data leads to commission/omission errors whose extent cannot be quantified, which is a shortcoming of our research.
Another source of evaluation uncertainty is the mosaic cropland subcategory. For the mosaic cropland classes found in some land cover datasets, mosaic classes in which cropland accounts for more than 50% were assigned directly to cropland in the data processing step. This issue mainly concerns three products: GLCNMO, CCI-LC, and GlobCover. We calculated the proportion of mosaic cropland pixels among all cropland pixels. Figure 15 shows the proportion of mosaic cropland, and the proportions of cropland and non-cropland within mosaic cropland, for GlobCover and CCI-LC. Mosaic cropland accounted for 9.6% in GLCNMO-2003, 4.3% in GLCNMO-2008, and 14.5% in GLCNMO-2013. In CCI-LC, the proportion was 7.2% in CCI-LC-2000, 7.0% in CCI-LC-2010, and 7.0% in CCI-LC-2015. In GlobCover, however, this proportion is relatively large: 35.5% in GlobCover-2005 and 39.8% in GlobCover-2009. The mosaic class in GlobCover is defined as “mosaic cropland (50–70%)/vegetation (grassland/shrubland/forest) (20–50%)”. Therefore, within the 35.5% or 39.8% of cropland pixels that are mosaic, the proportion of non-cropland is fairly small (e.g., 8.0–19.9% in GlobCover-2009). Nevertheless, some omission and commission errors are undoubtedly caused by these mosaic cropland pixels.
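The proportion of mosaic cropland pixels among all cropland pixels can be computed directly from a classified raster once the relevant class codes are known. The sketch below is a hedged illustration: the function name is hypothetical, and the class codes in the example are placeholders, not the legend values of GLCNMO, CCI-LC, or GlobCover.

```python
import numpy as np


def mosaic_share(classes, cropland_codes, mosaic_codes):
    """Fraction of cropland pixels (pure plus mosaic) that belong to mosaic classes."""
    grid = np.asarray(classes)
    is_mosaic = np.isin(grid, list(mosaic_codes))
    # Mosaic classes count as cropland here, mirroring the >50% reclassification rule.
    is_cropland = np.isin(grid, list(cropland_codes)) | is_mosaic
    total = int(is_cropland.sum())
    if total == 0:
        raise ValueError("the raster contains no cropland pixels")
    return float(is_mosaic.sum() / total)


# Placeholder legend: 1 = pure cropland, 2 = mosaic cropland, 9 = other.
toy_classes = np.array([[1, 1, 2], [2, 9, 9]])
share = mosaic_share(toy_classes, cropland_codes={1}, mosaic_codes={2})
```

Multiplying such a share by the non-cropland fraction allowed in the mosaic class definition gives a rough bound on how much non-cropland the reclassification may have carried into the cropland class.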

4.3. Comparison with the Prior Assessment Work

In the recent study by Zhang et al. [65], a comparative analysis and accuracy assessment of six products with 30 m resolution in China in 2015 was conducted, where GLAD and GFSAD are thematic cropland data. Four accuracy indexes were used, including the overall accuracy, commission error, omission error, and Matthews correlation coefficient. We collected 13 sets of data for our study and conducted a comparative analysis and assessment of the cropland in Northeast China for the four phases around the years 2000, 2010, 2015, and 2020. During the evaluation process, we recognized that the years are not uniform, the data resolutions are different, and all data are land cover products and not thematic cropland products. The accuracy evaluation indexes we used include overall accuracy, producer’s accuracy, user’s accuracy, and the Matthews correlation coefficient. Through our methodology, we aimed to provide valuable insights into the cropland changes in Northeast China and contribute to the scientific community’s understanding of land use dynamics.
Both studies compare and analyze the spatial consistency of data at regional (China, Northeast China), provincial, meridional, and zonal scales. However, Zhang et al.’s study goes into more detail on provincial agreement and differences, calculating not only the proportion of cropland area at different agreement levels in each province but also the relative area difference between each dataset and the statistics for each province [65]. An important difference between the two papers is that, in Section 4, Zhang et al. examined the relationship of agreement with elevation and slope and concluded that the area proportion with high agreement decreases as elevation and slope increase [65]. As elevation and slope increase, the differences among datasets become obvious. Our study did not investigate either aspect, which should be addressed in future work.

5. Conclusions

As more and more global and national land cover datasets are published, it is becoming easier to obtain cropland information from these open-access products. However, the accuracy evaluations of these datasets at the global/national scale for all LULC classes may not substitute for local or class-specific accuracy. Northeast China is a major agricultural region of China with complicated landscapes and varied terrain. This paper takes Northeast China as the study area to evaluate the performance of thirteen land cover datasets in terms of accuracy, spatial location, and cropland area. The similarities and differences between datasets are shown through a multi-scale comparison.
Accuracy validation based on ground truth samples indicates that, in general, CLCD, GlobeLand30, GLC_FCS30, and Esri have the best performance in cropland classification, with overall accuracies of more than 0.89. On the contrary, the OAs of CGLS-LC100-2019 and GlobCover are lower than 0.75. For commission and omission errors, CGLS-LC100 severely overestimates cropland area, GLASS significantly underestimates it, and CLCD shows the least bias. In terms of spatial consistency, the traditional agricultural regions such as the Sanjiang–Songnen–Liaohe Plain have the highest spatial consistency, while lower-consistency regions are distributed in mountainous and hilly regions and the transition areas between them. For the pairwise scatter plots, the agreement between CLCD and GLC_FCS30 is the highest (r² = 0.96), followed by GlobeLand30 with CLUDs (0.89) and GlobeLand30 with CLCD (0.88); all their RMSEs are less than 12 km². CGLS-LC100 is in poor agreement with every other dataset, with r² lower than 0.56 and RMSE higher than 30 km².
In exploring the factors behind the differences and uncertainties in cropland extent and location among the datasets, we found that, although all of these datasets aim to provide accurate cropland information, factors such as the definition of cropland, the landscape characteristics of Northeast China, the classification scheme, and the satellite sensor jointly produced different mapping results. It would be arbitrary to declare any one dataset the most suitable for a particular application. However, with the increasing demand for high-precision cropland datasets, users should consider the accuracy of the cropland class when selecting a dataset. The accuracy evaluations and comprehensive comparison in this work are expected to provide a useful reference for those (e.g., cropland surveyors) who use these datasets in Northeast China. In addition, the detailed evaluation could provide feedback to data producers, facilitate the improvement of data processing algorithms and classification techniques, and support future cropland applications, mapping, and cropland data fusion.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs15215134/s1. Table S1: The summary of the comparison of the claimed global accuracies in global land cover datasets vs the regionally achieved accuracies from different land cover datasets; Table S2: The precision index values of the products in four phases derived from confusion matrix; Table S3: Numbers of the commission and omission error in each product in Phase-2015 and Phase-2020; Table S4: The area proportion of different agreement levels; Table S5: The Definition and classification of cropland for each data in this article; Table S6: The following is the confusion matrix of each data in the four periods.

Author Contributions

Conceptualization, C.S. and T.C.; methodology, P.C., T.C. and K.L.; software, P.C. and D.Z.; validation, P.C., T.C., D.Z. and K.L.; formal analysis, P.C., T.C., C.S., Y.L. and K.L.; investigation, D.Z. and P.C.; resources, P.C. and D.Z.; data curation, P.C., T.C. and D.Z.; writing—original draft preparation, C.S., T.C., P.C., Y.L. and D.Z.; writing—supervision, C.S., T.C., Y.L. and K.L.; visualization, P.C., T.C., Y.L. and D.Z.; project administration, C.S. and T.C.; funding acquisition, C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, China (XDA28020503, XDA28020500), the Jiangsu Normal University Postgraduate Research & Practice Innovation Program (2021XKT0094), the National Key Research and Development Program of China (2019YFA0607101), and the National Natural Science Foundation of China (41971403, 41930102, 42171421).

Data Availability Statement

GLC-2000 can be downloaded from the web at https://forobs.jrc.ec.europa.eu/products/glc2000/products.php (accessed on 1 January 2021). FAO-GLCshare is available at the FAO Map Catalog (accessed on 1 January 2021). FROM-GLC is accessible on the Tsinghua website at Finer Resolution Observation and Monitoring—Global Land Cover (tsinghua.edu.cn). Esri can be downloaded from the web at https://www.arcgis.com/apps/instant/media/index.html?appid=fc92d38533d440078f17678ebc20e8e2 (accessed on 1 January 2021). GLASS-GLC is available at https://doi.org/10.1594/PANGAEA.913496 (accessed on 1 January 2021). GLCNMO is available at https://github.com/globalmaps/gm_lc_v1 (accessed on 1 January 2021). CCI-LC is available at http://maps.elie.ucl.ac.be/CCI/viewer/download.php (accessed on 1 January 2021). The GlobCover can be downloaded from the European Space Agency data user element at http://due.esrin.esa.int/page_globcover.php (accessed on 1 January 2021). CGLS-LC100 can be downloaded from the web at https://zenodo.org/record/3939038#.Y3TzSMdBxBM (accessed on 1 January 2021). GlobeLand30 is accessible online at https://www.webmap.cn/commres.do?method=globeIndex (accessed on 1 January 2021). GLC_FCS30 data are available at https://zenodo.org/record/3986872#.Y3Ty2sdBxBN (accessed on 1 January 2021). CLUDs are available on request from the corresponding author. CLCD is freely available at https://doi.org/10.5281/zenodo.4417810 (accessed on 1 January 2021). The auxiliary datasets for China cropland are available at https://doi.org/10.6084/m9.figshare.13356680.v1 (accessed on 1 January 2021).

Acknowledgments

We would like to thank the editor and the anonymous reviewer, whose constructive comments will help to improve the presentation of this paper.

Conflicts of Interest

The authors declare that they have no known competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.

Abbreviations

GLC2000, Global Land Cover 2000; FAO-GLCshare, The Global Land Cover-share created by FAO (Food and Agriculture Organization, FAO); FROM-GLC, Finer Resolution Observation and Monitoring of Global Land Cover; Esri, a global map of land use/land cover (LULC) derived from ESA Sentinel-2 imagery at 10m resolution; GLASS-GLC, Global Land Surface Satellite-Global Land Cover; GLCNMO, Global Land Cover by National Mapping Organizations; CCI-LC, Climate Change Initiative-Land Cover; GlobCover, Global Land Cover Map; CGLS-LC100, Copernicus Global Land Service-Land Cover 100 m; GlobeLand30, World’s First Global Land Cover Datasets at a 30 m; GLC_FCS30, Global Land Cover product with Fine Classification System at 30 m; CLUDs, China’s Land-use/cover datasets; CLCD, China Land Cover Dataset.

References

1. Pinstrup-Andersen, P. Perspectives in World Food and Agriculture 2004; John Wiley & Sons: Hoboken, NJ, USA, 2008; pp. 87–97.
2. Gibbs, H.K.; Ruesch, A.S.; Achard, F.; Clayton, M.K.; Holmgren, P.; Ramankutty, N.; Foley, J.A. Tropical forests were the primary sources of new agricultural land in the 1980s and 1990s. Proc. Natl. Acad. Sci. USA 2010, 107, 16732–16737.
3. Grekousis, G.; Mountrakis, G.; Kavouras, M. An overview of 21 global and 43 regional land-cover mapping products. Int. J. Remote Sens. 2015, 36, 5309–5335.
4. Loveland, T.R.; Reed, B.C.; Brown, J.F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens. 2010, 21, 1303–1330.
5. Hansen, M.C.; Defries, R.S.; Townshend, J.R.G.; Sohlberg, R. Global land cover classification at 1 km spatial resolution using a classification tree approach. Int. J. Remote Sens. 2010, 21, 1331–1364.
6. Bartholomé, E.; Belward, A. GLC2000: A new approach to global land cover mapping from Earth observation data. Int. J. Remote Sens. 2005, 26, 1959–1977.
7. Friedl, M.A.; Sulla-Menashe, D.; Tan, B.; Schneider, A.; Ramankutty, N.; Sibley, A.; Huang, X. MODIS Collection 5 global land cover: Algorithm refinements and characterization of new datasets. Remote Sens. Environ. 2010, 114, 168–182.
8. Bicheron, P.; Leroy, M.; Brockmann, C.; Krämer, U.; Miras, B.; Huc, M.; Niño, F.; Defourny, P.; Vancutsem, C.; Arino, O.; et al. Globcover: A 300 m global land cover product for 2005 using ENVISAT MERIS time series. In Proceedings of the Second International Symposium on Recent Advances in Quantitative Remote Sensing, Enschede, The Netherlands, 8–11 May 2006; pp. 538–542.
9. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
10. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM + data. Int. J. Remote Sens. 2012, 34, 2607–2654.
11. Herold, M.; Mayaux, P.; Woodcock, C.E.; Baccini, A.; Schmullius, C. Some challenges in global land cover mapping: An assessment of agreement and accuracy in existing 1 km datasets. Remote Sens. Environ. 2008, 112, 2538–2556.
12. Kaptué Tchuenté, A.T.; Roujean, J.-L.; De Jong, S.M. Comparison and relative quality assessment of the GLC2000, GLOBCOVER, MODIS and ECOCLIMAP land cover data sets at the African continental scale. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 207–219.
13. Wu, W.; Shibasaki, R.; Yang, P.; Zhou, Q.; Tang, H. Remotely sensed estimation of cropland in China: A comparison of the maps derived from four global land cover datasets. Can. J. Remote Sens. 2014, 34, 467–479.
14. Congalton, R.; Gu, J.; Yadav, K.; Thenkabail, P.; Ozdogan, M. Global Land Cover Mapping: A Review and Uncertainty Analysis. Remote Sens. 2014, 6, 12070–12093.
15. Lu, M.; Wu, W.; Zhang, L.; Liao, A.; Peng, S.; Tang, H. A comparative analysis of five global cropland datasets in China. Sci. China Earth Sci. 2016, 59, 2307–2317.
16. Deines, J.M.; Patel, R.; Liang, S.-Z.; Dado, W.; Lobell, D.B. A million kernels of truth: Insights into scalable satellite maize yield mapping and yield gap analysis from an extensive ground dataset in the US Corn Belt. Remote Sens. Environ. 2021, 253, 112174.
17. You, N.; Dong, J. Examining earliest identifiable timing of crops using all available Sentinel 1/2 imagery and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2020, 161, 109–123.
18. Giri, C.; Zhu, Z.; Reed, B. A comparative analysis of the Global Land Cover 2000 and MODIS land cover data sets. Remote Sens. Environ. 2005, 94, 123–132.
19. McCallum, I.; Obersteiner, M.; Nilsson, S.; Shvidenko, A. A spatial comparison of four satellite derived 1 km global land cover datasets. Int. J. Appl. Earth Obs. Geoinf. 2006, 8, 246–255.
20. Liu, L.; Zhang, X.; Gao, Y.; Chen, X.; Shuai, X.; Mi, J. Finer-Resolution Mapping of Global Land Cover: Recent Developments, Consistency Analysis, and Prospects. J. Remote Sens. 2021, 2021, 1–38.
21. Zhang, X.; Liu, L.; Chen, X.; Gao, Y.; Xie, S.; Mi, J. GLC_FCS30: Global land-cover product with fine classification system at 30 m using time-series Landsat imagery. Earth Syst. Sci. Data 2021, 13, 2753–2776.
22. Hansen, M.C.; Reed, B. A comparison of the IGBP DISCover and University of Maryland 1 km global land cover products. Int. J. Remote Sens. 2010, 21, 1365–1373.
23. Tsendbazar, N.-E.; de Bruin, S.; Fritz, S.; Herold, M. Spatial Accuracy Assessment and Integration of Global Land Cover Datasets. Remote Sens. 2015, 7, 15804–15821.
24. Bai, Y.; Feng, M.; Jiang, H.; Wang, J.; Zhu, Y.; Liu, Y. Assessing Consistency of Five Global Land Cover Data Sets in China. Remote Sens. 2014, 6, 8739–8759.
  25. Ning, J.; Zhang, S.; Cai, H.; Bu, K. A Comparative Analysis of the MODIS Land Cover Data Sets and Globcover Land Cover Data Sets in Heilongjiang Basin. J. Geo-Inf. Sci. 2012, 14, 240–249. [Google Scholar] [CrossRef]
  26. Liu, Y.; Zhou, M. Comparative Analysis on Three Land Cover Datasets based on IGBP Classification System over Hanjiang River Basin. Remote Sens. Technol. Appl. 2017, 32, 575–584. [Google Scholar]
  27. Yang, Y.; Xiao, P.; Feng, X.; Li, H.; Chang, X.; Feng, W. Comparison and assessment of large-scale land cover datasets in China and adjacent regions. Natl. Remote Sens. Bull. 2014, 18, 453–475. [Google Scholar] [CrossRef]
  28. Yang, Y.; Xiao, P.; Feng, X.; Li, H. Accuracy assessment of seven global land cover datasets over China. ISPRS J. Photogramm. Remote Sens. 2017, 125, 156–173. [Google Scholar] [CrossRef]
  29. Hua, T.; Zhao, W.; Liu, Y.; Wang, S.; Yang, S. Spatial Consistency Assessments for Global Land-Cover Datasets: A Comparison among GLC2000, CCI LC, MCD12, GLOBCOVER and GLCNMO. Remote Sens. 2018, 10, 1486. [Google Scholar] [CrossRef]
  30. Pérez-Hoyos, A.; Rembold, F.; Kerdiles, H.; Gallego, J. Comparison of Global Land Cover Datasets for Cropland Monitoring. Remote Sens. 2017, 9, 1118. [Google Scholar] [CrossRef]
  31. Gao, Y.; Guo, Y.; Wang, W.; Li, F.; Huang, P. Accuracy evaluation of different land use or land cover data in grassland of northern China. Chin. J. Ecol. 2019, 38, 283–293. [Google Scholar] [CrossRef]
  32. Niu, G.Z.; Shan, Y.; Zhang, H. Accuracy Assessment of Wetland Categories from the GlobCover2009 Data over China. Wetl. Sci. 2012, 10, 389–395. [Google Scholar] [CrossRef]
  33. Meng, W. Accuracy Assessment for Regional Land Cover Remote Sensing Mapping Product Based on Spatial Sampling: A Case Study of Shaanxi Province, China. J. Geo-Inf. Sci. 2015, 17, 742–749. [Google Scholar]
  34. Ma, J.; Qun, S.; Qiang, X.; Bowei, W. Accuracy Assessment and Comparative Analysis of GlobeLand30 Dataset in Henan Province. J. Geo-Inf. Sci. 2016, 18, 1563–1572. [Google Scholar]
  35. Wang, Y.; Zhang, J.; Liu, D.; Yang, W.; Zhang, W. Accuracy Assessment of GlobeLand30 2010 Land Cover over China Based on Geographically and Categorically Stratified Validation Sample Data. Remote Sens. 2018, 10, 1213. [Google Scholar] [CrossRef]
  36. Kussul, N.; Shelestov, A.; Basarab, R.; Skakun, S.; Kussul, O.; Lavreniuk, M. Geospatial intelligence and data fusion techniques for sustainable development problems. ICTERI 2015, 1356, 196–203. [Google Scholar]
  37. Jokar Arsanjani, J.; Tayyebi, A.; Vaz, E. GlobeLand30 as an alternative fine-scale global land cover map: Challenges, possibilities, and implications for developing countries. Habitat Int. 2016, 55, 25–31. [Google Scholar] [CrossRef]
  38. Manakos, I.; Karakizi, C.; Gkinis, I.; Karantzalos, K. Validation and Inter-Comparison of Spaceborne Derived Global and Continental Land Cover Products for the Mediterranean Region: The Case of Thessaly. Land 2017, 6, 34. [Google Scholar] [CrossRef]
  39. Mayaux, P.; Eva, H.; Gallego, J.; Strahler, A.H.; Herold, M.; Agrawal, S.; Naumov, S.; De Miranda, E.E.; Di Bella, C.M.; Ordoyne, C.; et al. Validation of the global land cover 2000 map. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1728–1739. [Google Scholar] [CrossRef]
  40. John, L.; Renato, C.; Ilaria, R.; Mario, B. Global Land Cover-Share of Year 2014-Beta-Release 1.0 FAO Global Land Cover Network (GLCN). Available online: https://www.fao.org/uploads/media/glc-share-doc.pdf (accessed on 1 January 2021).
  41. Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.; Mathis, M.; Brumby, S. Global Land Use/Land Cover with Sentinel 2 and Deep Learning; IEEE: Piscataway, NJ, USA, 2021; pp. 4704–4707. [Google Scholar]
  42. Brown, C.F.; Brumby, S.P.; Guzder-Williams, B.; Birch, T.; Hyde, S.B.; Mazzariello, J.; Czerwinski, W.; Pasquarella, V.J.; Haertel, R.; Ilyushchenko, S.; et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci. Data 2022, 9, 251. [Google Scholar] [CrossRef]
  43. Liu, H.; Gong, P.; Wang, J.; Clinton, N.; Bai, Y.; Liang, S. Annual dynamics of global land cover and its long-term changes from 1982 to 2015. Earth Syst. Sci. Data 2020, 12, 1217–1243. [Google Scholar] [CrossRef]
  44. Tateishi, R.; Hoan, N.T.; Kobayashi, T.; Alsaaideh, B.; Tana, G.; Phong, D.X. Production of Global Land Cover Data—GLCNMO2008. J. Geogr. Geol. 2014, 6, 1–15. [Google Scholar] [CrossRef]
  45. Kobayashi, T.; Tateishi, R.; Alsaaideh, B.; Sharma, R.C.; Wakaizumi, T.; Miyamoto, D.; Bai, X.; Long, B.D.; Gegentana, G.; Maitiniyazi, A.; et al. Production of Global Land Cover Data—GLCNMO2013. J. Geogr. Geol. 2017, 9, 1–15. [Google Scholar] [CrossRef]
  46. Tateishi, R.; Uriyangqai, B.; Al-Bilbisi, H.; Ghar, M.A.; Tsend-Ayush, J.; Kobayashi, T.; Kasimu, A.; Hoan, N.T.; Shalaby, A.; Alsaaideh, B.; et al. Production of global land cover data—GLCNMO. Int. J. Digit. Earth 2011, 4, 22–49. [Google Scholar] [CrossRef]
  47. Defourny, P.; Kirches, G.; Brockmann, C.; Boettcher, M.; Peters, M.; Bontemps, S.; Lamarche, C.; Schlerf, M.; Santoro, M. Land Cover CCI: Product User Guide Version 2. Available online: http://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-PUG-v2.5.pdf (accessed on 1 January 2021).
  48. Bicheron, P.; Defourny, P.; Brockmann, C.; Schouten, L.; Vancutsem, C.; Huc, M.; Bontemps, S.; Leroy, M.; Frédéric, A.; Herold, M.; et al. GLOBCOVER: Products Description and Validation Report; ResearchGate: Berlin, Germany, 2008. [Google Scholar]
  49. Defourny, P.; Bontemps, S.; Bogaert, E. GLOBCORINE 2009. In Product Description Manual; ResearchGate: Berlin, Germany, 2010. [Google Scholar]
  50. Buchhorn, M.; Smets, B.; Bertels, L.; Roo, B.D.; Lesiv, M.; Tsendbazar, N.-E.; Li, L.; Tarko, A.J. Copernicus Global Land Service: Land Cover 100 m: Version 3 Globe 2015–2019: Product User Manual; Zenodo: Geneve, Switzerland, 2020. [Google Scholar] [CrossRef]
  51. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; Peng, S.; Han, G.; Zhang, H.; He, C.; et al. Concepts and Key Techniques for 30 m Global Land Cover Mapping. Acta Geod. Et Cartogr. Sin. 2014, 43, 551–557. [Google Scholar] [CrossRef]
  52. Chen, J.; Chen, L.; Chen, F.; Ban, Y.; Li, S.; Han, G.; Tong, X.; Liu, C.; Stamenova, V.; Stamenov, S. Collaborative validation of GlobeLand30: Methodology and practices. Geo-Spat. Inf. Sci. 2021, 24, 134–144. [Google Scholar] [CrossRef]
  53. Liu, J.; Liu, M.; Zhuang, D.; Zhang, Z.; Deng, X. Study on Spatial Pattern of Land-use Change in China During 1995–2000. Sci. China Ser. D Earth Sci. 2003, 46, 373–384. [Google Scholar] [CrossRef]
  54. Liu, J.; Kuang, W.; Zhang, Z.; Xu, X.; Qin, Y.; Ning, J.; Zhou, W. Spatiotemporal characteristics, patterns and causes of land use changes in China since the late 1980s. J. Geogr. Sci. 2014, 69, 3–14. [Google Scholar] [CrossRef]
  55. Yang, J.; Huang, X. The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 2021, 13, 3907–3925. [Google Scholar] [CrossRef]
  56. Yu, Z.; Jin, X.; Miao, L.; Yang, X. A historical reconstruction of cropland in China from 1900 to 2016. Earth Syst. Sci. Data 2021, 13, 3203–3218. [Google Scholar] [CrossRef]
  57. Olofsson, P.; Stehman, S.; Woodcock, C.; Sulla-Menashe, D.; Sibley, A.; Newell, J.; Friedl, M.; Herold, M. A global land-cover validation data set, part I: Fundamental design principles. Int. J. Remote Sens. 2012, 33, 5768–5788. [Google Scholar] [CrossRef]
  58. Fung, T.; LeDrew, E. The Determination of Optimal Threshold Levels for Change Detection Using Various Accuracy Indices. Photogramm. Eng. Remote Sens. 1988, 54, 1449–1454. [Google Scholar]
  59. Janssen, L.L.F.; Wel, F.V.D. Accuracy assessment of satellite derived land-cover data: A review. Photogramm. Eng. Remote Sens. 1994, 60, 419–426. [Google Scholar]
  60. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef] [PubMed]
  61. Clark, M.L.; Aide, T.M.; Grau, H.R.; Riner, G. A scalable approach to mapping annual land cover at 250 m using MODIS time series data: A case study in the Dry Chaco ecoregion of South America. Remote Sens. Environ. 2010, 114, 2816–2832. [Google Scholar] [CrossRef]
  62. Ran, Y.; Li, X.; Lu, L. Accuracy Evaluation of the Four Remote Sensing Based Land Cover Products over China. J. Glaciol. Geocryol. 2009, 31, 490–500. [Google Scholar]
  63. Foody, G. Status of Land Cover Classification Accuracy Assessment. Remote Sens. Environ. 2002, 80, 185–201. [Google Scholar] [CrossRef]
  64. Lambin, E.; Geist, H. Land-Use and Land-Cover Change: Local Processes and Global Impacts; Science & Business Media: Berlin, Germany, 2006; Volume 18. [Google Scholar]
  65. Zhang, C.; Dong, J.; Ge, Q. Quantifying the accuracies of six 30-m cropland datasets over China: A comparison and evaluation analysis. Comput. Electron. Agric. 2022, 197, 106946. [Google Scholar] [CrossRef]
Figure 1. Geographical location and topography map of Northeast China.
Figure 2. The workflow of this research. OA = overall accuracy; PA = producer’s accuracy; UA = user’s accuracy; MCC = the Matthews correlation coefficient. Pink text marks the national-scale land cover datasets; all others are global in scale.
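The four accuracy indexes named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the standard binary (cropland vs. non-cropland) definitions computed from a confusion matrix of validation samples; the function name `cropland_accuracy` and the sample counts are hypothetical and are not taken from the study.

```python
import math


def cropland_accuracy(tp, fp, fn, tn):
    """Return OA, PA, UA and MCC for the cropland class.

    tp: reference cropland mapped as cropland
    fp: reference non-cropland mapped as cropland (commission error)
    fn: reference cropland mapped as non-cropland (omission error)
    tn: reference non-cropland mapped as non-cropland
    """
    total = tp + fp + fn + tn
    oa = (tp + tn) / total if total else 0.0
    # Producer's accuracy is the complement of the omission error rate.
    pa = tp / (tp + fn) if (tp + fn) else 0.0
    # User's accuracy is the complement of the commission error rate.
    ua = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"OA": oa, "PA": pa, "UA": ua, "MCC": mcc}


# Hypothetical counts for 100 validation points (illustration only).
metrics = cropland_accuracy(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

Unlike OA, MCC uses all four cells of the confusion matrix, so it stays informative when cropland and non-cropland samples are unbalanced.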
Figure 3. Distribution of validation samples across Northeast China in Phases 2000, 2010, 2015, and 2020. The graduated red points represent true non-cropland, and the graduated green points represent true cropland.
Figure 4. Comparisons of the spatial accuracy indexes of the datasets in 2000, 2010, 2015, and 2020.
Figure 5. The commission and omission error distribution of nine datasets in 2015.
Figure 6. The commission and omission error distribution of six datasets in 2020. Disagreements are shown at the locations of the validation points in each dataset. In Figure 5 and Figure 6, the ginger-pink dots mark true non-cropland pixels classified as cropland (commission errors), and the spruce-green dots mark true cropland pixels classified as non-cropland (omission errors).
Figure 7. Magnified views of cropland in four regions as mapped by different datasets. Dark green indicates cropland, and white indicates non-cropland. The positions of (A–D) are shown in Figure 8b.
Figure 8. Spatial agreement levels and meridional and zonal cropland area curves of the nine datasets in 2015. Charts (a,c) show the meridional and zonal area curves, respectively. Subplot (b) shows the overlay of the evaluated datasets, resampled to a resolution of 30 m; the numbers indicate the consistency levels. The stacked bar chart shows the proportion of cropland area (%) at each agreement level.
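The agreement levels described in the Figure 8 caption can be sketched as a per-pixel count of how many datasets label a pixel as cropland. This is an illustrative sketch only: it assumes binary cropland masks already resampled to a common grid with equal-area pixels, and the function names and the tiny 2 × 2 masks are hypothetical.

```python
def agreement_level(masks):
    """Count, for every pixel, the datasets that label it as cropland.

    masks: list of equally sized 2D lists holding 1 (cropland) or 0.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) for c in range(cols)]
            for r in range(rows)]


def area_share_by_level(levels, n_datasets):
    """Percentage of cropland-flagged pixels at each agreement level."""
    counts = {k: 0 for k in range(1, n_datasets + 1)}
    for row in levels:
        for value in row:
            if value > 0:  # level 0 means no dataset maps cropland here
                counts[value] += 1
    flagged = sum(counts.values())
    return {k: (100.0 * c / flagged if flagged else 0.0)
            for k, c in counts.items()}


# Three hypothetical 2 x 2 cropland masks (1 = cropland, 0 = non-cropland).
mask_a = [[1, 1], [0, 0]]
mask_b = [[1, 0], [0, 0]]
mask_c = [[1, 1], [1, 0]]

levels = agreement_level([mask_a, mask_b, mask_c])
shares = area_share_by_level(levels, n_datasets=3)
print(levels)
print(shares)
```

With nine datasets the level ranges from 1 (only one dataset maps cropland) to 9 (all datasets agree), which corresponds to the kind of breakdown a stacked bar chart of agreement levels summarizes.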
Figure 9. Spatial agreement levels and meridional and zonal cropland area curves of the six datasets in 2020. Charts (a,c) show the meridional and zonal area curves, respectively. Subplot (b) shows the overlay of the evaluated datasets, resampled to a resolution of 30 m; the numbers indicate the consistency levels. The stacked bar chart shows the proportion of cropland area (%) at each agreement level.
Figure 10. Scatterplots between the CLCD and CGLS-LC100, CLUDs, Esri, GLC_FCS30, and GlobeLand30. The axes show the cropland area of each of the six datasets aggregated within grid cells of 8.438 km × 9.537 km across Northeast China; only cropland area is compared. Each blue dot represents the aggregated cropland area within one grid cell. The black dotted line represents the 1:1 auxiliary line, while the red solid line depicts the fitted curve. The unit of RMSE is km².
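The comparison summarized in the Figure 10 scatterplots can be sketched as follows. The sketch assumes an ordinary least-squares straight line for the fitted curve and an RMSE taken directly between the paired per-cell areas of two datasets; neither choice is confirmed by the caption, and the function name and the sample areas are hypothetical.

```python
import math


def fit_and_rmse(x, y):
    """Least-squares line y = slope * x + intercept, plus RMSE of y against x.

    x, y: cropland areas (km²) of two datasets aggregated over the same cells.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # RMSE measures the departure from the 1:1 line, not from the fitted line.
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    return slope, intercept, rmse


# Hypothetical per-cell cropland areas in km² (illustration only).
reference_area = [10.0, 20.0, 30.0, 40.0]
compared_area = [12.0, 18.0, 33.0, 41.0]

slope, intercept, rmse = fit_and_rmse(reference_area, compared_area)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, RMSE = {rmse:.2f} km²")
```

A slope close to 1 with a small RMSE indicates that the two datasets report similar cropland area in each grid cell.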
Figure 11. Comparison of the cropland area of all examined datasets with the statistical results by Yu et al. [56].
Figure 12. Scatterplots of the reconstructed cropland area of each prefecture-level city vs. the aggregated cropland area from each dataset. Each blue + symbol represents the aggregated cropland area within one prefecture-level city. The black dotted line represents the 1:1 auxiliary line, while the red dashed line depicts the fitted curve. The unit of RMSE is km².
Figure 13. Comparison of the overall accuracy of different datasets grouped by data resolution, production time, sensor, and classification algorithm. Each land cover dataset has its own marker color.
Figure 14. Case study comparing cropland identification southeast of Chagan Lake in five datasets in Phase-2020. Dark green indicates cropland, and white indicates non-cropland.
Figure 15. The proportion of cropland and non-cropland in mosaic cropland in GlobCover-2005, GlobCover-2009, CCI-LC-2000, and CCI-LC-2010.
Table 1. The summary of the main information of all collected datasets in this study. They are sorted in order of spatial resolution, from lowest to highest.
| Datasets | Satellites or Sensor | Time | Spatial Resolution | Classification Technique | Classification Scheme |
|---|---|---|---|---|---|
| GLASS-GLC | AVHRR GLASS CDR | 1982–2015 | 5 km | Random forest and LandTrendr | 7 classes |
| GLC2000 | SPOT VGT | 2000 | 1 km | Generally unsupervised classification | 22 classes |
| FAO-GLCshare | ---- | 2014 | 1 km | Data fusion | 11 classes |
| CLUDs | Landsat | 1980, 1990, 1995, 2000, 2005, 2010, 2015, 2020 | 1 km | Extraction of remote sensing information | 6 classes * |
| GLCNMO | Terra MODIS | 2003 | 1 km | Supervised classification | 20 classes |
| GLCNMO | Terra and Aqua MODIS | 2008, 2013 | 500 m | Supervised classification | 20 classes |
| CCI-LC | ENVISAT MERIS, SPOT VGT | 1992–2015 | 300 m | Unsupervised spatio-temporal clustering and machine learning classification | 22 classes |
| GlobCover | MERIS | 2005, 2009 | 300 m | Generally unsupervised classification | 22 classes |
| CGLS-LC100 | PROBA-V | 2015–2019 | 100 m | Supervised classification and random forest | 23 classes |
| GlobeLand30 | Landsat TM/ETM+, HJ-1 | 2000, 2010, 2020 | 30 m | Pixel-Object-Knowledge classification approach | 10 classes |
| CLCD | Landsat | 1990–2019 | 30 m | Random forest | 9 classes |
| GLC_FCS30 | Landsat TM/ETM+/OLI | 2015, 2020 | 30 m | Operational SPECLib-based approach and random forest | 30 classes |
| FROM-GLC | Landsat TM/ETM+/OLI, Sentinel-2 | 2017 | 30 m | Random forest | 10 classes |
| Esri | Sentinel-2 | 2020 | 10 m | Deep learning model | 10 classes |
Note: 6 classes * means 6 classes in the first level and 25 classes in the second level.
Table 2. The datasets included in four phases.
| Phase-2000 | Phase-2010 | Phase-2015 | Phase-2020 |
|---|---|---|---|
| GLASS-GLC | GLASS-GLC | GLASS-GLC | |
| GLC2000 | | FAO-GLCshare-2014 | |
| CLUDs | CLUDs | CLUDs | CLUDs |
| GLCNMO-2003 | GLCNMO-2008 | GLCNMO-2013 | |
| CCI-LC | CCI-LC | CCI-LC | |
| | | CGLS-LC100 | CGLS-LC100-2019 |
| GlobeLand30 | GlobeLand30 | | GlobeLand30 |
| CLCD | CLCD | CLCD | CLCD-2019 |
| | GlobCover-2009 | GLC_FCS30 | GLC_FCS30 |
| GlobCover-2005 | GlobCover-2005 | FROM-GLC-2017 | Esri |
Cui, P.; Chen, T.; Li, Y.; Liu, K.; Zhang, D.; Song, C. Comparison and Assessment of Different Land Cover Datasets on the Cropland in Northeast China. Remote Sens. 2023, 15, 5134. https://doi.org/10.3390/rs15215134