1. Introduction
Human activity increasingly impacts the Earth’s environment and ecosystems [
1,
2]. To characterize the Earth’s fundamental characteristics and environmental processes, maps have been created for providing information about the distribution and dynamic changes of the land cover. This map information has served as input for a great variety of models to better understand the earth including climate change [
3], biogeochemical cycling [
4], and carbon cycling [
5], to name but a few. Therefore, characterizing the distribution of the land cover is crucial for Earth science and decision making.
To efficiently and effectively characterize the spatial distribution of land cover, remotely sensed imagery and associated technologies are employed for producing land cover maps at different spatial resolutions (i.e., raster/pixel sizes). As Grekousis et al. [
6] reported, 23 global and 41 regional land cover products based on remote sensing technologies have been produced and released for use by the Earth science community. For example, the United States Department of Agriculture (USDA), National Agriculture Statistic Service (NASS) annually produces the Cropland Data Layer (CDL) at 30 m resolution for U.S.A. [
7]. Other existing maps provide essential information at different spatial resolutions including 1 km, 500 m, 300 m, 250 m, and 30 m [
8], and have continued to improve in detail (spatial resolution) over time.
However, in addition to these existing maps at specific spatial resolutions, other spatial resolutions may also be required to obtain optimal results for various Earth science models. Previous research has revealed that model outputs have varied considerably when using land cover maps of different spatial resolutions. For example, Buyantuyev and Wu [
9] demonstrated that using land cover maps of different spatial resolutions resulted in variation of estimated land surface temperature. Grafius et al. [
10] stated that the outputs of their ecosystem service models varied due to the use of land cover maps with differing spatial resolutions. Many researchers have emphasized that often the existing spatial resolutions cannot meet the requirements needed to obtain the optimal results for various Earth science models. In other words, the modeler would rather rescale a finer resolution map to the desired spatial resolution, if possible, then to use an existing coarse scale map.
Recognizing the need for various spatial resolutions by the Earth science community, rescaling techniques have been developed to fill the resolution gaps for existing land cover maps [
11]. These rescaling techniques consist of two possible outcomes: upscaling and downscaling. The upscaling method decreases the spatial resolution while the downscaling method increases the spatial resolution [
12]. These two methods have been documented as vitally necessary and efficient ways to obtain maps at different spatial resolutions to support a variety of Earth science research [
11,
13,
14].
Upscaling produces coarser resolution maps from finer resolution maps using two different approaches: numerical aggregation and categorical aggregation. Numerical aggregation first computes the pixel value (e.g., digital number, DN) for the coarser resolution imagery based on some function between the finer resolution pixels and the coarser resolution pixels [
15]. Then, the coarser resolution imagery is classified to produce the upscaled map. To conduct numerical aggregation, mean aggregation and central pixel resampling (CPR) are commonly employed. The mean aggregation algorithm calculates the average of the pixel values from the finer resolution imagery as the pixel value for the coarser resolution map [
15]. The CPR algorithm selects the finer resolution image’s central pixel value corresponding to the coarser resolution pixel as the output pixel value [
16].
The other common way to obtain upscaled maps, categorical aggregation, assigns a class label to an upscaled map based on class labels in the associated finer resolution map by using different aggregation logic. The choice of aggregation logic includes: Majority Rule Based (MRB) (e.g., [
17]), Random Rule Based (RRB) (e.g., [
18]) and Point-centered Distance-Weighted moving window (PDW) [
14]. MRB determines the class label for an upscaled map by selecting the most frequently occurring class of the fine resolution map. RRB randomly selects a class from the specific pixels of the fine resolution map. PDW first employs a weighted sampling grid of variable resolution to sample the existing map. Then, the class label for the corresponding location in the upscaled map is randomly selected from the frequency of the class labels derived from the samples.
Downscaling generates finer resolution maps from a coarser resolution map using either a numerical or categorical approach. Similar to the upscaling methods, the numerical approach first predicts finer resolution images that are then classified to obtain the downscaled map. The categorical approach directly transforms the categorical variable, usually representing the land cover class, from the coarser resolution map to the corresponding finer resolution map [
13]. One common method of downscaling is area-to-point prediction (ATPP). The ATPP approach interpolates the coarser resolution imagery to generate the finer resolution imagery by integrating support information from multiple sources of data. A variety of complex algorithms have been developed to conduct the ATPP such as area-point-kriging (ATPK) [
19], area-to-point cokriging (ATPCK) [
12], etc. For example, ATPCK incorporates the support information from imagery of different resolutions into a random field model to predict the finer resolution. The correlation within and cross-correlation between the imagery of different resolutions are used to minimize the variance of the error [
12]. More details about ATPP can be found in Tang et al. [
12] and Atkinson [
13].
The categorical method for downscaling is frequently conducted by employing super-resolution mapping (SRM) algorithms [
13]. The SRM divides a coarser resolution pixel into several finer resolution pixels and determines the class label for the finer resolution pixels by translating the coarse resolution fractions into a finer resolution land cover map [
20,
21]. Existing approaches produce a hard-classified land cover map at finer resolution typically choosing one class label based on the maximum a posteriori (MAP) principle. This principle is similar to that of many hard classification algorithms that assign the class label with the highest posteriori probability as the class type for the output classified map [
21]. For example, Ling et al. [
21] fused the class membership (CMP) of the finer resolution pixel derived by using the local smoothness priority (LSP) and the downscaled coarse fraction (DCF), respectively. The CMP is on the assumption that one pixel belongs to one of the predefined land cover classes, which represents the spatial distribution information of classes. The LSP represents the relationship between neighboring fine resolution pixels, while the DCF considers the fractional values between neighboring coarse resolution pixels. These SRM algorithms have been successfully conducted to generate finer resolution maps.
Since rescaling techniques have been well documented as necessary and effective approaches to produce land cover maps at a wide range of spatial resolutions (e.g., [
14,
22]), how to evaluate these upscaled or downscaled maps is becoming a critical issue. Although existing mapping projects typically have reference data that were used to assess the accuracy of that map, utilization of these reference data for validating any rescaled maps created by others is problematic. Often, the reference data are not readily available to the map users. In addition, the reference data do not match the spatial resolution of either an upscaled or downscaled map. Therefore, a simple and easy way to evaluate the accuracy of rescaled maps needs to be developed. Additionally, producing rescaled maps draws into question whether the rescaled maps are better than the existing land cover maps for use in Earth science modeling. If so, then methods for evaluating the rescaled maps will be beneficial for the community to make better decisions for selecting the appropriate datasets.
Therefore, the primary objective of this paper is to present a new method using a similarity matrix based on a contingency table to evaluate the accuracy of the rescaled maps. This method will also allow testing to see if existing, coarse resolution maps have a higher or lower accuracy than rescaled maps derived from existing fine resolution maps. Finally, testing if the extent (i.e., size) of the study sites has an influence on the accuracy of rescaled maps has not been quantitatively performed. Thus, the experiments done in this research vary the study area sizes over three different extent levels. All the analysis in this research was conducted using upscaled crop/non-crop maps derived by the MRB.
2. Materials and Methods
In this section, we describe the selection of data and study areas, the principles behind and the calculation of the similarity matrix, the design of our experiments using study areas of three different spatial extents, and the techniques for producing the upscaled maps. The methods consist of (1) selection of two datasets: the base map and the existing coarser resolution map; (2) selection of the study areas with different landscape patterns and different extents (i.e., size of area); (3) the similarity matrix approach proposed to evaluate either the rescaled maps or the existing, coarser resolution maps compared to the finer resolution maps; and (4) the MRB aggregation method that was used to produce the upscaled maps to test the performance of similarity matrix and explore if the upscaled maps are better than the existing coarser resolution maps.
2.1. Data Descriptions
Based on the objectives of this paper, two datasets were required. The first dataset, NASS’s CDL data were utilized to generate the crop/non-crop base maps that were used to produce the upscaled maps. The crop/non-crop base maps were also used as reference maps for assessing the existing coarser resolution maps (i.e., Global Food Security-Support Analysis Data, GFSAD). The CDL data can be obtained from the NASS’s online geospatial application-CropScape (
https://nassgeodata.gmu.edu/CropScape/). These data are annually updated, raster-formatted, and geo-referenced land cover maps [
23], whose accuracies for major crops range from 85% to 95% [
24]. Due to its high accuracy for predicting the information of crops and identifying the field crops, the CDL data are extensively applied in Earth science research [
7,
25]. Therefore, this paper utilized the CDL data at 30 m spatial resolution for the year 2010 to generate the crop/non-crop maps for each of the different study sites. The crop/non-crop maps were extracted by simplifying the crop type of CDL data based on the metadata provided by NASS (available online:
https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/meta.php).
The other dataset needed in this study was the existing crop/non-crop maps at specific spatial resolutions that could then be compared with the upscaled maps. The GFSAD Crop Dominance Global 1 km and 250 m for nominal year 2010 were obtained from the Land Processes Distributed Active Archive Center (LP DAAC) provided by the National Aeronautics and Space Administration (NASA) (available from:
https://lpdaac.usgs.gov/dataset_discovery/measures/measures_products_table/gfsad1kcd_v001). The GFSAD crop/non-crop (cropland extent) maps were produced as part of a NASA project evaluating global food security in the twenty-first century. More details regarding GFSAD can be found in Teluguntla et al. [
26] and Teluguntla et al. [
27]. Note that GFSAD data at 30 m resolution were not used to produce the base maps in this study because maps for the year 2010 are not available. To facilitate comparison of the upscaled maps created in this study and the existing GFSAD maps, the GFSAD maps at 1 km and 250 m were resampled to 960 m and 240 m, respectively. ArcGIS version 10.4 was used to resample the GFSAD maps using the majority resampling technique. This technique determines the corresponding 4 by 4 cells in the input space that are closest to the center of the output cell and uses the majority of the 4 by 4 neighbors (
http://pro.arcgis.com/en/pro-app/tool-reference/data-management/resample.htm).
2.2. Study Areas
To analyze the influence of the size of the area on the upscaling accuracy, this research employed study sites for three different extents: regional level maps (a portion of the USA), US state level maps, and Agricultural Statistical District (ASD) level maps. The boundaries of the regions and the ASDs are defined by the USDA’s NASS. The U.S.A. has been divided into twelve regions (
Figure 1). Each ASD is a group of counties based on the geography, climate, and cropping practice within each state [
24], and has been widely-used in agricultural research (e.g., [
11]).
To evaluate the landscape pattern of each of the study sites, the Patch-Per-Unit area (PPU) (Equation (1)), a measure of the heterogeneity [
28], was employed to compute the heterogeneity for each study site. As the landscape becomes heterogeneous, PPU increases [
28]. The crop/non-crop base maps derived from NASS’s CDL were utilized to compute the PPU for each state study site. The highest PPU for all states was 134.97, while the lowest was 4.72. The PPU (heterogeneity of each state) was then divided into three levels: lower level (0 < PPU ≤ 40), medium level (40 < PPU ≤ 100), and higher level (PPU >100). Three states were then selected for analysis from each of the three heterogeneity levels.
where m is the total number of patches, n is the total number of pixels, λ is a scaling constant equal to the area of a pixel.
The PPU of each of the ASDs in the nine selected states were then calculated and divided into three different levels: lower level (0 < PPU ≤ 50), medium level (50 < PPU ≤ 120), and higher level (PPU >120). For each level of heterogeneity, three ASDs were randomly selected for analysis in this study. Therefore, we selected a total of twelve regions, nine states, and nine ASDs to be analyzed in this study. The landscape characteristics of these regions, states, and ASDs are shown in
Table 1.
2.3. Description of the Similarity Matrix
The similarity matrix proposed here to evaluate the rescaling is based on a contingency table approach as shown in
Table 2. Each row represents the area of each class in the upscaled map, while each column represents the area of each class in the base map.
Aik denotes the area that is classified as class
i in the base map but as class
k in the upscaled map. Note that class
k can be the same or different with class
i. Entries on the diagonal represent the consistent area (i.e., similarity) for each class between the base map and the upscaled map. Thus,
Aii shows how many areas for class
i are correctly identified in the upscaled map.
A+k sums the area of the class
k in the upscaled map (the column total), and
A+i sums the area of the class
i in the base map (the row total). The omission error (OE) for class
i of the upscaled map can be evaluated by Equation (2), which represents the percentage of area for class
i that is omitted from the upscaled map. The OE of class
i also represents the underestimation of class
i in the upscaled map. The commission error (CE) for class
i can be evaluated by Equation (3), which represents the percentage of area for class
i that is committed from the correct base map. The CE of class
i represents the overestimation of the class
i in the upscaled map. The overall similarity (OS) can be evaluated by Equation (4), which represents the percentage of the area for all classes that are correctly represented in the upscaled maps.
2.4. Calculation of Similarity Matrix
To generate the similarity matrix, five steps should be taken:
Compute the total number of the square windows () to cover the base map. First, determine the total number of rows, M, and the total number of columns, N, for the upscaled map based on the desired spatial resolution (pixel size). Then, is computed by .
Place square windows (corresponding to the upscaled pixel size) over the base map. The square window, , corresponds to the pixel at row m and column n in the upscaled map.
Identify the class type of the upscaled pixel corresponding to , which is denoted as class j (the entire upscaled pixel is class j). Then, identify the class types of the base map pixels within , which is denoted as class i.
Calculate the area of class i of the base map pixels within . This area is denoted as . Note that class i can be either the same or different than class j from the upscaled map. The area for class i in the base map that is classified as class j in the upscaled map, , can be computed by , which is used to fill in the values in the contingency table to obtain the similarity matrix.
The OS, OE, and CE can be obtained to evaluate the accuracy of the upscaled map according to the principles in the
Section 2.3.
Figure 2 presents an example of generating the similarity matrix and will make this method much clearer. The base map covers an area of 36 m
2 and has a spatial resolution (pixel size) of 1 m (resulting in 36 pixels in this study area). There are two classes in this map; Class 1 (C1) is the gray color and Class 2 (C2) is the white color in both the base and upscaled maps. The desired spatial resolution of the upscaled map is 3 m.
Step 1: The number of square windows () in the upscaled map is 4 (the number of rows is M = 2 and the number of columns is N = 2; 2 × 2 = 4).
Step 2: The computed windows are placed over the base map. The size of each square window is same as the spatial resolution (pixel size = 3 m) of the upscaled map.
Step 3: The class type of
is C1 for the upscaled map as determined using the MRB upscaling method. The pixels within
in the base map are marked as P1, P2, P3, P4, P5, P6, P7, P8, and P9 in
Figure 2. The corresponding map classes are C1, C1, C2, C2, C1, C1, C1, C2, and C2, respectively.
Step 4: Compute for . Then, , , , and . Thus, .
Repeat step 3 and step 4 to obtain for , , . For , , , , . For , , , , . For , , , , . Then , , and .
Step 5:
Table 3 shows the results of similarity matrix for this example. Based on the similarity matrix, OE, CE and OS can be computed. The OS for this example is 75.0%. The CE and OE for this example can be found in
Table 3.
2.5. Design of the Experiment
To investigate the performance of the similarity matrix on rescaling techniques, the upscaling method was selected to be used in this research. No analysis was performed for downscaling, but the approach should work the same. The crop/non-crop maps were upscaled based on a widely-used aggregation method (i.e., MRB). The details of the MRB are described in
Section 2.6. The similarity matrix was then obtained to illustrate the accuracy of the upscaled maps.
To explore if existing, coarser resolution land cover maps are better than newly created upscaled maps derived from accurate and finer resolution land cover maps, existing land cover maps at two different resolutions (240 m and 960 m) were chosen. Note that we are aware that producing land cover maps at coarse resolutions are vitally important for various research projects. Thus, our work here only aims to develop a tool (i.e., the similarity matrix) for evaluating the upscaled maps. It is not our intent to minimize the contributions of the existing, coarser resolution land cover maps for the Earth science community.
2.6. Upscaling Method
The MRB method for upscaling categorical maps has been recommended by [
15] for agricultural projects. Therefore, our work utilized this method as an appropriate way to obtain the upscaled maps. To implement the MRB, a predefined square window corresponding to the coarser resolution pixel is constructed. Then, the MRB assigns the class type for the coarser resolution pixel based on selection of the most frequently occurring class in the base map within the predefined square window [
17,
18]. If there is more than one major class in the square window, the class for the coarser resolution map will be randomly determined from these major classes [
15]. In this work, the base map was upscaled to 240 m and 960 m, respectively.
4. Discussion
4.1. Significance of the Similarity Matrix
Rescaling techniques inevitably lead to errors in land cover area (e.g., [
11,
14]). These errors in area are not only derived from the omitted areas from the base map but also by the committed areas created in the upscaled map. If the OS is 100% (i.e., the base map and the upscaled map are exactly the same (
Figure 6a,d), it is not necessary to compute the OE and CE. However, an OS of 100% rarely occurs (e.g., [
11]). Even if the OS of 100% occurs at a particular spatial resolution, it is difficult to know if the OS will also be 100% at another spatial resolution.
Figure 6 shows an example where the base map (
Figure 6a) is upscaled by the MRB to two different spatial resolutions (
Figure 6d,e).
Figure 6b,c are predefined windows for producing
Figure 6d,e, respectively. The gray color represents class 1, while the white color represents class 2. In some situations, the area information of one class in the upscaled map will be same as its area information in the base map. For example, the base map pixels, P1, P2, P3, and P4, were aggregated to upscaled pixel Pc (
Figure 6d). Since class 2 occurred more frequently within the predefined window, W1, the class type of Pc is class 2 according to the principle of MRB. However, the OS cannot be always 100% due to the CE and OE. For example, In
Figure 6d, the OS is 100%, while in
Figure 6e the OS is 81.25%. Up to now, there has been no standard and simple way to evaluate the accuracy of the rescaled map and to report the uncertainty information of the rescaled map to the users of the land cover map. The similarity matrix proposed here provides such a way.
4.2. Issues for Generating Upscaled Maps
An important upscaling issue that should be noted here is the problem that can occur around the edges of the study area. If the dimensions of the base map and the rescaled map are not equally divisible, then some partial upscaled pixels will fall outside of the edge of the base map. As
Figure 7 demonstrates, two non-overlapping windows (W3 and W6) fall outside the edge of the base map. The upscaled pixels, P3 and P6, are determined based on only the pixels in the base map that fall within the W3 and W6. Thus, for W3 and W6, there are only eight pixels used to determine the class type in the upscaled map.
Interactive Data Language (IDL 8.3) was used to conduct MRB rescaling in this project. For this language, the implementation of the MRB rescaling algorithm begins from the upper-left pixel as the first pixel and ends with the lower-right pixel as the last pixel. Hence, the potential error of the determination of the boundary pixel in an upscaled map only occurs at the right edge and/or the bottom edge of the upscaled map. However, the similarity matrix is not impacted because only existing class types are included in the matrix calculation. In addition, the similarity matrix presents the uncertainty information by considering the error derived from the boundary pixels in the upscaled maps. These conditions further strengthen our confidence to recommend the similarity matrix to be extensively used to evaluate the rescaled maps.
4.3. Accuracy Analysis between Upscaled Maps and GFSAD Maps
Since one of our objectives was to compare existing coarser resolution maps with the upscaled maps, the GFSAD maps also were assessed using the similarity matrix. One issue that should be noted is that the crop/non-crop maps derived from the CDL data were assumed as the reference data for assessing the GFSAD maps [
29]. We are aware that the CDL data are not 100% accurate. However, this dataset is highly accurate and was readily available at finer resolution [
7] and therefore, was very appropriate for use in this project.
The comparisons of accuracy between the upscaled maps and GFSAD maps show that the upscaled maps can obtain higher accuracies (
Table 5,
Table 6,
Table 7 and
Table 8). These results demonstrate that employing upscaled maps will be a better way to further conduct the related earth modeling research compared to using the GFSAD maps. Note that we are not saying here that all the existing coarser resolution maps should be replaced by the upscaled maps, but rather that the similarity matrix should be used to test which is better.
The lower accuracy of the GFSAD map in the USA mainly results from fact that coarser spatial resolution imagery were used for producing the GFSAD maps, while the finer spatial resolution imagery were used for producing the CDL data. The finer resolution imagery used to generate the CDL data include Lansat-5 Thematic Mapper (TM), Landsat-7 Enhanced TM plus (ETM+), Landsat-8 Optical Land Imager (OLI), Resourcesat-1 Advanced Wide Field Sensor (AWiFS), Disaster Monitoring Constellation (DMC) DEIMOS-1, and UK2 sensors. This finer-resolution imagery was used to obtain base maps that can accurately represent the distribution of land cover. However, the GFSAD maps are mainly based on the MODerate resolution Imaging Spectroradiometer (MODIS) imagery at 250 m [
29]. As is known, greater amounts of mixed pixels exist in the coarser resolution imagery, which can introduce more errors into the classified maps (i.e., GFSAD map) [
30]. Therefore, compared to the upscaled maps derived from accurate, finer resolution CDL data, GFSAD likely will not obtain higher accuracy of the distribution of land cover.
To visually show the difference between the upscaled map and the GFSAD map, two sub-areas in the region of Upper-Midwest were selected (
Figure 8). The upscaled map either at 240 m or at 960 m accurately presents the crop distribution. The GFSAD map at 960 m of the region was mostly classified as crop, while the upscaled map present the original crop distribution as accurate as possible. These results demonstrate that the upscaled map can more accurately represent the distribution of land cover, which further strengthens our confidence in recommending the use of upscaled maps as inputs for earth science models. However, assessing and comparing the performance of the upscaled maps and existing coarser resolution maps are required before making these decisions.
4.4. Differences in Heterogeneity between the Upscaled Maps and the GFSAD Maps
A vitally important issue underlying different land cover maps is the difference in representations of landscape pattern. As acknowledged, several studies have reported that the representation of landscape pattern is biased in the upscaled map (e.g., [
11,
15]). However, comparisons between the representation of landscape pattern based on the upscaled map and the existing coarser resolution map has not been explored. Therefore, the heterogeneity, evaluated by PPU, was selected to be analyzed in this research. Note that the study sites with higher PPU are more heterogeneous than study sites with lower PPU.
The comparisons of heterogeneity represented by the PPU values (
Figure 4) and by visual observation (
Figure 8) between the upscaled maps and the GFSAD maps demonstrated that upscaled maps can obtain a better representation of the landscape patterns when compared to the GFSAD maps. This is because the upscaled maps can more accurately represent the distribution of land cover compared to the GFSAD maps. Furthermore, for users, a good quality land cover map should not only have a better map accuracy, but also a better representation of landscape pattern. Therefore, considering the comparisons of the PPU between two datasets and the users’ requirements, the upscaled map is more suitable and should be used in Earth science models.
4.5. Considerations with Respect to Heterogeneity
The heterogeneity of the study site has been emphasized as a major factor impacting the accuracy of the upscaled maps (e.g., [
11,
31]) and the accuracy of classification maps (e.g., [
32]), in general. Thus, the influence of heterogeneity on upscaled maps and GFSAD maps should be explored. As expected, the accuracies of both the upscaled maps and the GFSAD maps were considerably impacted by the heterogeneity of the study sites. Lower accuracies for the upscaled and the GFSAD maps, at each resolution, were obtained in the more heterogeneous areas compared to the homogeneous areas (
Figure 5,
Table 5,
Table 6,
Table 7 and
Table 8). This is because the spectral heterogeneity and spectral complexity of the imagery are higher in heterogeneous areas, which results in lower classification map accuracy (e.g., CDL data, GFSAD map) [
32]. If these heterogeneous maps (e.g., CDL data) were used as base maps for upscaling, mapping error would increase in the upscaled map and the accuracy of the upscaled map would be reduced [
11]. Therefore, given the lower upscaling accuracy in the heterogeneous areas, users should be cautious when utilizing upscaled maps in heterogeneous landscapes.
Recognizing the negative effect of heterogeneity on the upscaled maps requires not only that caution be taken when using the upscaling techniques in the heterogenous areas, but also demonstrates the need for developing improved upscaling techniques. The improved upscaling techniques should consider the heterogeneity as a factor to improve the upscaling accuracy. Additionally, the analyst that is producing the land cover map should be aware of lower accuracy occurring in the heterogeneous areas. More accurate and improved classification techniques are also recommended to be developed and employed for improving the accuracy of the land cover products, especially in the heterogeneous areas. These accurate land cover products can be further used for producing upscaled maps at different resolution meeting the requirements for various Earth observation models or their related research (e.g., [
33]).
4.6. Considerations with Respect to Extent of Maps
Landscape pattern has strong influence on the accuracy of the upscaled maps, as demonstrated by previous research (e.g., [
31,
34]). However, previous research rarely explored if the extent (i.e., size of the area) of the study sites impacted the upscaled maps. Therefore, as part of our research we employed different size study sites from large regions of the US to individual states, to smaller ASDs within a state. Although we did not test different upscaling methods to see if extent has an influence on the upscaling accuracy, it is worthy of discussion based on the results derived from using just the MRB approach.
The results of this study demonstrated that there was no difference between upscaled maps at the same spatial resolution for different extent levels (i.e., region level, state level, and ASD level). As long as the heterogeneity is similar, the upscaling accuracy did not show much difference between study area sizes. Instead, it is evident that heterogeneity and not extent is the main factor impacting the upscaling accuracy.
4.7. Implications, Limitations, and Future Work
The results of this work have important implications for decision making. The similarity matrix has been demonstrated as an efficient and effective method to assess the rescaled map. This method can provide detailed information on the accuracy of the rescaled map, which is important for users who wish to use these maps for further applications. In addition, comparisons between the upscaled map and any existing coarser resolution map draw demonstrate that it may be better to use the upscaled map than an existing coarser resolution map. The higher performance of the upscaled maps, in terms of accuracy and landscape pattern, confirms that they can be reliably used for Earth science modelling. Note that, here, we did do not emphasize the upscaled maps can always be better than the existing maps. The results illustrate that the upscaled maps can be one option for map users. Therefore, we strongly recommend that a comparison using the similarity matrix should be conducted to assess any existing coarse resolution maps with potential upscaled maps for use in any modelling.
Further, the role of heterogeneity in producing upscaled maps confirms that landscape pattern is an important consideration for those researchers developing upscaling techniques to obtain more accurate upscaled maps. Moreover, this study provides evidence that it is not necessary to consider the extent when developing a new upscaling technique. Overall, the achievements of this paper are essential for guiding land cover map users to select suitable datasets for their models and encourage them to consider heterogeneity when developing any new upscaling techniques.
Finally, we are aware that our research may have three limitations. The first major limitation is that there was no GFSAD map at 30 m resolution for 2010. Therefore, we used the CDL data at 30 m to generate the crop/non-crop map as the reference data to compare the results between GFSAD maps and the upscaled maps. These comparisons were based on the assumption that the CDL data are 100% correct, which we acknowledge is not the case. This limitation resulted in errors in the final results, which would be extremely difficult to quantitatively assess without an extremely accurate reference data. In order to do this assessment, reference data covering the entire USA would be needed. The cost for producing such a dataset in prohibitive and justifies our use of a very accurate (not 100%) data set (e.g., CDL).
The second limitation is that the results of this paper are based on the simple case of a crop/non-crop map and not a more complex land cover map with many land cover classes. Thus, our analysis should be repeated in future work using upscaled maps and the existing land cover maps that are more complex. Additionally, only the upscaling of maps has been explored in this paper. While these techniques are directly applicable to downscaled maps, the paper does not demonstrate this use and should be the subject of future work.
The third limitation is that only a single upscaling technique (i.e., MRB) was used here. As acknowledged, different upscaling techniques perform differently (e.g., [
15]). The goal of this paper was to demonstrate the power of the similarity matrix and not to evaluate upscaling algorithms. Therefore, it would be useful to use the similarity matrix approach to compare different upscaling techniques in some future work.
5. Conclusions
This paper first proposed a new method, the similarity matrix, for assessing the accuracy of rescaled maps using three measurements: overall similarity (OS), omission error (OE), and commission error (CE). A conventional upscaling technique, Majority Rule Based aggregation (MRB), was used to produce the upscaled maps to demonstrate the performance of the similarity matrix. Additionally, this paper explored if these upscaled maps perform better than existing coarse resolution crop/non-crop maps. To conduct these experiments, two datasets: the Cropland Data Layer (CDL) at 30 m for 2010 and the GFSAD nominal data for 2010 at 250 m and 1 km, were utilized. The CDL data were first used to extract the crop/non-crop maps for (1) providing the base maps for upscaling, and (2) providing reference data to assess the accuracy of the upscaled maps and the GFSAD maps. Further, to determine if the extent of the study site has influence on the performance of the upscaling, three different extent levels were employed. There were twelve regions covering the continental USA, nine states, and nine Agricultural Statistic Districts (ASDs) that were selected to perform the experiments. Several conclusions resulted from this analysis: (1) the similarity matrix can successfully assess the accuracy of the rescaled map and report the uncertainty information for the rescaled map. This information is beneficial for land cover map users for further assessing the uncertainty or accuracy of their models. Also, the similarity matrix can provide information for making decisions about selecting the datasets for further analysis such as use in Earth science modeling; (2) the upscaled maps outperformed the existing coarser resolution GFSAD maps in both accuracy and representation of the landscape pattern. This result demonstrates that evaluating the performance of the upscaled map and the existing coarser resolution map is required before making a decision for selecting the datasets as the input to the users’ model; (3) the influence of heterogeneity on the upscaling accuracy does not only reveal that lower accuracy was produced in the heterogeneous area, but also recommends that a new upscaling technique should consider heterogeneity as an important factor to improve the accuracy of the upscaling technique, and (4) the results between different study sites with different extents quantitatively confirms that the performance of upscaling has not been impacted by the extent of study site. In conclusion, extending our understanding on how to assess the rescaled map, determination for selecting the upscaled map or the existing land cover map, and suggestions for developing new upscaling techniques, is vitally helpful to the Earth science community as users of land cover maps.
Future work should focus on four aspects: (1) different downscaling/upscaling techniques should be employed to investigate if the downscaled/upscaled maps can really perform better than the existing land cover maps; (2) comparisons between upscaled maps and downscaled maps should be demonstrated; (3) landscape pattern should be considered for developing new downscaling/upscaling techniques, and (4) the land cover map with more map classes should be employed to explore if the upscaled map and downscaled map still can be a prospective data source for Earth science modelling research.