1 Introduction

The global population grows at a rate of \(1.09\%\) per year, increasing the demand for food and requiring higher agricultural yields [13]. However, one of the multiple challenges with a marked correlation with crop yield loss is the management of weeds [15], i.e., unwanted plants that grow randomly throughout the field, competing with crops for resources such as water, nutrients, and sunlight [5].

To deal with weed spreading, several control strategies have been proposed for various crops and field scales [19]. In small-scale fields, manual weed control is still used even though it is a tedious, inefficient, and labor-intensive task. In turn, smart agriculture, including remote sensing and mechanical weed management methods for soil tillage, arises as a better control approach, as these methods are more productive and labor-saving [8]. Nevertheless, mechanical management can cause crop damage if the discrimination of intra-row weeds is not accurate enough [6].

Generally, traditional approaches to weed discrimination using remote sensing include the following stages [3]: (i) feature estimation, (ii) feature extraction and selection, and (iii) classification. For the first stage, the Histogram of Oriented Gradients (HOG) is one of the principal scale-invariant feature descriptors, encoding the shape information of image regions. However, the selection of relevant features is decisive for obtaining interpretable and accurate results while reducing computational complexity by removing information that is redundant for the task at hand [12]. Approaches to feature selection include attribute measures using decision trees [7], iterative backward elimination based on a feature ranking obtained from the Random Forest algorithm [16], decision rules using chi-square statistics [11], measures based on Random Forest [2], measures of highest normalized information gain [18], and dimensionality reduction approaches such as Correlation-based Feature Selection, Principal Component Analysis, Kernel Principal Component Analysis, Linear Discriminant Analysis, and Stepwise Linear Discriminant Analysis [9, 14, 20], among others. However, for weed discrimination, feature selection methods still face challenges in eliminating redundant information without degrading performance [10, 19].

In this study, we introduce a sparse feature selection approach based on the Lasso operator that eliminates redundant information on the Crop/Weed Field Image Dataset (CWFID), aiming to estimate the optimal regularization value that maximizes accuracy while minimizing the feature dimension. As a result, we improve performance and interpretability in weed/crop discrimination tasks.

2 Materials and Methods

2.1 Lasso Sparse Regression Model

The LASSO sparse regression model performs feature selection as follows [17]:

$$\begin{aligned} \mathbf u =\arg \min _{\mathbf u } \frac{1}{2}\left\| {\mathbf {Gu}}-\mathbf y \right\| _{2}^{2}+\lambda \left\| \mathbf u \right\| _{1} \end{aligned}$$
(1)

where \(\Vert \cdot \Vert _{1}\) denotes the \(l_{1}\)-norm, \(\mathbf G \in \mathbb {R}^{N\times P}\) is the matrix holding the \(P\)-dimensional feature vectors of the \(N\) samples, \(\mathbf u \in \mathbb {R}^{P}\) is the sparse coefficient vector, and \(\mathbf y \in \mathbb {R}^{N}\) is a vector containing the class labels \(\{1,2\}\). To optimize \(\mathbf u \) in Eq. (1), the coordinate descent algorithm is adopted as [4]:

$$\begin{aligned} \widetilde{u}_{j}\leftarrow S\left( \sum ^{N}_{i=1} g_{i,j}\left( y_{i}-\widetilde{y}_{i}^{(j)} \right) , \lambda \right) \end{aligned}$$
(2)

where \(\widetilde{y}_{i}^{(j)} = \sum _{d\ne j }\left( g_{i,d}\widetilde{u}_{d} \right) \) is the fitted value excluding the contribution from \(g_{i,j}\), and \(S(a,\lambda )\) is a shrinkage-thresholding operator defined as below:

$$\begin{aligned} S(a,\lambda )=\text {sign}(a)(\left| a \right| -\lambda )_{+}=\left\{ \begin{matrix} a-\lambda &{} \text {if} &{} a>\lambda \\ 0 &{} \text {if} &{} \left| a \right| \leqslant \lambda \\ a+\lambda &{} \text {if} &{} a<-\lambda \end{matrix}\right. \end{aligned}$$
(3)

Thus, we calculate the optimized sparse vector \(\widetilde{\mathbf{u }}\) satisfying Eq. (1) by repeating the update until convergence. The columns of \(\mathbf G \) corresponding to zero entries in \(\widetilde{\mathbf{u }}\) are excluded, yielding an optimized feature set \(\widetilde{\mathbf{G }}\) of lower dimensionality than \(\mathbf G \). The real-valued \(\lambda \) determines the sparsity degree of \(\widetilde{\mathbf{u }}\), and hence it rules the selection of HOG feature sets.
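As an illustration, the coordinate descent update of Eqs. (2) and (3) can be sketched in a few lines of Python. This is a minimal sketch, assuming the columns of \(\mathbf G \) are standardized to unit squared norm so that the update of Eq. (2) applies directly; function names are illustrative.

```python
import numpy as np

def soft_threshold(a, lam):
    # Shrinkage-thresholding operator S(a, lambda) of Eq. (3).
    return np.sign(a) * max(abs(a) - lam, 0.0)

def lasso_coordinate_descent(G, y, lam, n_iter=100, tol=1e-6):
    """Minimize 0.5*||G u - y||^2 + lam*||u||_1 by cyclic coordinate descent."""
    N, P = G.shape
    u = np.zeros(P)
    for _ in range(n_iter):
        u_old = u.copy()
        for j in range(P):
            # Partial residual excluding the contribution of feature j,
            # i.e., y_i - y_tilde_i^(j) in Eq. (2).
            r_j = y - G @ u + G[:, j] * u[j]
            u[j] = soft_threshold(G[:, j] @ r_j, lam)
        if np.max(np.abs(u - u_old)) < tol:  # converged
            break
    return u
```

Increasing `lam` drives more entries of the returned vector to exactly zero, which is what makes the operator usable as a feature selector.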

2.2 Feature Selection Algorithm

Since any projection matrix \(\mathbf {A} \in \mathbb {R}^{M \times P}\) encodes a linear combination of the \(P\) features in each row, we can estimate the relevance of each feature through a vector \(\mathbf {\varrho } \in \mathbb {R}^{P}\) as follows:

$$\begin{aligned} \varrho _p = \sum _{m=1}^{M} \left| a_{m,p} \right| , \quad \forall p \in \{1,\dots ,P\} \end{aligned}$$
(4)

where \(a_{m,p}\) is the \(p\)-th element of the \(m\)-th row of \(\mathbf {A}\), and the largest values of \(\varrho _p\) point out the most relevant input attributes since they exhibit higher overall dependencies. As a result, the calculated relevance vector \(\varrho \) can be employed to rank the original features. Furthermore, aiming to estimate a representation space encoding discriminant input patterns, we compute the matrix \(\mathbf {X}_S \in \mathbb {R}^{N \times P_s}\), with \( P_s \leqslant P\), holding the features that satisfy the condition \(\bar{\varrho }_p \geqslant \zeta \), where \(\bar{\varrho }_p\) is the normalized feature relevance, calculated as follows:

$$\begin{aligned} \bar{\varrho }_p=\frac{\varrho _p}{ \sum _{p'=1}^{P} \varrho _{p'}} \end{aligned}$$
(5)
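A minimal NumPy sketch of the relevance computation in Eqs. (4) and (5) and the thresholded selection, assuming \(\mathbf {A}\) is given row-wise and that the features with the largest normalized relevance are the ones retained; function names are illustrative.

```python
import numpy as np

def feature_relevance(A):
    # Eq. (4): accumulate each feature's absolute contribution across
    # the M projection directions (rows of A).
    rho = np.abs(A).sum(axis=0)
    # Eq. (5): normalize so the relevance values sum to one.
    return rho / rho.sum()

def select_features(X, A, zeta):
    # Keep the columns of X whose normalized relevance reaches zeta,
    # yielding the reduced matrix X_S of Sect. 2.2.
    rho_bar = feature_relevance(A)
    mask = rho_bar >= zeta
    return X[:, mask], mask
```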

3 Experimental Set-Up

Dataset and Preprocessing: We evaluate the different approaches on the Crop/Weed Field Image Dataset (CWFID), holding 162 crop and 332 weed plants. The plants were labeled from 60 top-down field images of organic carrots, in the leaf development growth stage, with the appearance of intra-row and close-to-crop weeds. The dataset was acquired at a commercial organic carrot farm just before manual weed control. Also, the dataset includes a soil mask for each image.

Fig. 1.

Example of items in the CWFID dataset: weeds are marked with red rectangles and crops with green rectangles. (Color figure online)

In the preprocessing stage, we extract all crop and weed plants as individual images, as seen in Fig. 1. Moreover, since our goal is the crop/weed discrimination task, we use the soil mask of each image to retain only vegetation information. Furthermore, we resize all individual images to the size of the largest one. Using the HOG descriptor at different window parameter values \(\omega = \{32,44,56,68,80\}\), we extract relevant morphological information at different scales for each crop and weed plant.

Evaluation Scheme and Performance Assessment: We propose a sparse dimensionality reduction approach using the well-known Lasso operator, achieving the selection of significant features within a nested cross-validation framework, in which a grid of \(\lambda \) and \(\omega \) parameter values is searched to find the combination that selects the fewest features with the best performance. Afterward, for the optimal \(\lambda \) and \(\omega \) values, we calculate the average relevant-feature occurrence per trial, yielding the average set of most significant features that feed the classifier based on an incremental learning approach.

For the sake of comparison, we contrast our method with no feature selection (NFS), a Principal Component Analysis selector version (PCA_sel), a centered kernel alignment selector version (CKA_sel), and Relieff, where the selector versions are obtained as shown in Sect. 2.2, and the best \(\zeta \) is found using a grid of values between 0 and 1. Moreover, we use a kernelized support vector machine (KSVM), for which the \(\epsilon \) value is tuned via an exhaustive search and the Gaussian kernel bandwidth is selected as in [1]. Then, we estimate the average accuracy (\(a_{cc}\)) and F1-score (\(F1_s\)) through a nested ten-fold cross-validation scheme as measures of classifier performance, where \(F1_s\) is an accuracy measure suited to unbalanced binary classification (see Eq. (6)) that considers both precision and recall.

$$\begin{aligned} F1_s = 2\frac{precision \cdot recall}{precision+ recall} \end{aligned}$$
(6)
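The evaluation scheme can be sketched with scikit-learn building blocks: a Lasso-based selector inside a pipeline whose regularization value (\(\lambda \), here `alpha`) is tuned by an inner grid search, with an outer cross-validation scoring the whole procedure. The synthetic data, grid values, and fold counts below are illustrative, not the paper's exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the HOG feature matrix.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=8, random_state=0)

# Lasso selects the non-zero coefficients; the KSVM classifies on them.
pipe = Pipeline([
    ("select", SelectFromModel(Lasso(max_iter=5000))),
    ("clf", SVC(kernel="rbf", gamma="scale")),
])

# Inner loop: tune lambda (alpha) on the training folds only.
grid = GridSearchCV(pipe,
                    {"select__estimator__alpha": [0.001, 0.005, 0.02]},
                    cv=3, scoring="accuracy")

# Outer loop: score the entire model-selection procedure (nested CV).
scores = cross_val_score(grid, X, y, cv=5, scoring="accuracy")
```

Because the feature selection runs inside each training fold, the outer scores are not biased by information leaking from the test folds.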
Fig. 2.

KSVM accuracy for different \(\omega \) values

4 Results and Discussion

Figure 2 depicts the KSVM classification accuracy for different values of the window size \(\omega \), showing that the proposed method reaches its maximum accuracy score (\(76.47 \pm 5.52\)) using \(\omega = 56\).

Fig. 3.

KSVM accuracy for different \(\lambda \) values

As seen in Fig. 3, large values of \(\lambda \) may exclude too many features, while small values do not eliminate redundancy effectively. To reach a trade-off between both limiting cases, the optimal regularization parameter value is determined through a nested ten-fold cross-validation framework on the training data, resulting in \(\lambda = 0.022\). Consequently, the proposed dimensionality reduction approach enhances the crop/weed discrimination task by minimizing the LASSO regression model, for which the regularization parameter \(\lambda \) rules the feature selection effectiveness.

Fig. 4.

Estimated relative feature relevance (Color figure online)

Another aspect to consider is the enhanced interpretability, which allows identifying the set of relevant HOG features and gaining a meaningful understanding of them. For a better representation, Fig. 4 depicts the sorted relative feature relevance together with the average per spectral band for the optimal \(\lambda \) value. Note that the features extracted from the green band are more relevant, since plants reflect the green spectrum with higher intensity.

Fig. 5.

Lasso incremental learning

Besides improved interpretability, the contribution of the relevant features can be assessed, as seen in Fig. 5, which depicts the accuracy achieved by the incremental learning approach, feeding the classifier stepwise with the features ranked in decreasing order of relevance. Fixing \(\lambda = 0.022\), the first 134 ranked features achieve the highest accuracy value, 91.52, with relatively high confidence (standard deviation of 5.11). Afterwards, adding more features tends to decrease the accuracy, since either redundant or noisy information is added.
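The incremental evaluation can be sketched as follows: rank the features by the magnitude of the Lasso coefficients and feed the classifier growing subsets in decreasing relevance order. The synthetic data and the step size of five features are illustrative; only the value \(\lambda = 0.022\) is taken from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the extracted HOG features.
X, y = make_classification(n_samples=150, n_features=30,
                           n_informative=6, random_state=1)

# Rank features by |u_j| from a Lasso fit with the reported lambda.
u = Lasso(alpha=0.022, max_iter=5000).fit(X, y).coef_
order = np.argsort(np.abs(u))[::-1]  # decreasing relevance

# Incremental learning: grow the feature subset in relevance order and
# record the cross-validated accuracy at each step.
accuracies = []
for k in range(5, X.shape[1] + 1, 5):
    acc = cross_val_score(SVC(kernel="rbf"), X[:, order[:k]], y, cv=5)
    accuracies.append(acc.mean())
```

Plotting `accuracies` against `k` reproduces the shape of an incremental learning curve: accuracy rises while informative features are added, then flattens or drops once redundant ones enter.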

Fig. 6.

Accuracy performance assessed by the SVM classifier

Lastly, Fig. 6 depicts the KSVM classification accuracy and F1-score obtained by NFS, PCA_sel, CKA_sel, and Relieff, showing that the proposed method outperforms the other approaches, reaching \(91.52 \pm 5.11\) and \(86.45 \pm 8.12\) in average accuracy and F1-score, respectively.

5 Concluding Remarks

We develop a sparse feature selection approach based on the Lasso operator that eliminates redundant information in field images. By optimizing the regularization value, we handle the trade-off between two limiting conditions: accuracy maximization versus feature dimensionality reduction. As a result, our proposal effectively selects relevant features while avoiding redundant information, improving both accuracy and F1-score, as well as the interpretability of weed/crop discrimination tasks.

For future work, we plan to use deep learning, particularly convolutional neural network strategies, aiming to obtain more relevant features from the images and thus improve weed/crop discrimination. Also, multiple kernel learning approaches can be explored to merge the information, resulting in a more relevant kernel that enhances weed/crop classification.