1 Introduction

The global population grows at a rate of \(1.09\%\) per year, increasing the demand for food and requiring higher agricultural yields [13]. However, one of the multiple challenges with a marked correlation with crop yield loss is the management of weeds [15], i.e., unwanted plants that grow randomly throughout the field, competing with crops for resources such as water, nutrients, and sunlight [5].

To deal with weed spreading, several control strategies have been proposed for various crops and field scales [19]. In small-scale fields, manual weed control is still used even though it is a tedious, inefficient, and labor-intensive task. In turn, smart agriculture, including remote sensing and mechanical weed management methods for soil tillage, arises as a better control approach, as these methods are more productive and labor-saving [8]. Nevertheless, mechanical management can cause crop damage if the discrimination of intra-row weeds is not accurate enough [6].

Generally, traditional approaches to weed discrimination using remote sensing include the following stages [3]: (i) feature estimation, (ii) feature extraction and selection, and (iii) classification. For the first stage, the Histogram of Oriented Gradients (HOG) is one of the principal scale-invariant feature descriptors, encoding the shape information of image regions. However, the selection of relevant features is decisive for obtaining interpretable and accurate results while reducing computational complexity by removing information that is redundant for the task at hand [12]. Approaches to feature selection include attribute measures using decision trees [7], iterative backward elimination based on a feature ranking obtained from the Random Forest algorithm [16], decision rules using chi-square statistics [11], measures based on Random Forest [2], measures of highest normalized information gain [18], and dimensionality reduction approaches such as Correlation-based Feature Selection, Principal Component Analysis, Kernel Principal Component Analysis, Linear Discriminant Analysis, and Stepwise Linear Discriminant Analysis [9, 14, 20], among others. However, for weed discrimination, feature selection methods still face challenges in eliminating redundant information without degrading performance [10, 19].

In this study, we introduce a sparse feature selection approach based on the Lasso operator that eliminates redundant information on the Crop/Weed Field Image Dataset (CWFID), aiming to estimate the optimal regularization value that maximizes accuracy while minimizing the feature dimension. As a result, we improve performance and interpretability in weed/crop discrimination tasks.

2 Materials and Methods

2.1 Lasso Sparse Regression Model

The LASSO sparse regression model performs feature selection as follows [17]:

$$\begin{aligned} \mathbf u =\arg \min _{\mathbf u } \frac{1}{2}\left\| {\mathbf {Gu}}-\mathbf y \right\| _{2}^{2}+\lambda \left\| \mathbf u \right\| _{1} \end{aligned}$$
(1)

where \(\Vert \cdot \Vert _{1}\) denotes the \(l_{1}\)-norm, \(\mathbf G \in \mathbb {R}^{N\times P}\) is the matrix holding the \(P\)-dimensional feature vectors of the \(N\) samples, \(\mathbf u \in \mathbb {R}^{P}\) is the sparse coefficient vector, and \(\mathbf y \in \mathbb {R}^{N}\) is a vector containing the class labels \(\{1,2\}\). To optimize \(\mathbf u \) in Eq. (1), the coordinate descent algorithm is adopted as [4]:

$$\begin{aligned} \widetilde{u}_{j}\leftarrow S\left( \sum ^{N}_{i=1} g_{i,j}\left( y_{i}-\widetilde{y}_{i}^{(j)} \right) , \lambda \right) \end{aligned}$$
(2)

where \(\widetilde{y}_{i}^{(j)} = \sum _{d\ne j }\left( g_{i,d}\widetilde{u}_{d} \right) \) is the fitted value excluding the contribution from \(g_{i,j}\), and \(S(a,\lambda )\) is a shrinkage-thresholding operator defined as below:

$$\begin{aligned} S(a,\lambda )=\text {sign}(a)(\left| a \right| -\lambda )_{+}=\left\{ \begin{matrix} a-\lambda &{} \text {if} &{} a>\lambda \\ 0 &{} \text {if} &{} \left| a \right| \leqslant \lambda \\ a+\lambda &{} \text {if} &{} a<-\lambda \end{matrix}\right. \end{aligned}$$
(3)

Thus, we calculate the optimized sparse vector \(\widetilde{\mathbf{u }}\) satisfying Eq. (1) by repeating the update until convergence. The columns of \(\mathbf G \) corresponding to zero entries in \(\widetilde{\mathbf{u }}\) are excluded, yielding an optimized feature set \(\widetilde{\mathbf{G }}\) of lower dimensionality than \(\mathbf G \). The real-valued \(\lambda \) determines the sparsity degree of \(\widetilde{\mathbf{u }}\), and hence it rules the selection of HOG feature sets.
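As an illustration, the coordinate descent update of Eqs. (2) and (3) can be sketched in a few lines of Python. This is a minimal sketch, assuming the columns of \(\mathbf G \) are standardized to unit squared norm so that the update of Eq. (2) applies directly; function names are illustrative.

```python
import numpy as np

def soft_threshold(a, lam):
    # Shrinkage-thresholding operator S(a, lambda) of Eq. (3).
    return np.sign(a) * max(abs(a) - lam, 0.0)

def lasso_coordinate_descent(G, y, lam, n_iter=100, tol=1e-6):
    """Minimize 0.5*||G u - y||^2 + lam*||u||_1 by cyclic coordinate descent."""
    N, P = G.shape
    u = np.zeros(P)
    for _ in range(n_iter):
        u_old = u.copy()
        for j in range(P):
            # Partial residual excluding the contribution of feature j,
            # i.e., y_i - y_tilde_i^(j) in Eq. (2).
            r_j = y - G @ u + G[:, j] * u[j]
            u[j] = soft_threshold(G[:, j] @ r_j, lam)
        if np.max(np.abs(u - u_old)) < tol:  # converged
            break
    return u
```

Increasing `lam` drives more entries of the returned vector to exactly zero, which is what makes the operator usable as a feature selector.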

2.2 Feature Selection Algorithm

Since any projection matrix \(\mathbf {A} \in \mathbb {R}^{M \times P}\) encodes a linear combination of the \(P\) features in each row, we can estimate the relevance of each feature through a vector \(\mathbf {\varrho } \in \mathbb {R}^{P}\) as follows:

$$\begin{aligned} \varrho _p = \sum _{m=1}^{M} \left| a_{m,p} \right| , \quad \forall p \in \{1,\dots ,P\} \end{aligned}$$
(4)

where \(a_{m,p}\) is the \(p\)-th element of the \(m\)-th row of \(\mathbf {A}\), and the largest values of \(\varrho _p\) point out the most relevant input attributes since they exhibit higher overall dependencies. As a result, the calculated relevance vector \(\varrho \) can be employed to rank the original features. Furthermore, aiming to estimate a representation space encoding discriminant input patterns, we compute the matrix \(\mathbf {X}_S \in \mathbb {R}^{N \times P_s}\), with \( P_s \leqslant P\), holding the features that satisfy the condition \(\bar{\varrho }_p \geqslant \zeta \), where \(\bar{\varrho }_p\) is the normalized feature relevance, calculated as follows:

$$\begin{aligned} \bar{\varrho }_p=\frac{\varrho _p}{ \sum _{p'=1}^{P} \varrho _{p'}} \end{aligned}$$
(5)
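A minimal NumPy sketch of the relevance computation in Eqs. (4) and (5) and the thresholded selection, assuming \(\mathbf {A}\) is given row-wise and that the features with the largest normalized relevance are the ones retained; function names are illustrative.

```python
import numpy as np

def feature_relevance(A):
    # Eq. (4): accumulate each feature's absolute contribution across
    # the M projection directions (rows of A).
    rho = np.abs(A).sum(axis=0)
    # Eq. (5): normalize so the relevance values sum to one.
    return rho / rho.sum()

def select_features(X, A, zeta):
    # Keep the columns of X whose normalized relevance reaches zeta,
    # yielding the reduced matrix X_S of Sect. 2.2.
    rho_bar = feature_relevance(A)
    mask = rho_bar >= zeta
    return X[:, mask], mask
```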

3 Experimental Set-Up

Dataset and Preprocessing: We evaluate the different approaches on the Crop/Weed Field Image Dataset (CWFID), holding 162 crop and 332 weed plants. The plants were labeled from 60 top-down field images of organic carrots, in the leaf development growth stage, with the appearance of intra-row and close-to-crop weeds. The dataset was acquired at a commercial organic carrot farm just before manual weed control. Also, the dataset includes a soil mask for each image.

Fig. 1.

Example of items in the CWFID dataset: weeds are marked with red rectangles and crops with green rectangles. (Color figure online)

In the preprocessing stage, we extract all crop and weed plants as individual images, as seen in Fig. 1. Moreover, since our goal is the crop/weed discrimination task, we use the soil mask of each image to retain only vegetation information. Furthermore, we resize all individual images to the size of the largest one. Using the HOG descriptor at different window parameter values \(\omega = \{32,44,56,68,80\}\), we extract relevant morphological information at different scales for each crop and weed plant.

Evaluation Scheme and Performance Assessment: We propose a sparse dimensionality reduction approach using the well-known Lasso operator, achieving the selection of significant features within a nested cross-validation framework, in which a grid of \(\lambda \) and \(\omega \) parameter values is searched to find the combination that selects the fewest features with the best performance. Afterward, for the optimal \(\lambda \) and \(\omega \) values, we calculate the average relevant-feature occurrence per trial, yielding the average set of most significant features that feed the classifier based on an incremental learning approach.

For the sake of comparison, we contrast our method with no feature selection (NFS), a Principal Component Analysis selector version (PCA_sel), a centered kernel alignment selector version (CKA_sel), and Relieff, where the selector versions are obtained as shown in Sect. 2.2, and the best \(\zeta \) is found using a grid of values between 0 and 1. Moreover, we use a kernelized support vector machine (KSVM), for which the \(\epsilon \) value is tuned via an exhaustive search and the Gaussian kernel bandwidth is selected as in [1]. Then, we estimate the average accuracy (\(a_{cc}\)) and F1-score (\(F1_s\)) through a nested ten-fold cross-validation scheme as measures of classifier performance, where \(F1_s\) is an accuracy measure suited to unbalanced binary classification (see Eq. (6)) that considers both precision and recall.

$$\begin{aligned} F1_s = 2\frac{precision \cdot recall}{precision+ recall} \end{aligned}$$
(6)
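The evaluation scheme can be sketched with scikit-learn building blocks: a Lasso-based selector inside a pipeline whose regularization value (\(\lambda \), here `alpha`) is tuned by an inner grid search, with an outer cross-validation scoring the whole procedure. The synthetic data, grid values, and fold counts below are illustrative, not the paper's exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the HOG feature matrix.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=8, random_state=0)

# Lasso selects the non-zero coefficients; the KSVM classifies on them.
pipe = Pipeline([
    ("select", SelectFromModel(Lasso(max_iter=5000))),
    ("clf", SVC(kernel="rbf", gamma="scale")),
])

# Inner loop: tune lambda (alpha) on the training folds only.
grid = GridSearchCV(pipe,
                    {"select__estimator__alpha": [0.001, 0.005, 0.02]},
                    cv=3, scoring="accuracy")

# Outer loop: score the entire model-selection procedure (nested CV).
scores = cross_val_score(grid, X, y, cv=5, scoring="accuracy")
```

Because the feature selection runs inside each training fold, the outer scores are not biased by information leaking from the test folds.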
Fig. 2.

KSVM accuracy for different \(\omega \) values

4 Results and Discussion

Figure 2 depicts the KSVM classification accuracy for different values of the window size \(\omega \), showing that the proposed method reaches its maximum accuracy score (\(76.47 \pm 5.52\)) using \(\omega = 56\).

Fig. 3.

KSVM accuracy for different \(\lambda \) values

As seen in Fig. 3, large values of \(\lambda \) may exclude too many features, while small values do not eliminate redundancy effectively. To reach a trade-off between both limiting cases, the optimal regularization parameter value is determined through a nested ten-fold cross-validation framework on the training data, resulting in \(\lambda = 0.022\). Consequently, the proposed dimensionality reduction approach enhances the crop/weed discrimination task by minimizing the LASSO regression model, for which the regularization parameter \(\lambda \) rules the feature selection effectiveness.

Fig. 4.

Estimated relative feature relevance (Color figure online)

Another aspect to consider is the enhanced interpretability, which allows identifying the set of relevant HOG features and gaining a meaningful understanding of them. For a better representation, Fig. 4 depicts the sorted relative feature relevance together with the average per spectral band for the optimal \(\lambda \) value. Note that the features extracted from the green band are more relevant, since plants reflect the green spectrum with higher intensity.

Fig. 5.

Lasso incremental learning

Besides improved interpretability, the contribution of the relevant features can be assessed, as seen in Fig. 5, which depicts the accuracy achieved by the incremental learning approach, feeding the classifier stepwise with the features ranked in decreasing order of relevance. Fixing \(\lambda = 0.022\), the first 134 ranked features achieve the highest accuracy value, 91.52, with relatively high confidence (standard deviation of 5.11). Afterwards, adding more features tends to decrease the accuracy, since either redundant or noisy information is added.
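The incremental evaluation can be sketched as follows: rank the features by the magnitude of the Lasso coefficients and feed the classifier growing subsets in decreasing relevance order. The synthetic data and the step size of five features are illustrative; only the value \(\lambda = 0.022\) is taken from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the extracted HOG features.
X, y = make_classification(n_samples=150, n_features=30,
                           n_informative=6, random_state=1)

# Rank features by |u_j| from a Lasso fit with the reported lambda.
u = Lasso(alpha=0.022, max_iter=5000).fit(X, y).coef_
order = np.argsort(np.abs(u))[::-1]  # decreasing relevance

# Incremental learning: grow the feature subset in relevance order and
# record the cross-validated accuracy at each step.
accuracies = []
for k in range(5, X.shape[1] + 1, 5):
    acc = cross_val_score(SVC(kernel="rbf"), X[:, order[:k]], y, cv=5)
    accuracies.append(acc.mean())
```

Plotting `accuracies` against `k` reproduces the shape of an incremental learning curve: accuracy rises while informative features are added, then flattens or drops once redundant ones enter.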

Fig. 6.

Accuracy performance assessed by the SVM classifier

Lastly, Fig. 6 depicts the KSVM classification accuracy and F1-score obtained by NFS, PCA_sel, CKA_sel, and Relieff, showing that the proposed method outperforms the other approaches, reaching \(91.52 \pm 5.11\) and \(86.45 \pm 8.12\) in average accuracy and F1-score, respectively.

5 Concluding Remarks

We develop a sparse feature selection approach based on the Lasso operator that eliminates redundant information in field images. By optimizing the regularization value, we handle the trade-off between two limiting conditions: accuracy maximization versus feature dimensionality reduction. As a result, our proposal effectively selects relevant features while avoiding redundant information, improving both accuracy and F1-score, as well as the interpretability of weed/crop discrimination tasks.

For future work, we plan to use deep learning, particularly convolutional neural network strategies, aiming to obtain more relevant features from the images and thus improve weed/crop discrimination. Also, multiple kernel learning approaches can be explored to merge the information, resulting in a more relevant kernel that enhances weed/crop classification.