
A Survey of Computer Vision Technologies in Urban and Controlled-environment Agriculture

Published: 27 November 2023

Abstract

In the evolution of agriculture to its next stage, Agriculture 5.0, artificial intelligence will play a central role. Controlled-environment agriculture, or CEA, is a special form of urban and suburban agricultural practice that offers numerous economic, environmental, and social benefits, including shorter transportation routes to population centers, reduced environmental impact, and increased productivity. Due to its ability to control environmental factors, CEA couples well with computer vision (CV) in the adoption of real-time monitoring of plant conditions and autonomous cultivation and harvesting. The objective of this article is to familiarize CV researchers with agricultural applications and agricultural practitioners with the solutions offered by CV. We identify five major CV applications in CEA, analyze their requirements and motivation, and survey the state-of-the-art as reflected in 68 technical papers using deep learning methods. In addition, we discuss five key subareas of computer vision and how they relate to these CEA problems, as well as 14 vision-based CEA datasets. We hope the survey will help researchers quickly gain a bird’s-eye view of this thriving research area and will spark inspiration for new research and development.

1 Introduction

Artificial intelligence (AI), especially computer vision (CV), is finding an ever-broadening range of applications in modern agriculture. The next stage of agricultural technological development, Agriculture 5.0 [15, 100, 232, 352], will feature AI-driven autonomous decision-making as a central component. The term Agriculture 5.0 stems from a chronology [352] that begins with Agriculture 1.0, which heavily depends on human labor and animal power, and Agriculture 2.0, enabled by synthetic fertilizers, pesticides, and combustion-powered machinery, and develops to Agriculture 3.0 and 4.0, characterized by GPS-enabled precision control and Internet-of-Things (IoT)-driven data collection [250]. Built upon the rich agricultural data collected, Agriculture 5.0 holds the promise to further increase productivity, satiate the food demand of a growing global population, and mitigate the negative environmental impact of existing agricultural practices.
As an integral component of Agriculture 5.0, controlled-environment agriculture (CEA), a farming practice carried out within urban, indoor, resource-controlled, and sensor-driven factories, is particularly suitable for the application of AI and CV, because CEA provides ample infrastructure support for data collection and autonomous execution of algorithmic decisions. In terms of productivity, CEA can produce higher yield per unit area of land [8, 9] and boost the nutritional content of agricultural products [159, 304]. In terms of environmental impact, CEA farms can insulate plants from external environmental influences, reduce the need for fertilizer and pesticides, and efficiently utilize recycled resources like water, and thereby can be much more environmentally friendly and self-sustainable than traditional farming.
In light of current global challenges, such as disruptions to global supply chains and the threat of climate change, CEA appears especially appealing as a food source for urban population centers. Under the pressures of deglobalization brought by geopolitical tensions [362] and global pandemics [233, 268], CEA makes it possible to build farms close to large cities, which shortens transportation distances and maintains a secure food supply even when long-distance routes are disrupted. The city-state Singapore, for example, has pledged to source 30% of its food domestically by 2030 [1, 306], which is only possible through suburban farms such as CEAs. Furthermore, CEA, as a form of precision agriculture, is by itself a viable approach to reducing greenhouse gas emissions [9, 37, 243]. CEA can also shield plants from adverse climate conditions exacerbated by climate change, as its environments are fully controlled [112], and can effectively reuse arable land eroded by climate change [364].
We argue that AI and CV are critical to the economic viability and long-term sustainability of CEAs, as these technologies can reduce production expenses and improve productivity. Suburban CEAs face high land costs. An analysis in Victoria, Australia [38], shows that, due to the higher land cost resulting from proximity to cities, even with an estimated 50-fold productivity improvement per unit of land area, it still takes six to seven years for a CEA to reach the break-even point. Thus, further productivity improvements from AI would act as a strong driver for CEA adoption. Moreover, the stacked setup of vertical farms imposes additional difficulties for farmers performing daily surveillance and operations; automated solutions empowered by computer vision can effectively address this problem. Finally, AI and CV technologies have the potential to fully characterize the complex, individually different, time-varying, and dynamic conditions of living organisms [39], which will enable precise and individualized management and further elevate yield. Thus, AI and CV technologies appear to be a natural fit for CEAs.
Most of the recent development of AI can be attributed to the newly discovered capability to train deep neural networks [172] that can (1) automatically learn multi-level representations of input data that are transferable to diverse downstream tasks [65, 136], (2) easily scale up to match the growing size of data [283], and (3) conveniently utilize massively parallel hardware architectures like GPUs [114, 328]. As function approximators, deep neural networks prove to be surprisingly effective in generalizing to previously unseen data [354]. Deep learning has achieved tremendous success in computer vision [293], natural language processing [47, 83, 118], multimedia [23, 88], robotics [291], game playing [270], and many other areas.
The AI revolution in agriculture is already underway. State-of-the-art neural network technologies, such as ResNet [134] and MobileNet [138] for image recognition, and Faster R-CNN [239], Mask R-CNN [133], and YOLO [235] for object detection, have been applied to the management of crops [194], livestock [140, 299], and plants in indoor and vertical farms [240, 357]. AI has been used to provide decision support in myriad tasks, from DNA analysis [194] and growth monitoring [240, 357] to disease detection [254] and profit prediction [28].
While several surveys have explored the use of CV techniques in agriculture, none of them specifically focus on CEA applications. Some surveys summarize studies based on aspects of practical applications in agriculture. References [74, 89, 123, 146, 278] survey pest and disease detection studies. References [40, 111, 303] discuss fruit and vegetable quality grading and disease detection. Reference [298] summarizes studies in six sub-fields, including crop growth monitoring, pest and disease detection, automatic harvesting/fruit detection, fruit quality testing, automated management of modern farms, and the monitoring of farmland information with Unmanned Aerial Vehicle (UAV). Other surveys organize existing works from a technical perspective, namely, algorithms used [237] or formats of data [56]. Reference [151], as an exception, introduces the development history of CV and AI in smart agriculture without investigating any individual studies. Our work aims to address this gap and provide insights tailored to CEA-specific contexts.
As the volume of research in smart agriculture grows rapidly, we hope the current review article can bridge researchers from both areas of AI and agriculture and create a mild learning curve when they wish to familiarize themselves with the other area. We believe computer vision has the closest connections with, and is the most immediately applicable in, urban agriculture and CEAs. Hence, in this article, we focus on reviewing deep-learning-based computer vision technologies in urban farming and CEAs. We focus on deep learning because it is the predominant approach in AI and CV research. The contributions of this article are two-fold, with the former targeted at AI researchers and the latter targeted at agriculture researchers:
We identify five major CV applications in CEA and analyze their requirements and motivation. Further, we survey the state-of-the-art as reflected in 68 technical papers and 14 vision-based CEA datasets.
We discuss five key subareas of computer vision and how they relate to CEA. In addition, we identify four potential future directions for research in CV for CEA.
In Figure 1, we provide a graphical preview of our content. It illustrates the end-to-end agriculture process of CEAs, from seed planting to harvest and sales, with five major deep learning-based CV applications—Growth Monitoring, Fruit and Flower Detection, Fruit Counting, Maturity Level Classification, and Pest and Disease Detection—mapped to the corresponding applicable plant growth stages. We do not survey the autonomous seed planting and harvesting steps, as they are more relevant to robot functioning and robotic control, i.e., grasping, carrying, and placing of objects, than to computer vision (we do, however, cover the localization of fruit in the fruit and flower detection section, which helps a harvesting robot locate the targeted object and act on it). For reference, some literature on agricultural robots and end-effector design can be found in [36, 57, 92, 231, 353].
Fig. 1.
Fig. 1. An illustration of the end-to-end agriculture process of CEAs, from seed planting to harvest and sales, with five major deep learning-based CV in agriculture applications—Growth Monitoring, Fruit and Flower Detection, Fruit Counting, Maturity Level Classification, and Pest and Disease Detection—mapped to the corresponding applicable plant growth stages. Autonomous Seed Sowing and Autonomous Harvest and Sales in gray boxes are relevant steps in the agriculture process of CEAs but are out of the scope of our survey, which focuses on CV in CEAs. Orange lines represent arrows originating from pest and disease detection. Green lines represent arrows with stage 4 as their destination.
We structure the survey following the process in the figure: First, to provide a bird’s-eye view of CV capabilities available to researchers in smart agriculture, we summarize several major CV problems and influential technical solutions in Section 2. Next, we review 68 papers with respect to the application of computer vision in the CEA system in Section 3. The discussion is organized into five subsections: Growth Monitoring, Fruit and Flower Detection, Fruit Counting, Maturity Level Classification, and Pest and Disease Detection. In the discussion, we focus on fruits and vegetables that are suitable for CEA, including tomato [10, 13, 127, 351], mango [7], guava [269, 333], strawberry [107, 346], capsicum [174], banana [5], lettuce [359], cucumber [10, 128, 200], citrus [4], and blueberry [2]. Next, we provide a summary of 14 publicly available datasets of plants and fruits in Section 4 to facilitate future studies in controlled-environment agriculture. Finally, we highlight a few research directions that could generate high-impact research in the near future in Section 5.
One thing to note here is that, except for the Leaf Instance Segmentation task under the Growth Monitoring section, all the tasks are performed with models trained on different datasets and evaluated with different metrics. Tables 3, 4, 5, and 6 showcase the variety in datasets and evaluation metrics. This variation results in incomparable performance between studies. Such a phenomenon further indicates the necessity of our survey, which summarizes the current progress in the literature and encourages the development of general benchmarks to promote consistency and comparability in future research.
Table 1.
Factors | CV Problems | Example Countermeasures
Environmental Change | OOD Generalization | Collect new data, few-shot learning, weakly-supervised learning, unsupervised learning (see Section 3)
Sub-optimal Data Quality | Unbalanced Class Distribution, Label Noise | Multiple-instance learning, generating images of minority classes with GANs, few-shot meta-learning (see Sections 3.5.2, 3.5.3, and 5.1)
Human Factor | Interpretability, Uncertainty Estimates | Paired confidence scores, meta-learning (see Sections 2.4, 2.5, and 5.2)
Table 1. Factors to Consider when Applying CV Techniques in CEA and Some Corresponding Countermeasures
Table 2.
Category | Technique | SBD (\(\uparrow\)) | |DiC| (\(\downarrow\))
Sequential | End-to-end instance segmentation [238] | 84.9 | 0.8
Sequential | RNN-SIS [251] | 74.7 | 1.1
Sequential | RIS [245] | 66.6 | 1.1
Pixel Embedding | Semantic Instance Segmentation [80] | 84.2 | 1.0
Pixel Embedding | Object-aware Embedding [62] | 83.1 | 0.73
Pixel Embedding | RHN + Cosine Embeddings [221] | 84.5 | 1.5
Pixel Embedding | Crop Leaf and Plant Instance Segmentation [324] | 91.1 | 1.8
Pixel Embedding | W-Net (GT-FG) [331] | 91.9 | -
Pixel Embedding | SPOCO (GT-FG) [327] | 93.2 | 1.7
Table 2. Performance of Various Leaf Instance Segmentation Techniques on the CVPPP A1 Test Set
Higher SBD and lower |DiC| indicate better performance. (GT-FG) indicates models that make use of ground-truth foreground masks.
Table 3.
Category | Technique | Evaluation Metric | Performance | Dataset
Fruit Object Detection | [351] | Precision (IoU > 0.5) | 94% | 1,730 images of cherry tomatoes
Fruit Object Detection | [139] | Accuracy (IoU unspecified) | 95.50% | 800 images of tomatoes
Fruit Object Detection | [249] | F1 score (IoU unspecified) | 83.80% | 122 images of 7 fruits
Fruit Object Detection | [356] | True positive rate, false positive rate (IoU unspecified) | 98%, 17% | 2,116 self-acquired images of fruits and 511 images of fruits from ImageNet
Fruit Object Detection | [346] | Precision, Recall (IoU > 0.9) | 94.4%, 93.5% | 2,000 images of strawberries
Fruit Object Detection | [262] | F1 score (IoU unspecified) | 93.5%–95.1% | Mango Image Dataset [157]
Fruit Segmentation | [185] | Precision, Recall (IoU unspecified) | 98.3%, 94.8% | 437 RGB-D images of guavas
Fruit Segmentation | [13] | Precision, Recall, F1 score (IoU > 0.5) | 96%, 91%, 93% | 123 RGB-D images of tomatoes
Fruit Segmentation | [107] | Precision, Recall, F1 score, Average Precision (IoU > 0.9) | 97%, 92%, 94%, 90% | 120 RGB-D images of strawberries
Fruit Segmentation | [347] | Mean IoU | 89.85% | 1,900 images of strawberries
Fruit Segmentation | [141] | Accuracy (IoU unspecified) | 98% | 900 images of strawberries
Flower Object Detection | [199] | Average Precision, F1 score (IoU > 0.5) | 96.2%, 89.0% | 1,078 images of citrus buds and flowers
Flower Object Detection | [284] | Average Precision (IoU > 0.5) | 90.50% | 5,624 images of tomato flowers and fruits
Flower Object Detection | [285] | IoU, F1 score, Recall, Precision (IoU unspecified) | 81.1%, 89.6%, 91.9%, 87.3% | Multi-species fruit flower detection [85]
Table 3. Performance of Various Fruit and Flower Detection Techniques
Datasets without reference are unpublished datasets.
Table 4.
Category | Technique | Evaluation Metric | Performance | Dataset
Count Regression | [234] | Accuracy | 91.0%–93% | 4,800 synthetic tomato images
Count Fruit Bounding Boxes | [157] | F1 score, Average Precision (IoU > 0.24) | 96.8%, 98.3% | MangoYolo Dataset [157]
Count Fruit Bounding Boxes | [320] | \(R^{2}\), RMSE | 0.66, 2.1 | MangoYolo Dataset [157]
Count Fruit Segmentation Masks | [154] | Accuracy, F1 score (IoU > 0.6) | 73.6%, 84.4% | 12,590 images of mangoes
Count Fruit Segmentation Masks | [215] | Average Precision (IoU > 0.5), RMSE | 71.6%, 1.484 | 724 images of blueberries
Table 4. Performance of Various Fruit Counting Techniques
Datasets without reference are unpublished datasets. Reference [234] uses a direct regression method and thus does not need an IoU threshold.
Table 5.
Category | Technique | Evaluation Metric | Performance | Dataset
Classification | [357] | Accuracy | 91.9% | 200 images of tomatoes
Classification on Bounding Boxes | [346] | Precision, Recall (IoU > 0.9) | 94.4%, 93.5% | 2,000 images of strawberries
Classification on Bounding Boxes | [124] | F1 score (IoU > 0.4) | 77.30% | 285 images of capsicums
Classification on Segmentation Masks | [13] | Precision, Recall, F1 score (IoU > 0.5) | - | 123 RGB-D images of tomatoes
Classification on Segmentation Masks | [107] | Precision, Recall, F1 score, Average Precision (IoU > 0.9) | - | 120 RGB-D images of strawberries
Classification on Segmentation Masks | [347] | Precision, Recall (IoU > 0.9) | 95.78%, 95.41% | 1,900 images of strawberries
Classification on Segmentation Masks | [141] | Class frequency weighted precision and recall (IoU unspecified) | 96.1%, 96.0% | 900 images of strawberries
Table 5. Performance of Various Maturity-level Classification Techniques
Datasets without reference are unpublished datasets. A “-” in the Performance column indicates papers whose metric results cannot be summarized in a single value. Reference [357] uses a direct classification method and thus does not need an IoU threshold.
Table 6.
Category | Technique | Evaluation Metric | Performance | Dataset
Single- and Multi-label Classification | [361] | Accuracy\(^{*}\) | 94.65% | 700 diseased and normal leaf images
Single- and Multi-label Classification | [272] | Accuracy\(^{*}\) | 97.13% | 1,070 self-acquired leaf images and 1,130 images from the Plant Village dataset [143]
Single- and Multi-label Classification | [24] | Accuracy\(^{*}\) | 93.33% | Images of 643 leaf samples
Single- and Multi-label Classification | [254] | mAP\(^{*}\) | 72.8%–97.9% | 12,600 images of bananas
Single- and Multi-label Classification | [95] | Accuracy\(^{*}\) | 99.50% | 87,848 images of leaves
Single- and Multi-label Classification | [201] | Accuracy\(^{*}\) | 93.40% | Plant Village dataset [143]
Single- and Multi-label Classification | [103] | mAP (IoU > 0.5) | 86% | 5,000 images of diseases and pests of tomatoes
Single- and Multi-label Classification | [349] | mIoU, Recall, F1 score (IoU unspecified) | 84.8%, 88.1%, 91.8% | Plant Village dataset [143]
Handling Unbalanced Class Distribution | [44] | Accuracy\(^{*}\) | 60.7%–91.8% | IP102 [330], Citrus Pest Benchmark [44]
Handling Unbalanced Class Distribution | [116] | Average Precision | 67%–85% | Plant Village dataset [143] and Plant Leaves [267]
Handling Unbalanced Class Distribution | [216] | Accuracy\(^{*}\) | 88.5%–95.5% | Plant Village dataset [143] and Plant and Pest [182]
Handling Unbalanced Class Distribution | [182] | Accuracy\(^{*}\) | 43.9%–81% | Plant Village [143], Crop Pests Recognition [181]
Noise and Uncertainty Estimate | [263] | Accuracy\(^{*}\) | 94.58% | Plant Village [143]
Noise and Uncertainty Estimate | [99] | Accuracy\(^{*}\) | - | 15,892 images of tomatoes from Plant Village [143], extra 8,911 images of corn, 6,635 images of soybeans
Table 6. Performance of Various Pest and Disease Detection Techniques
Datasets without reference are unpublished datasets. A “-” in the Performance column indicates papers whose metric results cannot be summarized in a single value. \(^{*}\)Studies perform direct classification on images and thus do not need an IoU threshold. Reference [116] uses patch-level segmentation, which does not need an IoU threshold either.

2 Computer Vision Capabilities Relevant to Smart Agriculture

2.1 Image Recognition

The classic problem of image recognition is to classify an image containing a single object to the corresponding object class. The success of deep convolutional networks in this area dates (at least) back to LeNet [173] of 1998, which recognizes hand-written digits. The fundamental building block of such networks is the convolution operation. Using the principles of local connections and weight sharing, convolutional networks benefit from an inductive bias of translational invariance. That is, a convolutional network applies (approximately) the same operation to all pixel locations of the image.
The victory of AlexNet [163] in the 2012 ImageNet Large Scale Visual Recognition Challenge [247] is often considered a landmark event that introduced deep neural networks into the AI mainstream. Subsequently, many variants of convolutional networks [148, 170, 271, 289] have been proposed. Due to space limits, we provide here a brief review of a few influential works, which is by no means exhaustive. ResNet [134] introduces residual connections that allow the training of networks of more than 100 layers. ResNeXT [336] and MobileNet [138] employ grouped convolution, which reduces interaction between channels and improves the efficiency of the network parameters. ShuffleNet [365] utilizes the shuffling of channels, which complements grouped convolution. EfficientNet [292] shows that simultaneous scaling of the network width, depth, and image resolution is key to efficient use of parameters.
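To make the classification workflow concrete, below is a minimal sketch (not taken from any surveyed paper) of fine-tuning an ImageNet-pretrained ResNet-18 with PyTorch/torchvision for a hypothetical plant or cultivar classification task. The directory layout, class count, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: fine-tuning an ImageNet-pretrained ResNet-18 for a
# hypothetical plant/cultivar classification task. Paths, class count, and
# hyperparameters are placeholders, not from the surveyed papers.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 5  # e.g., growth stages or cultivars (assumption)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Expects an ImageFolder-style directory: data/train/<class_name>/*.jpg
train_set = datasets.ImageFolder("data/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new classification head

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The same recipe applies to the other backbones mentioned above (e.g., MobileNet or EfficientNet) by swapping the model constructor and its final layer.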
Recently, the transformer model has proven to be a highly competitive architecture for image recognition and other computer vision tasks [90]. These models cut the input image into a sequence of small image patches and often apply strong regularization such as RandAugment [75]. Variants such as CaiT [302], CeiT [350], Swin Transformer [195], and others [72, 78, 337, 371] achieve outstanding performance on ImageNet.
Despite the maturity of the technology for image classification, the assumption that an image contains only one object may not be easily satisfied in real-world scenarios. Thus, it is often necessary to adopt a problem formulation as object detection or semantic/instance segmentation.

2.2 Object Detection

The object detection task is to identify and locate all objects in the image. It can be understood as the task that results from relaxing the assumption that the input image contains a single object. This is a natural problem formulation for real-world images and has seen wide adoption in agricultural applications.
In broad strokes, contemporary object detection methods can be categorized into anchor-box-based and point-based/proposal-free approaches. In anchor-box methods [110, 239], the process starts with a number of predefined anchor boxes that are periodically tiled to cover the entire input image. For each anchor box, the network makes two types of predictions. First, it determines if the anchor box contains one of the predefined object classes. Second, if the box contains an object, then the network attempts to move and reshape the box to bring it closer to the ground-truth location of the object. One-stage anchor-box detectors [77, 101, 187, 193, 236, 367] make these predictions all at once. In comparison, two-stage detectors [110, 132, 186, 239] discard anchor boxes that do not contain any object in the first stage and classify the remaining boxes into finer object categories in the second stage. The location adjustment, known as bounding box regression, can happen in both stages. It is also possible to employ more than two stages [48]. When the objects have diverse shapes and scales, these methods must create a large number of proposal boxes and evaluate them all, which can lead to high computational cost.
While point-based object detectors [91, 158, 171, 300, 372] still need to identify rectangular boxes around the objects, they make predictions at the level of grid locations on the feature maps. The networks predict if a grid location is a corner or the center of an object bounding box. After that, the algorithm assembles the corners and centers into bounding boxes. The point-based approaches can reduce the total number of decisions to be made. A careful comparison and analysis of anchor-box methods and point-based methods can be found in Reference [360].
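For illustration, the following is a minimal sketch (assuming PyTorch and torchvision are available) of running an off-the-shelf anchor-box detector, Faster R-CNN, on a single image. The image path and score threshold are placeholders; the agricultural systems surveyed in Section 3 fine-tune such detectors on fruit or flower data rather than relying on COCO-pretrained weights directly.

```python
# Minimal sketch: inference with torchvision's pretrained Faster R-CNN.
# The image path and score threshold are illustrative only.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = transforms.ToTensor()(Image.open("greenhouse_row.jpg").convert("RGB"))

with torch.no_grad():
    prediction = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

keep = prediction["scores"] > 0.5  # confidence threshold (assumption)
for box, label, score in zip(prediction["boxes"][keep],
                             prediction["labels"][keep],
                             prediction["scores"][keep]):
    print(label.item(), round(score.item(), 3), box.tolist())
```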

2.3 Semantic, Instance, and Panoptic Segmentation

Segmentation is a pixel-level classification task, aiming to classify every pixel in the image into a type of object or an object instance. The variations of the task differ by their definitions of the classes. In semantic segmentation [73, 94, 113, 167, 196], each type of object, such as cat, cow, grass, or sky, is its own class, but different instances of the same object type (e.g., two cats) share the same class. In instance segmentation [76, 129, 131, 226], different instances of the same object type become unique classes, so two cats are no longer the same class. However, object types such as sky or grass, which are not easily divided into instances, are ignored. In the recently proposed panoptic segmentation [69, 109, 155, 178, 189, 363], objects are first separated into things and stuff. Things are countable and each instance of things is its own class, whereas stuff is uncountable, impossible to separate into instances, appearing as texture or amorphous regions [12], and remains as one class. We note that the distinction between things and stuff is not rigid and can change, depending on the application. For example, grass is typically considered as stuff, but in the leaf instance segmentation task, each leaf of a plant becomes an instance and is a separate class.
The primary requirement of pixel-level classification is to learn pixel-level representations that consider sufficient context within a reasonable computational budget. A typical solution is to introduce a series of downsampling operations followed by a series of upsampling operations. Since classic works such as the Fully Convolutional Network (FCN) [196] and U-Net [246], this has been the mainstream strategy for various segmentation tasks.
Due to its use in leaf segmentation, a problem in plant phenotyping, instance segmentation may be the most relevant segmentation formulation for urban farming. Despite the apparent similarity to semantic segmentation, instance segmentation poses challenges due to the variable number of instance classes and possible permutation of class indices [80]. This could be handled by combining proposal-based object detection and segmentation [61, 68, 129, 180, 227]. Mask-RCNN [132] exemplifies this approach. Leveraging its object detection capability, the network associates each object with a bounding box. After that, the network predicts a binary mask for the object within the bounding box. However, such methods may not perform well when there is substantial occlusion among objects or when objects are of irregular shapes [80].
Departing from the detect-then-segment paradigm, recurrent methods [238, 245, 251] that output one segmentation mask at a time may be considered as implicitly modeling occlusion. Pixel embedding methods [62, 80, 213, 221, 326, 331, 344] learn vector representations for every pixel and cluster the vectors. These methods are especially suitable for segmenting plant leaves, and we will discuss them in greater detail in Section 3.1. Taking a page from the proposal-free object detector YOLO [235], SOLO [316] and SOLOv2 [317] divide the image into grids. The grid cell that the center of an object falls into is responsible for predicting the segmentation mask of the object.
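As a concrete reference point for the detect-then-segment paradigm discussed above, here is a minimal sketch, using our own illustrative thresholds and image path rather than the setup of any cited work, that obtains instance masks from torchvision's pretrained Mask R-CNN.

```python
# Minimal sketch: instance masks from torchvision's pretrained Mask R-CNN
# (the detect-then-segment approach). Thresholds and the image path are
# placeholders; leaf-specific models in Section 3.1 use other designs.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = transforms.ToTensor()(Image.open("lettuce_tray.jpg").convert("RGB"))
with torch.no_grad():
    out = model([image])[0]

# 'masks' has shape (N, 1, H, W) with per-pixel probabilities.
binary_masks = (out["masks"][out["scores"] > 0.5, 0] > 0.5)
print("instances found:", binary_masks.shape[0])
```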

2.4 Uncertainty Quantification

Real-world applications often require quantification of the amount of uncertainty in the predictions made by machine learning, especially when the predictions carry serious implications. For example, if the system incorrectly determines that fruits are not mature enough, then it may delay harvesting and cause overripe fruits with diminished value. Thus, users of the ML system are justified in asking how certain we are about the decision. In addition, it is desirable for the network to answer “I don’t know” when facing a real-world input that it does not recognize [183]. Well-calibrated uncertainty measurements may enable such a capability.
However, research shows that deep neural networks exhibit severe vulnerability to overconfidence, or under-estimation of the uncertainty in their own decisions [117, 203]. That is, the accuracy of the network decision is frequently lower than the probability that the network assigns to the decision. As a result, proper calibration of the networks should be a concern for systems built for real-world applications.
Calibration of deep neural networks may be performed post hoc (after training) using temperature scaling and histogram binning [87, 117, 312]. Also, regularization techniques applied during training, such as label smoothing [289] and mixup [137], have been shown to improve calibration [211, 224, 297]. Researchers have also proposed new loss functions to replace existing ones that are susceptible to overconfidence [210, 343]. Moreover, ensemble methods such as Vertical Voting [335], Batch Ensemble [323], and Multi-input Multi-output [130] can derive uncertainty estimates.
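To illustrate the post hoc route, the sketch below fits a single temperature parameter on held-out validation logits, in the spirit of temperature scaling [117]; the variable names and optimizer settings are our own illustrative choices.

```python
# Minimal sketch of post hoc temperature scaling: a single scalar T is fit on
# held-out validation logits so that softmax(logits / T) is better calibrated.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """val_logits: (N, C) raw network outputs; val_labels: (N,) class ids."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so that T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# At inference time, calibrated confidences are softmax(logits / T).
```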

2.5 Interpretability

Modern AI systems are known for their inability to provide faithful and human-understandable explanations for their own decisions. The unique characteristics of deep learning, such as network over-parameterization, large amounts of training data, and stochastic optimization, while being beneficial to predictive accuracy (e.g., References [27, 179, 274, 281]), all create obstacles toward understanding how and why a neural network reaches its decisions. The lack of human-understandable explanations leads to difficulties in the verification and trust of network decisions [52, 366].
We categorize model interpretation techniques into a few major classes, including visualization, feature attribution, instance attribution, inherently explainable models, and approximation by simple models. Visualization techniques present holistically what the model has learned from the training data by visualizing the model weights for direct visual inspection [34, 93, 97, 202, 209, 290]. In comparison, feature attribution and instance attribution are often considered local explanations, as they aim to explain model predictions on individual samples. Feature attribution methods [22, 58, 60, 207, 230, 255, 265, 273, 287, 340] generate a saliency map of an image or video frame, which highlights the pixels that contribute the most to its prediction. Instance attribution methods [32, 46, 67, 156, 229, 265, 341] attribute a network decision to training instances that, through the training process, exert positive or negative influence on the particular decision. Moreover, inherently explainable models [33, 59, 175, 259, 345] incorporate explainable components into the network architecture, which reduces the need to apply post hoc interpretation techniques. In contrast, researchers also try to approximate complex neural networks post hoc with simple models such as rule-based models [84, 102, 115, 153, 222, 318] or linear models [14, 105, 106, 160, 241] that are easily understandable.
The most significant benefit of interpretation in the context of CEA lies in its ability to aid with the auditing and debugging of AI systems and datasets. With feature attribution, users can make sure the system captures the robust features, or semantically meaningful features, that generalize to real-world data. As in the well-known case of husky vs. wolf image classification, due to a spurious correlation, the neural network learns to classify all images with white backgrounds as wolf and those with green backgrounds as husky [206]. Such shortcut learning can be identified by feature attribution and subsequently corrected. Moreover, instance attribution allows researchers to pinpoint outliers or incorrectly labeled training data that may lead to misclassification [67].
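As a simple instance of feature attribution, the sketch below computes a vanilla gradient saliency map, chosen here for brevity rather than because the cited works use this particular method; `model` can be any image classifier, such as the one in the Section 2.1 sketch.

```python
# Minimal sketch of gradient-based feature attribution: the saliency map is
# the per-pixel magnitude of the gradient of the predicted class score with
# respect to the input image.
import torch

def saliency_map(model, image: torch.Tensor) -> torch.Tensor:
    """image: (3, H, W), normalized as the model expects; returns an (H, W) heat map."""
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)
    scores = model(x)
    top_class = scores[0].argmax()
    scores[0, top_class].backward()
    # Aggregate gradient magnitude over color channels.
    return x.grad.abs().max(dim=1).values.squeeze(0)
```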

3 Controlled-environment Agriculture

CEA is the farming practice carried out within urban, indoor, resource-controlled factories, often accompanied by stacked growth levels (i.e., vertical farming), renewable energy, and recycling of water and waste. CEA has recently been adopted in nations around the world [38, 82], such as Singapore [161], North America [6], Japan [9, 264], and the UK [8].
CEA has economic and environmental benefits. Compared to traditional farming, CEA farms produce higher yield per unit area of land [8, 9]. Controlled environments shield the plants from seasonality and extreme weather, so plants can grow all year round given suitable lighting, temperature, and irrigation [38]. The growing conditions can also be further optimized to boost growth and nutritional content [159, 304]. Rapid turnover increases farmers’ flexibility in plant choice to catch consumption trends [35]. Moreover, farm expenditures on pesticides, herbicides, and transportation can be reduced due to limited contamination from the outside environment and proximity to urban consumers.
CEA farms, when designed properly, can become much more environmentally friendly and self-sustainable than traditional farming. With optimized growing conditions and limited external interference, the need for fertilizer and pesticides decreases, so we can reduce the amount of chemicals that go into the environment as well as the resulting pollution. Furthermore, CEA farms can save water and energy through the use of renewable energy and aggressive water recycling. For instance, CEA farms from Spread, a Japanese company, recycle 98% of used water and reduce the energy cost per head of lettuce by 30% with LED lighting [9]. Finally, CEA farms can be situated in urban or suburban areas, thereby reducing transportation and storage costs. A simulation of different farm designs in Lisbon shows that vertical tomato farms with appropriate designs emit less greenhouse gas than conventional farms, mainly due to reduced water use and transportation distance [37].
A significant drawback of CEA, however, lies in its high cost, which may be partially addressed by computer vision technologies. According to Reference [38], the higher land cost in Victoria, Australia, means that the yield of vertical farms has to be at least 50 times that of traditional farming to break even. Computer vision holds the promise of boosting the level of automation and increasing yield, thereby making CEA farms economically viable. As will be discussed in the following sections, CV techniques can reduce a substantial portion of variable costs, such as wastage costs induced by incorrect or delayed harvesting decisions, and provide long-term benefits.
While carrying the potential to reduce a significant amount of cost, computer vision systems themselves are inexpensive to set up compared with the expense of constructing a CEA building. Building a CEA structure involves high upfront costs, including construction, insulation, lighting, and HVAC systems. According to Reference [264], a 1,300-square-meter CEA building with a production area of 4,536 square meters would require a capital investment of $7.4 million and incur annual operational costs of approximately $3.4 million.
In contrast, setting up hardware systems for CV models is relatively inexpensive. The necessary components include servers (CPU, GPU, memory, storage), sensors, cameras, networking, and a cooling system. For example, a server with specifications such as a 32-core 2.80 GHz Intel Xeon Platinum 8462Y+, 128 GB of memory, 4 NVIDIA RTX A6000 “Ada” GPUs, and 2 TB of storage costs around $60,000. Using this server for training purposes, assuming a standard VGG-16 architecture, training on 5,000 images of size 224 \(\times\) 224 pixels, with a batch size of 64 and 50 training epochs, and utilizing 4 NVIDIA A6000 GPUs, the estimated training time is less than an hour. Such a server is sufficient for daily training and inference of commonly used CV models. For a camera system, if we consider 10 surveillance cameras such as the Hikvision DS-2CD2142FWD-I, then the total cost would be around $1,400. Additionally, a high-speed network infrastructure is required to transfer data between the computer hardware, storage, and camera systems. Typically, 4 to 7 routers are needed to cover an area of 1,300 square meters, costing approximately $2,000. Finally, a liquid cooling system could cost between $1,000 and $2,000. In summary, a hardware system with a total cost of around $70,000 is sufficient for the daily operation, training, and inference of CV systems.
CEA can take diverse form factors [35], and these form factors may pose different requirements for computer vision technologies. Typical forms of CEA are glasshouses with transparent shells or completely enclosed facilities. Depending on the cultivars being planted, the internal arrangement of the farm can be classified into stacked horizontal systems, vertical growth surfaces, and multi-floor towers. Form factors influence lighting, which is an important consideration in CV applications. For example, glasshouses with transparent shells utilize natural light to reduce energy consumption but may not provide sufficient lighting for CV around the clock. In comparison, a completely enclosed facility can have greater control of lighting conditions. Moreover, the internal arrangement of the farm also affects camera angles. If the cultivars being planted change frequently as a result of the high turnover rate in CEAs, then the arrangement of shelves and plants might change. This would affect the camera angles and thus the resulting inference performance. CV systems need to adapt to such environmental changes.
Nevertheless, with the autonomous setup of CEAs, which allows easy collection of new data, training a new CV model or fine-tuning a previous model to adapt to the above-mentioned changeable environment is relatively straightforward. Besides, there are also few-shot learning [286, 319], weakly-supervised learning [16, 218, 373], and unsupervised learning techniques [49, 253], which require minimal or zero annotations and can facilitate the adjustment of the models.
Besides environmental change, other factors also need to be taken into account when applying CV techniques in CEA. Two typical problems to consider are (1) how to cope with sub-optimal data that contain label noise and unbalanced class distributions, and (2) how to interpret the predictions of models or measure their uncertainty so users can apply the models with confidence. A quantitative measure of confidence or uncertainty would allow farmers to understand the decision generation process and make decisions with more confidence. Table 1 maps the above factors (environmental change, sub-optimal data quality, and human factors) to CV problems and lists corresponding solutions and the respective sections that discuss them.
In the following, we investigate the application of autonomous computer vision techniques on Growth Monitoring, Fruit and Flower Detection, Fruit Counting, Maturity Level Classification, and Pest and Disease Detection to increase production efficiency. In addition to existing applications, we include techniques that can be easily applied to vertical farms even though they have not yet been applied to them.

3.1 Growth Monitoring

Growth monitoring, a critical component of plant phenotyping, aims to understand the life cycle of plants and estimate yield by monitoring various growth indicators such as the plant size, number of leaves, leaf sizes, land area covered by the plant, and so on. Plant growth monitoring helps quantify the effects of biological/environmental factors on growth and thus is crucial for finding the optimal growing condition and developing high-yield crops [212, 294].
As early as 1903, Wilhelm Pfeffer recognized the potential of image analysis in monitoring plant growth [225, 279]. Traditional machine vision techniques such as gray-level pixel thresholding [220], Bayesian statistics [45], and shallow learning techniques [147, 348] have been applied to segment the objects of interest, such as leaves and stems, from the background to analyze plant growth. Compared to traditional methods, deep-learning techniques provide automatic representation learning and are less sensitive to image quality variations. For this reason, deep learning techniques for growth monitoring have recently gained popularity.
Among various growth indicators, leaf size and number of leaves per plant are the most commonly used [121, 169, 252]. Therefore, in the section below, we first discuss leaf instance segmentation, which can support both indicators at the same time, followed by a discussion of techniques for only leaf counting or for other growth indicators.

3.1.1 Leaf Instance Segmentation.

Due to the popularity of the CVPPP dataset [204], the segmentation of leaf instances has attracted special attention from the computer vision community and warrants its own section. Leaf instance segmentation methods include recurrent network methods [238, 245] and pixel embedding methods [62, 80, 221, 324, 331]. Parallel proposal methods are popular for general-purpose segmentation (see Section 2.3) but are ill-suited for leaf segmentation. As most leaves have irregular shapes, the rectangular proposal boxes used in these methods do not fit the leaves well, resulting in many poorly positioned boxes. In addition, the density of leaves causes many proposal boxes to overlap and compounds the fitting problem. As a result, it is difficult to pick out the best proposal box from the large number of parallel proposals. Therefore, we focus on recurrent network-based methods and pixel embedding-based methods in this section. Quality metrics for leaf segmentation include Symmetric Best Dice (SBD) and Absolute Difference in Count (|DiC|). SBD calculates the average overlap between the predicted masks and the ground truth for all leaves. |DiC| calculates the average absolute difference between the predicted and actual leaf counts over the entire test set.
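For reference, below is a minimal sketch of how these two metrics can be computed, assuming each image's prediction and ground truth are given as integer label maps (0 for background, positive integers for leaf instances); this is our own illustrative implementation, not the official CVPPP evaluation code.

```python
# Minimal sketch of SBD and |DiC| for leaf instance segmentation.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-9)

def best_dice(src: np.ndarray, tgt: np.ndarray) -> float:
    """Average, over instances in src, of the best Dice against any instance in tgt."""
    src_ids = [i for i in np.unique(src) if i != 0]
    tgt_ids = [i for i in np.unique(tgt) if i != 0]
    if not src_ids:
        return 0.0
    scores = [max((dice(src == i, tgt == j) for j in tgt_ids), default=0.0)
              for i in src_ids]
    return float(np.mean(scores))

def sbd(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Best Dice: the lower of the two directional Best Dice scores."""
    return min(best_dice(pred, gt), best_dice(gt, pred))

def abs_dic(pred: np.ndarray, gt: np.ndarray) -> int:
    """Absolute difference in leaf count for one image; |DiC| averages this over the test set."""
    count = lambda m: len([i for i in np.unique(m) if i != 0])
    return abs(count(pred) - count(gt))
```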
Recurrent network-based methods output a mask for a single leaf sequentially. Their decisions are usually informed by the already segmented parts of the image, which are summarized by the recurrent network. Reference [238] applies LSTM and DeconvNet to segment one leaf at a time. The network first locates a bounding box for the next leaf and performs segmentation within that box. After that, leaves segmented in all previous iterations are aggregated by the recurrent network and passed to the next iteration as contextual information. Reference [245] employs convolution-based LSTMs (ConvLSTM) with FCN feature maps as input. At each time step, the network outputs a single-leaf mask and a confidence score. During inference, the segmentation stops when the confidence score drops below 0.5. Reference [251] proposes another similar method that combines feature maps with different abstraction levels for prediction.
Pixel embedding methods learn vector representations for the pixels so pixels in irregularly shaped leaves can become regularly shaped clusters in the representation space. With that, we can directly cluster the pixels. Reference [324] performs simultaneous instance segmentation of leaves and plants. The authors propose an encoder-decoder framework, based on ERFNet [244], with two decoders. One decoder predicts the centroids of plants and leaves. The other decoder predicts the offset of each leaf pixel to the leaf centroid. The pixel location plus the offset vector hence should be very close to the leaf centroid. The dispersion among all pixels of the same leaf can be modeled as a Gaussian distribution, whose covariance matrix is also predicted by the second decoder and whose mean is from the first decoder. The training maximizes the Gaussian likelihood for all pixels of the same leaf. The same process is applied to pixels of the same plant.
References [62, 221, 331] are three similar pixel embedding methods. They encourage pixels from the same leaf to have similar embeddings and pixels from different neighboring leaves to have dissimilar embeddings, which enables clustering in the embedding space. Their networks consist of two modules, a distance regression module and a pixel embedding module. References [221, 331] arrange the two modules in sequence, while Reference [62] places them in parallel. The distance regression module predicts the distance between each pixel and the closest object boundary. The pixel embedding module generates an embedding vector for each pixel with the property described above. During inference, pixels are clustered around leaf centers, which are identified as local maxima in the distance map from the distance regression module.
Last, References [80, 327] take a large-margin approach. They ensure that embeddings of pixels from the same leaf are within a circular margin of the leaf center, and that the embeddings of leaf centers are far away from each other. This removes the need to determine the leaf centroids during inference because the embeddings are already well separated. Reference [327] builds upon the method in Reference [80] to perform pixel embedding and clustering of leaves under weak supervision, with annotation on only a subset of instances in the images. In addition, a differentiable instance-level loss for a single leaf is formed to overcome the non-differentiability of assigning pixels to instances by comparing a Gaussian-shaped soft mask with the corresponding ground-truth mask. Finally, consistency regularization, which encourages agreement between two embedding frameworks, is applied to improve embeddings for unlabeled pixels.
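To give a flavor of how such objectives look in code, here is a minimal sketch of a large-margin pull/push embedding loss in the spirit of the methods above; the margins, weighting, and tensor shapes are illustrative assumptions and do not reproduce the exact losses of References [80, 327].

```python
# Minimal sketch of a large-margin pixel-embedding loss: pixels of the same
# leaf are pulled within a margin of their leaf mean, and leaf means are
# pushed apart. Margins and weights are illustrative defaults.
import torch

def embedding_loss(embeddings, labels, pull_margin=0.5, push_margin=1.5):
    """embeddings: (D, H, W) pixel embeddings; labels: (H, W) leaf ids, 0 = background."""
    d = embeddings.shape[0]
    emb = embeddings.reshape(d, -1).t()        # (H*W, D)
    lab = labels.reshape(-1)
    leaf_ids = [i for i in lab.unique().tolist() if i != 0]

    means, pull = [], 0.0
    for i in leaf_ids:
        e = emb[lab == i]                      # embeddings of one leaf
        mu = e.mean(dim=0)
        means.append(mu)
        # Pull term: penalize pixels farther than pull_margin from their leaf mean.
        pull = pull + torch.clamp((e - mu).norm(dim=1) - pull_margin, min=0).pow(2).mean()

    push = torch.zeros(())
    for a in range(len(means)):
        for b in range(a + 1, len(means)):
            # Push term: penalize leaf means closer than push_margin to each other.
            dist = (means[a] - means[b]).norm()
            push = push + torch.clamp(push_margin - dist, min=0).pow(2)

    num_pairs = len(means) * (len(means) - 1) / 2
    return pull / max(len(leaf_ids), 1) + push / max(num_pairs, 1)
```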
Comparing different approaches, proposal-free pixel embedding techniques seem to be the best choice for the leaf segmentation problem. As can be seen from Table 2, pixel embedding methods obtain both the highest SBD and the lowest |DiC|. One thing to note, however, is that the superior results of W-Net [331] and SPOCO [327] could be attributed to the inclusion of ground-truth foreground masks during inference. Even though the recurrent approach does not generate a large number of proposal boxes at once, it still uses rectangular proposals, which means that it still suffers from the fitting problem with irregular leaf shapes. Moreover, the recurrent methods are usually slower than pixel embedding methods, due to the temporal dependence between the leaves.

3.1.2 Leaf Count and Other Growth Metrics.

Leaf counts may be estimated without leaf segmentation. Reference [305] utilizes synthetic data in the leaf counting task. The authors employ the L-system-based plant simulator lpfg [3, 228] to generate Arabidopsis rosette images. They test a CNN trained with only synthetic data on real data from CVPPP and obtain a better result than a model trained with CVPPP data only. In addition, a CNN trained with the combination of synthetic and real data obtains an approximately 27% reduction in the mean absolute count error compared to a CNN using only real data. These results demonstrate the potential of synthetic data in plant phenotyping.
Besides leaf size and leaf count, leaf fresh weight, leaf dry weight, and plant coverage (the area of land covered by the plant) are also used as metrics of growth. Reference [359] applies a CNN to regress leaf fresh weight, leaf dry weight, and leaf area of lettuce on RGB images. Reference [240] makes use of Mask R-CNN, a parallel proposal method, for lettuce instance segmentation. The authors derive plant attributes such as contour, side-view area, height, and width from the segmentation masks and bounding boxes, using preset formulas. They also estimate growth rate from the changes in the area of the plant at each time step, and they estimate fresh weight by linear regression from the attributes. Reference [198] leverages a Mask R-CNN pretrained on the COCO dataset with a ResNet-50 backbone to segment lettuce leaves. The daily change of mean leaf area is used for growth rate calculation.
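The sketch below illustrates, under our own simplified assumptions (a calibrated overhead camera with a known pixel-to-area scale and a handful of made-up calibration measurements), how projected area, relative growth rate, and a linear fresh-weight regression can be derived from a binary plant mask, in the spirit of the mask-based pipelines above.

```python
# Minimal sketch of deriving growth indicators from a binary plant mask.
# The pixel-to-area scale and the calibration data are placeholders.
import numpy as np

PIXEL_AREA_CM2 = 0.0025  # assumed camera calibration: area of one pixel in cm^2

def projected_area_cm2(mask: np.ndarray) -> float:
    """mask: boolean plant mask from an overhead camera."""
    return float(mask.sum()) * PIXEL_AREA_CM2

def relative_growth_rate(area_today: float, area_yesterday: float) -> float:
    """Daily relative change in projected plant area."""
    return (area_today - area_yesterday) / area_yesterday

# Linear regression of fresh weight (g) against projected area (cm^2),
# fitted on a few hypothetical calibration measurements.
areas = np.array([50.0, 120.0, 210.0, 330.0, 450.0])
weights = np.array([8.0, 21.0, 40.0, 66.0, 92.0])
slope, intercept = np.polyfit(areas, weights, deg=1)

example_mask = np.ones((300, 300), bool)  # stand-in for a real segmentation mask
predicted_weight = slope * projected_area_cm2(example_mask) + intercept
```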

3.2 Fruit and Flower Detection

Algorithms for fruit and flower detection find the location and spatial distribution of fruits and fruit flowers. This task supports various downstream applications such as fruit count estimation, size estimation, weight estimation, robotic pruning, robotic harvesting, and disease detection [31, 108, 199, 342]. In addition, fruit or flower detection may help devise plantation management strategies [108, 127], because fruit or flower statistics such as positions, facing directions (the directions the flowers face), and spatial scatter can reveal the status of the plant and the suitability of environmental conditions. For example, the knowledge of flower distribution may allow pruning strategies that focus on regions of excessive density and achieve even distribution of fruits, which optimizes the delivery of nutrients to the fruits.
Traditional approaches for fruit detection rely on manual feature engineering and feature fusion. As fruits tend to have unique colors and shapes, one natural thought is to apply thresholding on color [219, 322] and shape information [192, 217]. Additionally, References [55, 184, 208] employ a combination of color, shape, and texture features. However, manual feature extraction suffers from brittleness when the image distribution changes with different camera resolutions, camera angles, illumination, and species [30].
Deep learning methods for fruit detection include object detection and segmentation. Reference [351] applies SSD for cherry tomato detection. Reference [139] leverages Faster R-CNN to detect tomatoes. Inside the generated bounding boxes, color thresholding and fuzzy-rule-based morphological processing methods are applied to remove the image background and obtain the contours of individual tomatoes. Reference [249] leverages Faster R-CNN with VGG-16 as the backbone for sweet pepper detection. RGB and near-infrared (NIR) images are used together for detection, and two fusion approaches, early and late fusion, are proposed. Early fusion alters the first pretrained layer to allow four input channels (RGB and NIR), whereas late fusion aggregates the two modalities by training independent proposal models for each modality and then combining the proposed boxes by averaging the predicted class probabilities. Reference [356] trains three multi-task cascaded convolutional networks (MTCNN) [355] for detecting apples, strawberries, and oranges. MTCNN contains a proposal network, a bounding box refinement network, and an output network in a feature pyramid architecture with gradually increased input sizes for each network. The model is trained on synthetic images, which are random combinations of cropped negative patches and fruit patches, in addition to real-world images. Reference [346] proposes R-YOLO with MobileNet-V1 as the backbone to detect ripe strawberries. Different from the regular horizontal bounding boxes in object detection, the model generates rotated bounding boxes by adding a rotation-angle parameter to the anchors.
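The late-fusion idea can be sketched in a few lines: boxes proposed by two modality-specific detectors are matched by IoU and their class probabilities averaged. The data structures and thresholds below are our own illustrative choices and do not reproduce the exact procedure of Reference [249].

```python
# Minimal sketch of late fusion for RGB + NIR detection: matched boxes have
# their class probabilities averaged; unmatched detections are kept as-is.
import numpy as np

def iou(a, b):
    """a, b: [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def late_fuse(rgb_dets, nir_dets, iou_thresh=0.5):
    """Each detection: (box, class_probs as np.ndarray). Returns fused detections."""
    fused, used = [], set()
    for box_r, probs_r in rgb_dets:
        best_j, best_iou = -1, 0.0
        for j, (box_n, _) in enumerate(nir_dets):
            o = iou(box_r, box_n)
            if o > best_iou:
                best_j, best_iou = j, o
        if best_iou >= iou_thresh:
            used.add(best_j)
            fused.append((box_r, (probs_r + nir_dets[best_j][1]) / 2.0))
        else:
            fused.append((box_r, probs_r))
    # Keep NIR-only detections that were never matched to an RGB box.
    fused += [d for j, d in enumerate(nir_dets) if j not in used]
    return fused
```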
Delicate fruits, such as strawberries and tomatoes, are particularly vulnerable to damage during harvesting. Therefore, much research has been devoted to segmenting such fruits from backgrounds to determine the precise picking point. Precise fruit masks are expected to enable robotic fruit picking while avoiding damage to the neighboring fruits. Reference [185] performs semantic segmentation of guava fruits and determines their poses using an FCN with RGB-D images as input. The FCN outputs a binary mask for fruits and another binary mask for branches. With the fruit binary mask, the authors employ Euclidean clustering [248] to isolate individual guava fruits. From the clustering result and the branch binary mask, fruit centroids and the closest branch are located. Finally, the system predicts the vertical axis of the fruit as the direction perpendicular to the closest branch to facilitate robotic harvesting. Similarly, Reference [13] leverages Mask R-CNN with ResNet as the backbone for semantic segmentation of tomatoes. In addition, the authors filter out false positive detections of tomatoes from non-targeted rows by setting a depth threshold. Reference [107] utilizes Mask R-CNN with a ResNet101 backbone to perform instance segmentation of ripe strawberries, raw strawberries, straps, and tables. Depth images are aligned with the segmentation mask to project the shape of strawberries into 3D space to facilitate automatic harvesting. Reference [347] also applies Mask R-CNN with a ResNet101 + FPN backbone to perform instance segmentation and ripeness classification on strawberries. Reference [141] leverages a similar network for instance segmentation of tomatoes. With the segmentation masks, the systems determine the cut points of the fruits.
Besides accuracy, the processing speed of neural networks is also important for their deployment on mobile devices or agricultural robots. Reference [262] performs network pruning on YOLOv3-tiny to form a lightweight mango detection network. A YOLOv3-tiny pretrained on the COCO dataset has learned to extract fruit-relevant features, because the COCO dataset contains apple and orange images, but it has also learned irrelevant features. The authors thus use a generalized attribution method [266] to determine the contribution of each layer to fruit feature extraction and remove convolution kernels responsible for detecting non-fruit classes. They find that the lower-level features are shared across the detection of all classes and that pruning in the higher layers does not harm fruit detection performance. After pruning, the network requires significantly fewer floating-point operations (FLOPs) at the same level of accuracy.
Object detection is also applied to flower detection. Reference [199] proposes a modified YOLOv4-Tiny with cascade fusion (CFNet) to detect citrus buds, citrus flowers, and gray mold, a disease commonly found on citrus plants. The authors additionally propose a block module with channel shuffle and depthwise separable convolution for YOLOv4-Tiny. Reference [284] shrinks the anchor boxes of Faster R-CNN to fit small fruits and applies soft non-maximum suppression to retain boxes that may contain occluded objects. As flowers usually have similar morphological characteristics, flowers from other non-targeted species can possibly be used as training data in a transfer learning scenario. In Reference [285], the authors fine-tune a DeepLab-ResNet model [63] for fruit flower detection. The model is trained on an apple flower dataset but achieves high F1 scores on pear and peach flower images (0.777 and 0.854, respectively).

3.3 Fruit Counting

Pre-harvest estimation of yield plays an important role in the planning of harvesting resources and marketing strategies [135, 338]. As fruits are usually sold to consumers as packs of uniformly sized fruits or as individual fruits, the fruit count, besides the distribution of fruit sizes, provides an effective yield metric [157]. Traditional yield estimation is obtained through manual counting of samples from a few randomly selected areas [135]. Nonetheless, for large-scale production, accurate estimation requires a large quantity of samples from different areas of the field to counteract the effect of plant variability, resulting in high cost. Thus, researchers resort to CV-based counting methods.
A direct counting method is to regress on the image and output the fruit count. In Reference [234], the authors apply a modified version of Inception-ResNet for direct tomato counting. The authors train the model on simulated images and test on real images, which suggests, once again, the viability of using simulated images to circumvent the cost of creating a large dataset.
Besides direct regression, object detection [157, 320], semantic segmentation [154], and instance segmentation [215] have also been used for fruit counting. These methods provide intermediate results from which the count can be easily gathered. Reference [157] proposes MangoYOLO, based on YOLOv2-tiny and YOLOv3, for mango detection and counting. The authors increase the resolution of the feature map to facilitate detection of small fruits. Reference [124] proposes a pre-trained Faster R-CNN network, building upon DeepFruits [249], to estimate the quantity of sweet peppers. The authors design a tracking sub-system for sweet pepper counting, which identifies new fruits by measuring the IoU between detected and new fruits and comparing their boundaries. Reference [154] performs semantic segmentation for mango counting using a modification of FCN. The coordinates of blob-like regions in the semantic segmentation mask are used to generate bounding boxes corresponding to mango fruits. Finally, Reference [215] applies Mask R-CNN for instance segmentation of blueberries. The model also classifies the maturity of individual blueberries and counts the number of berries according to the masks.
Occlusion poses a difficult challenge for counting. Due to this issue, the automatic count from detection or segmentation results is almost always lower than the actual number of fruits. To address this, Reference [157] calculates and applies the ratio between the actual hand-harvest count and the automatic fruit count; it also uses both front and back views of mango trees to mitigate occlusion from one angle. Taking this idea one step further, Reference [320] uses dual-view videos to detect and track mangoes as the camera moves. Utilizing different views of the same tree in a video, Reference [320] recognizes around 20% more fruits. However, the detected count is still significantly lower than the actual number, underscoring the research challenge of exhaustive and accurate counting.
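To make the calibration-ratio correction concrete, the short sketch below scales machine counts by the hand-count-to-machine-count ratio measured on a small calibration set; all numbers are made up for illustration and are not taken from Reference [157].

```python
# Minimal sketch of the calibration-ratio correction for occlusion: the ratio
# between hand-harvest counts and machine counts on a few calibration trees is
# applied to the machine counts of the remaining trees. Numbers are illustrative.
calibration_hand_counts = [182, 140, 205]      # fruits counted by hand per tree
calibration_machine_counts = [151, 118, 170]   # fruits detected per tree

ratio = sum(calibration_hand_counts) / sum(calibration_machine_counts)

machine_count_new_tree = 163
estimated_actual_count = machine_count_new_tree * ratio
print(round(estimated_actual_count))
```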

3.4 Maturity-level Classification

Maturity-level classification aims to determine the ripeness of fruits or vegetables to aid in proper harvesting and food quality assurance. Premature harvesting results in plants that are unpalatable or incapable of ripening, while delayed harvesting can result in overripe plants or food decay [141].
The optimal maturity level differs for different target products and destinations. Fruits and vegetables can be consumed at different growing stages; for example, lettuce can be consumed either as baby lettuce or as fully grown lettuce, and the same holds for baby corn and normal corn. Products are also transported to different destinations, so the length of transportation and the ripening speed must be considered when deciding the correct maturity level at harvest [358].
Manually distinguishing the subtle differences in maturity levels is time-consuming, prone to inconsistency, and costly. The labor cost of harvesting accounts for a large percentage of farm operating costs, with 42% of variable production expenses in U.S. fruit and vegetable farms spent on harvesting labor [142]. Automatic maturity-level classification with computer vision, in contrast, can assist automatic harvesting [20, 107, 358] and reduce cost.
Similar to fruit detection, thresholding methods on color can be applied to detect ripeness. For example, Reference [25] applies color thresholding in the HSI and YIQ color spaces, Reference [296] applies linear color models, and Reference [176] utilizes a combination of color and texture features. References [96, 165, 256, 257, 329] apply shallow learning methods based on a multitude of features.
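A minimal OpenCV sketch of the color-thresholding idea is shown below. The HSV bounds for "ripe red" are illustrative assumptions that would need calibration per crop and lighting setup; note that Reference [25] works in the HSI and YIQ spaces rather than HSV.

```python
import cv2
import numpy as np

def ripe_fraction(bgr_image):
    """Fraction of pixels falling inside an (assumed) red hue range in HSV."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two ranges are combined.
    low_red = cv2.inRange(hsv, (0, 80, 60), (10, 255, 255))
    high_red = cv2.inRange(hsv, (170, 80, 60), (180, 255, 255))
    mask = cv2.bitwise_or(low_red, high_red)
    return mask.mean() / 255.0

image = np.zeros((120, 160, 3), dtype=np.uint8)   # placeholder for a real fruit image
print(ripe_fraction(image))
```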
More recently, researchers have evaluated deep learning-based computer vision methods for maturity-level classification and attained satisfactory results. For example, Reference [357] applies a CNN to classify tomato maturity into five levels. However, to further facilitate automatic harvesting, object detection and instance segmentation are more commonly used to obtain the exact shape, location, and maturity level of fruits, as well as the position of peduncles for robotic end-effectors to cut.
With object detection, Reference [346] applies the R-YOLO network described in the fruit detection section (Section 3.2) to detect ripe strawberries. Reference [124], as mentioned in the fruit counting section (Section 3.3), proposes a pre-trained Faster R-CNN network to estimate both the ripeness and quantity of sweet pepper. Two formulations of the model are tested. One treats ripe/unripe as additional classes on top of foreground/background, and the other performs foreground/background classification first and then performs ripeness classification on foreground regions. The second approach generates better ripeness classification results, as the ripe/unripe classes are more balanced when only the foreground regions are considered.
Using the segmentation methods discussed in Section 3.2, Reference [13] classifies semantic segmentation masks of tomatoes into raw and ripe tomatoes. References [107, 347] perform instance segmentation and classify instance masks into ripe and raw strawberries. Reference [141] performs instance segmentation on tomatoes first. After transforming the mask region into HSV color space, the authors employ a fuzzy system to classify tomatoes into four classes: immature (completely green), breaker (green to tannish), preharvest (light red), and harvest (fully colored).

3.5 Pest and Disease Detection

Plants are susceptible to environmental disorders caused by temperature, humidity, nutritional excess or deficiency, and light changes, as well as biotic disorders due to fungi, bacteria, viruses, or other pests [103, 272]. Infectious diseases and pest pandemics degrade plant quality or cause plant death, resulting in losses of at least 10% of global food production [282].
Although controlled vertical farming restricts the entry of pests and diseases, it cannot eliminate them. Pests and diseases can enter the farm through accidental contamination by employees, seeds, irrigation water, and nutrient solutions; poorly maintained environments or phytosanitation protocols; and unsealed entrances and ventilation systems [242]. For this reason, pest and disease detection is still worth studying in the context of CEA.
Manual diagnosis of plants is complex due to the large quantity of vertically arranged plants in the field and the numerous possible symptoms of diseases on different species. In addition, plants show different patterns along the infection cycle, and symptoms can vary across different parts of the plant [43]. Consequently, autonomous computer vision systems that recognize diseases according to the species and plant organs are gaining traction. From a technological perspective, we sort existing techniques into three parts: single- and multi-label classification, handling unbalanced class distributions, and handling label noise and uncertainty estimates.

3.5.1 Single- and Multi-label Classification.

Studies perform single-label, or one-label-per-image, classification of diseases of either one single species [24, 254, 272, 361] or multiple species [95]. Reference [361] creates a lightweight version of AlexNet, replacing the fully connected network with a global pooling layer, to classify six types of cucumber diseases. Reference [272] leverages CNNs for classifying leaves into mango leaves, diseased mango leaves, and other plant leaves. Reference [24] utilizes AlexNet and VGG16 to recognize five types of pests and diseases of tomatoes. Reference [95] applies AlexNet, AlexNetOWTBn [162], GoogLeNet, Overfeat [258], and VGG for classifying 25 different healthy or diseased plants.
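The single-label classifiers above are typically fine-tuned variants of standard CNN backbones. A hedged sketch of that common recipe follows; it is not a reproduction of any specific cited model, and the number of classes, learning rate, and batch size are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 6                      # e.g., six disease categories (assumed)

# Start from an ImageNet-pre-trained backbone and replace the classifier head.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of leaf images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```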
Having a single label per image can be inaccurate. In the real world, one plant or one leaf can carry multiple diseases or contain multiple diseased regions. By detecting multiple targeted areas or disease classes, the multi-label setting can lead to improved efficiency and accuracy.
To deal with the possibility of having multiple diseases or multiple diseased areas on one plant simultaneously, two types of methods are proposed. Reference [201] first segments out different infection areas on cucumber leaves using color thresholding following Reference [200], then applies a DCNN on segmented areas to classify four types of cucumber diseases. Nevertheless, the color thresholding technique may not generalize to other plant species and environments. The other type of method leverages object detection or segmentation for locating and classifying infection areas. Reference [254] locates multiple diseased regions of banana plants simultaneously using object detection but assigns only one disease label to each image. Reference [103] compares Faster R-CNN, R-FCN, and SSD for detecting nine classes of diseases and pests that affect tomato plants; multiple diseases and pests in one plant are detected simultaneously. Reference [349] applies an improved DeepLab v3+ for segmentation of multiple black rot spots on grape leaves. The efficient channel attention mechanism [315] is added to the backbone of DeepLab v3+ for capturing local cross-channel interaction. A feature pyramid network and Atrous Spatial Pyramid Pooling [64] are utilized for fusing feature maps from the backbone network at different scales to improve segmentation.

3.5.2 Handling Unbalanced Class Distributions.

A common obstacle encountered in disease detection is unbalanced disease class distributions. There are typically far fewer diseased plants than healthy plants; the unequal frequencies make it difficult to collect images of rare diseases, and the resulting data imbalance complicates model training. To remedy this problem, researchers have proposed weakly supervised learning [44], generative adversarial networks (GANs) [116], and few-shot learning [182, 216].
Specifically, Reference [44] applies multiple instance learning (MIL), a type of weakly supervised learning, for multi-class classification of six mite species of citrus. In MIL, the learner receives a set of labeled bags, each containing multiple image instances. We know that at least one instance in a bag is associated with the bag's class label but do not know which one. The MIL algorithm tries to identify the common characteristic shared by images in the positively labeled bags. In this work, a CNN is first trained with labeled bags. Next, by calculating saliency maps of images in bags, the model identifies salient patches that have a high probability of containing mites. These patches inherit labels from their bags and are used to refine the CNN trained above.
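A minimal sketch of the multiple-instance idea (bag-level supervision with instance-level scores) is given below. It uses simple max pooling over patch scores rather than the saliency-map refinement of Reference [44], and the small patch encoder is an assumption for illustration.

```python
import torch
import torch.nn as nn

class MILBagClassifier(nn.Module):
    """Scores every patch in a bag and pools the scores into a bag prediction."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.patch_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, bag):
        # bag: (num_patches, 3, H, W) -> per-patch logits -> max over patches.
        patch_logits = self.patch_encoder(bag)
        bag_logits, _ = patch_logits.max(dim=0)    # bag label explained by the
        return bag_logits                          # most confident patch

model = MILBagClassifier()
bag = torch.randn(12, 3, 64, 64)                   # 12 patches from one image
bag_label = torch.tensor(2)                        # single label for the whole bag
loss = nn.functional.cross_entropy(model(bag).unsqueeze(0), bag_label.unsqueeze(0))
loss.backward()
```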
Reference [116] leverages a GAN to generate realistic image patches of tip-burn lettuce and trains a U-net for tip-burn segmentation. In the generation stage, lettuce canopy image patches are fed into Wasserstein GANs [26] to generate stressed (tip-burned) patches so that there are equal numbers of stressed and healthy patches. In the segmentation stage, the authors generate a binary label map for the images using a classifier and an edge map. The binary label map labels each mini-patch (super-pixel) as stressed or healthy. The authors then feed the label map, alongside the original images, as input to U-net for mask segmentation.
In few-shot meta-learning, we are given a meta-train set and a meta-test set, with the two sets containing mutually exclusive image classes (i.e., classes in the training set do not appear in the testing set). Meta-train or meta-test sets contain a number of episodes, each of which consists of some training (supporting) images and some test (query) images. The rationale of meta-learning is to equip the model with the ability to quickly learn to classify the test images from a small number of training images within each episode. The model acquires this meta-learning capability on the meta-train set and is evaluated on the meta-test set.
As an example, Reference [216] performs pest and disease classification with few-shot meta-learning. The model framework consists of an embedding module and a distance module. The embedding module first projects support images into an embedding space using ResNet-18, then feeds the embedding vectors into a transformer to incorporate information from other support samples in the same episode. After that, the distance module calculates the Mahalanobis distance [104] between the query and support samples to classify the query. Similarly, Reference [182] uses a shallow CNN for embedding and the Euclidean distance for calculating the similarity between the embeddings of the query and support samples.
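The episode structure and distance-based classification can be sketched as follows. This is a generic prototypical-network-style example with Euclidean distance, closer in spirit to Reference [182] than to the transformer-and-Mahalanobis design of Reference [216]; the shallow embedding network and episode sizes are assumptions.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(                     # shallow CNN embedding (assumed)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
    nn.Flatten(), nn.Linear(32 * 16, 64),
)

def classify_episode(support_images, support_labels, query_images, num_classes):
    """Assign each query to the class whose mean support embedding is closest."""
    s = embed(support_images)                       # (num_support, 64)
    q = embed(query_images)                         # (num_query, 64)
    prototypes = torch.stack(
        [s[support_labels == c].mean(dim=0) for c in range(num_classes)]
    )                                               # (num_classes, 64)
    distances = torch.cdist(q, prototypes)          # Euclidean distances
    return distances.argmin(dim=1)

# A 3-way, 2-shot episode with 4 query images (all tensors are stand-ins).
support = torch.randn(6, 3, 64, 64)
support_y = torch.tensor([0, 0, 1, 1, 2, 2])
queries = torch.randn(4, 3, 64, 64)
print(classify_episode(support, support_y, queries, num_classes=3))
```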

3.5.3 Label Noise and Uncertainty Estimates.

Reference [263] is another example of meta-learning, but it is used to improve the network's robustness against label noise. The model consists of two phases. The first phase is the conventional training of a CNN for classification. In the second phase, the authors generate 10 synthetic mini-batches of images, containing real images with labels taken from similar images. As a result, these mini-batches could contain noisy labels. After one update step on the synthetic instances, the network is trained to output predictions similar to those of the CNN from the first phase. The result is a model that is not easily affected by noisy training data.
Finally, having a confidence score associated with the model prediction allows farmers to make decisions selectively under different confidence levels and boosts the acceptance of deep learning models in agriculture. As an example, Reference [99] performs classification of tomato diseases and pairs the prediction with a confidence score following Reference [79]. The confidence score, calculated using Bayes' rule, is defined as the probability of the true class label conditioned on the class probability predicted by the CNN. In addition, the authors build an ontology of disease classification. For example, the parent node "stressed plant" has as children "bacteria infection" and "virus infection," the latter of which has "mosaic virus" as a child. If the confidence score of a specific terminal disease label is below a certain threshold, then the model switches to its more general parent label in the tree for higher confidence. By the axioms of probability, the predicted probability of the parent label is the sum of the predicted probabilities of its direct descendants. For a general discussion of machine learning techniques that create well-calibrated uncertainty estimates, we refer readers to Section 2.4.
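The back-off behavior described above can be illustrated with a small sketch: leaf-class probabilities are summed into parent probabilities, and the prediction moves up the tree until the chosen label clears a confidence threshold. The ontology, class names, and threshold below are illustrative assumptions, not those of Reference [99].

```python
# A toy ontology: parent label -> list of child labels.
ONTOLOGY = {
    "stressed plant": ["bacteria infection", "virus infection"],
    "virus infection": ["mosaic virus", "spotted wilt virus"],
}

def roll_up(leaf_probs):
    """Sum descendant probabilities into every parent label."""
    probs = dict(leaf_probs)
    for parent, children in reversed(list(ONTOLOGY.items())):
        probs[parent] = sum(probs.get(c, 0.0) for c in children)
    return probs

def predict_with_backoff(leaf_probs, threshold=0.7):
    """Return the most probable leaf, backing off to its parent if not confident."""
    probs = roll_up(leaf_probs)
    label = max(leaf_probs, key=leaf_probs.get)
    while probs[label] < threshold:
        parents = [p for p, cs in ONTOLOGY.items() if label in cs]
        if not parents:                     # reached the root: return as-is
            break
        label = parents[0]
    return label, probs[label]

leaf_probs = {"mosaic virus": 0.45, "spotted wilt virus": 0.35, "bacteria infection": 0.20}
print(predict_with_backoff(leaf_probs))     # backs off to "virus infection" (0.80)
```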

4 Datasets

High-quality datasets with human annotations are one of the most important factors in the success of a machine learning project [205, 325]. In this section, we review established datasets that enable training of CV models. We exclude datasets of plants for which we have not found literature regarding their suitability in CEA, such as apples [41, 126], broccoli [166], and dates [21]. We have manually checked every dataset listed and confirmed that they were available for download at the time of writing. By summarizing the datasets related to CEA, we aim to assist interested researchers in their future studies. In the meantime, we would like to encourage scholars to publish more datasets dedicated to CEA.
As listed in Tables 7 and 8, we identify 14 datasets relevant to CEA: 3 for Growth Monitoring, 5 for Fruit and Flower Detection, and 6 for Pest and Disease Detection. Each targeted task contains at least one dataset that covers multiple species to facilitate training of generalizable and transferable models. The largest dataset is CVPPP, with 6,287 and 165,120 RGB images of Arabidopsis and Tobacco, respectively, aimed at growth monitoring-related tasks. All the available datasets are composed of real images. While real images provide realistic data, we also want to encourage publication of synthetic datasets, which usually feature balanced class distributions and accurate labeling. Another noteworthy point is that many real images are collected under simplified laboratory environments, which may bias the data toward specific lighting conditions, backgrounds, plant orientations, or camera positions. For real-world application, practitioners may need to further fine-tune the trained models on more realistic data.
Target Task | Dataset | Release Year | Data Description | URL
Growth Monitoring | CVPPP dataset [204] | 2014 | 6,287 and 165,120 RGB images (resolution 72 \(\times\) 72) of Arabidopsis and Tobacco, respectively. Annotations include bounding boxes and segmentation masks for every plant and every leaf, and the leaf centers. | https://www.plant-phenotyping.org/datasets-download
Growth Monitoring | Oil Radish dataset [164] | 2019 | 129 RGB images (resolution 1 \(\times\) 1) of oil radish with binary semantic segmentation mask and respective plant fresh and dry weight, as well as nutrient content. | https://competitions.codalab.org/competitions/20981#learn_the_details
Growth Monitoring | Leaf Counting dataset [295] | 2018 | 9,372 RGB images (resolution 72 \(\times\) 72) of weeds with the number of leaves counted. | https://vision.eng.au.dk/leaf-counting-dataset/
Fruit and Flower Detection | DeepFruits [249] | 2016 | RGB images (resolution 72 \(\times\) 72 to 400 \(\times\) 400) of sweet pepper, rock melon, apple, mango, orange, and strawberry annotated with rectangular bounding boxes. Each fruit has 42–170 images. | https://drive.google.com/drive/folders/1CmsZb1caggLRN7ANfika8WuPiywo4mBb
Fruit and Flower Detection | Orchard Fruit [30] | 2016 | 1,120, 1,964, and 620 RGB images (resolution 72 \(\times\) 72) of apple, mango, and almond, respectively. Apples are annotated with bounding circles; mango and almond are annotated with rectangular bounding boxes. | http://data.acfr.usyd.edu.au/ag/treecrops/2016-multifruit/
Fruit and Flower Detection | MangoYOLO [157] | 2019 | 1,730 RGB images of mango (resolution 72 \(\times\) 72 and 300 \(\times\) 300), annotated with rectangular bounding boxes; photos are under artificial lighting. | https://figshare.com/articles/dataset/MangoYOLO_data_set/13450661
Fruit and Flower Detection | MangoNet Semantic Dataset [154] | 2019 | 45 training images and 4 test images (resolution 180 \(\times\) 180) of mango. Each image is annotated with a semantic segmentation mask that is colored green in regions of mangoes and black in non-mango regions. | https://github.com/avadesh02/MangoNet-Semantic-Dataset
Fruit and Flower Detection | Fruit Flower detection [86] | 2018 | 162, 20, and 15 images (resolution 72 \(\times\) 72) of apple, peach, and pear flowers annotated with binary semantic segmentation mask with white representing flower pixels. | https://data.nal.usda.gov/dataset/data-multi-species-fruit-flower-detection-using-refined-semantic-segmentation-network
Table 7. Dataset for CV tasks in CEA
Target Task | Dataset | Release Year | Data Description | URL
Pest and Disease Detection | Plant Village [143] | 2019 | 61,486 RGB images (resolution 72 \(\times\) 72) of plant leaves, with 39 different classes of diseased and healthy plant leaves. | https://data.mendeley.com/datasets/tywbtsjrjv/1
Pest and Disease Detection | Crop Pests Recognition [181] | 2020 | 5,629 RGB images (resolution 72 \(\times\) 72) of 10 pest classes, each class containing over 400 images. | https://bit.ly/2DdUFza
Pest and Disease Detection | Plant and Pest [182] | 2021 | 6,000 RGB images (resolution 72 \(\times\) 72) of 20 different classes of plant leaves and pests from Plant Village [143] and Crop Pests Recognition [181]. | https://zenodo.org/record/4529076#.YupE_-xBzlw
Pest and Disease Detection | Citrus Pest Benchmark [44] | 2022 | 10,816 multi-class RGB images (resolution 1,200 \(\times\) 1,200) categorized into seven classes of pests. | https://github.com/edsonbollis/Citrus-Pest-Benchmark
Pest and Disease Detection | IP102 [330] | 2019 | 75,000 images (resolution 400 \(\times\) 300) of 102 insect classes; among these, 19,000 are annotated with bounding boxes. | https://github.com/xpwu95/IP102
Pest and Disease Detection | Plant Leaves [267] | 2022 | 4,503 images (resolution 6,000 \(\times\) 4,000), which includes 2,278 images of healthy leaves and 2,225 images of diseased leaves. | https://data.mendeley.com/datasets/hb74ynkjcn/1
Table 8. Dataset for CV Tasks in CEA

5 Future Research Directions

So far, we have discussed the objectives, benefits, and realizations of Growth Monitoring, Fruit and Flower Detection, Fruit Counting, Maturity Level Classification, and Pest and Disease Detection in CEA precision farming. Based on the current research status and existing technical capabilities of computer vision, we would like to point out several areas where computer vision technologies could provide short- to mid-term benefits to urban and suburban CEA. We identify four such areas: handling realistic data that are unbalanced and noisy, uncertainty quantification and interpretability, multi-task learning and system integration, and effective use of multimodality.

5.1 Handling Realistic Data

The ability to handle realistic data is a critical competence that has not received sufficient research attention (with a few notable exceptions [44, 116, 182, 216, 263]). Unlike well-curated datasets that have accurate and abundant labels and relatively balanced label distributions, real-world data exhibit skewed label distributions as well as substantial label noise. For effective real-world application, it is important that CV algorithms maintain good predictive performance under these conditions. In addition, algorithmic tolerance of data imperfection can lower annotation cost and enable wider applications of CV. There has been substantial research on these topics in the computer vision community, such as long-tail recognition [81, 190, 261, 313, 369, 370], few-shot and zero-shot learning [177, 275, 276, 277, 334], as well as noise-resistant classification [17, 70, 150, 321, 368] and metric learning [145, 188, 310]. We believe that research on smart agriculture could benefit from this existing body of literature.
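As one concrete handle on skewed label distributions, class-reweighted losses are a common starting point. The sketch below weights the cross-entropy loss by inverse class frequency; the class counts and the weighting scheme are our own illustrative choices, not a technique advocated by any specific cited work.

```python
import torch
import torch.nn as nn

# Assumed class frequencies in a realistic (imbalanced) disease dataset.
class_counts = torch.tensor([9500., 300., 150., 50.])   # healthy vs. three diseases
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)          # rare classes count more

logits = torch.randn(16, 4, requires_grad=True)           # model outputs (stand-in)
labels = torch.randint(0, 4, (16,))
loss = criterion(logits, labels)
loss.backward()
```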

5.2 Quantifying Uncertainty and Interpretability

Real-world applications call for reliable estimation of the quality of automated decisions. An incorrect prediction made by an AI system may have profound implications. For example, if the system incorrectly determines that fruits are not mature enough, then it may delay harvesting and cause overripe fruits with diminished values. However, it is impossible to eliminate incorrect or uncertain predictions, as they originate from factors difficult to control and precisely measure, including model assumptions, test data shift, incomplete training data, and so on [11, 144]. Thus, we argue that uncertainty quantification is another crucial factor for real-world deployment. Such quantification would allow farmers to make informed decisions on whether to follow the machine recommendation or not. For the convenience of readers, we provide a brief review of such deep learning techniques in Section 2.4.
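A lightweight way to obtain such confidence estimates is Monte Carlo dropout, one of the techniques reviewed in Section 2.4. The sketch below keeps dropout active at test time and reports the spread of predictions over repeated forward passes; the toy classifier architecture and the number of samples are assumptions.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(                     # toy maturity classifier (assumed)
    nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
    nn.Dropout(p=0.3), nn.Linear(128, 3),
)

def predict_with_uncertainty(x, num_samples=30):
    """Average softmax over stochastic forward passes; spread indicates uncertainty."""
    classifier.train()                          # keep dropout active at test time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(classifier(x), dim=-1) for _ in range(num_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)

image = torch.randn(1, 3, 32, 32)
mean_probs, std_probs = predict_with_uncertainty(image)
print(mean_probs, std_probs)                    # high std suggests low confidence
```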
Besides uncertainty quantification, pairing the model with explanations of its decisions could enhance user confidence and assist auditing and debugging of the AI system. Specifically, instance attribution methods, as discussed in Section 2.5, enable detection of biased or low-quality data points with extreme influence on prediction [67]. For example, if the model is trained with an image of dry leaves with dust that resembles a certain plant disease, then during inference the model might misclassify diseased leaves as normal dry leaves or vice versa, inducing plant death or unnecessary treatments. With instance attribution interpretation, researchers can identify misleading data points and perform adversarial training to improve model accuracy.

5.3 Multi-task Learning and System Integration

Real-world deployment usually requires the coordination of multiple CV capabilities provided by different networks. When the system is designed well, these networks could facilitate each other and achieve synergistic effects. For example, instance segmentation can be used for fruit and flower localization (Section 3.2), growth monitoring (Section 3.1), and fruit maturity-level detection (Section 3.4). However, academic research tends to study these problems in isolation and is thereby unable to reap the benefits of multi-task learning.
Multi-task learning [29, 51, 191] focuses on leveraging mutually beneficial supervisory signals from multiple correlated tasks. Recently, CV researchers have built large-scale networks [66, 71, 120, 149, 152, 197, 314, 374] that perform a wide range of tasks and achieve state-of-the-art results on most tasks. This demonstrates the benefits of multi-task learning and could inspire similar work dedicated to smart farming in CEAs.
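A minimal multi-task sketch with a shared backbone and two task heads, such as maturity classification and fruit-count regression, is shown below; the architecture and the loss weighting are our assumptions and are not taken from the cited works.

```python
import torch
import torch.nn as nn
from torchvision import models

class SharedBackboneMultiTask(nn.Module):
    """One backbone feeds two heads: maturity classification and count regression."""
    def __init__(self, num_maturity_levels=4):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                    # expose 512-d features
        self.backbone = backbone
        self.maturity_head = nn.Linear(512, num_maturity_levels)
        self.count_head = nn.Linear(512, 1)

    def forward(self, x):
        features = self.backbone(x)
        return self.maturity_head(features), self.count_head(features).squeeze(-1)

model = SharedBackboneMultiTask()
images = torch.randn(8, 3, 224, 224)
maturity_labels = torch.randint(0, 4, (8,))
counts = torch.rand(8) * 20

maturity_logits, pred_counts = model(images)
loss = (nn.functional.cross_entropy(maturity_logits, maturity_labels)
        + 0.1 * nn.functional.mse_loss(pred_counts, counts))   # assumed weighting
loss.backward()
```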
Another motivation for considering multi-task learning and system integration is that errors can propagate in a pipeline architecture. For example, a network could first incorrectly detect a leaf occluding a mature fruit as the fruit and then classify it as an immature fruit. As a result, simply concatenating multiple techniques will yield overall performance inferior to what practitioners may expect. Thus, we encourage system designers to consider end-to-end training or other innovative techniques [119, 332, 339] for aligning and interfacing different components within a system.
Finally, multi-task learning handles multiple tasks simultaneously, which saves computation power, enhances data efficiency, and alleviates the necessity to maintain and iterate multiple models. Such benefits are crucial for popularizing CEAs, as they facilitate the efficient use of energy, computation power, and human resources. Consequently, both the initial setup and ongoing maintenance investments for CEA farms can be reduced, expediting the emergence of economically viable CEAs. Furthermore, mindful selection and combination of targeted tasks have the potential to further improve overall efficiency [280].

5.4 Effective Use of Multimodality

Fusion of multi-modal data enhances the inference ability of models by incorporating complementary views of the data [168]. In the context of CEA, thermal or depth images capture the temperature or depth differences between foreground and background and enable filtering of non-target objects (e.g., fruits or leaves). Abnormal temperature changes during the growth cycle can also indicate disease infection before visual symptoms appear [53, 54]. Furthermore, as different materials absorb, reflect, and transmit light in different ways and at different wavelengths, multi-spectral imaging (MSI) and hyper-spectral imaging (HSI), which capture images at multiple wavelengths of light, can be used to perform more specific internal inspection of leaves, fruits, and plants than thermal and depth images. Finally, LiDAR and RGB-D systems allow the generation of high-density 3D point clouds of plants, fruits [107, 185], or the environment [309], which facilitate 3D volume measurement or cut-point detection during harvesting.
Existing works have demonstrated the efficacy of MSI and HSI [13, 42, 311]. MSI has been utilized for yield prediction [301] and early disease detection [223, 307]. However, the current literature has explored MSI mostly with shallow machine learning. We found only one work that leverages deep learning on MSI input [301], which applies a pruned VGG-16 for wheat yield estimation. HSI provides finer-grained resolution and divides the range of wavelengths into many more spectral bands than MSI, typically tens to hundreds of bands, though at a higher cost. Hyper-spectral images have been used as the sole modality in early disease detection with both shallow machine learning methods [18, 19, 288] and deep learning methods [98, 122, 214, 311]. Due to relevance and space limits, we discuss only the deep learning methods here. Specifically, with a GAN-based data augmentation method, Reference [311] performs early detection of tomato spotted wilt virus before visible symptoms appear using hyper-spectral images. Reference [214] performs early detection of grapevine vein-clearing virus and shows the discriminative power of HSI in combination with CNNs and shallow machine learning algorithms. Reference [98] attains early barley disease detection by generating future predictions of hyper-spectral barley leaf images using a GAN. Moreover, HSI has also been utilized for yield prediction through fruit counting. Reference [122] leverages a CNN and HSI to segment semantic mango masks and count the number of fruits.
However, systematic exploration of fusion techniques for multimodal inputs remains relatively rare in CEA applications. Many existing approaches adopt pipeline-based multimodal integration techniques that do not exhaust the potential of deep learning due to the lack of end-to-end training. For example, in Reference [13], the authors set a depth threshold to filter false-positive tomato detections from the background. Reference [42] first performs broccoli segmentation on the RGB image; within the segmentation mask, the authors find the mode of the depth value distribution, which is used to calculate the diameter of the broccoli head. Reference [185] conducts semantic segmentation of guava fruits using RGB images and reconstructs their 3D positions from the depth input. Reference [107] utilizes Mask R-CNN to perform instance segmentation of strawberries and aligns the depth image with the segmentation mask to obtain the 3D shape of strawberries. These methods use the two modalities separately and do not apply end-to-end training of the pipeline. As exceptions, Reference [249] proposes late fusion of RGB and near-infrared images in sweet pepper detection, and Reference [308] incorporates depth information by replacing the blue channel with the depth channel and applies Mask R-CNN to locate tomatoes.
In computer vision research, numerous techniques for fusing and jointly utilizing multimodal information have been proposed over the years, which we believe could contribute to CV applications in CEA. Due to space limits, we list only a few examples here. Reference [260] proposes two different ways to combine multiple modalities in object detection: Concatenation and Element-wise Cross Product. The former combines feature maps from different modalities along the channel dimension and lets the network discover the best way to combine them from data. The latter applies element-wise multiplication to every possible pair of feature maps from the two modalities. Reference [50] experiments with a variety of fusion techniques for RGB and optical flow and discovers a high-performing late-fusion strategy in action recognition. In self-supervised learning, Reference [125] identifies similar data points using one modality and treats them as positive pairs in another modality. This technique provides another paradigm to leverage the complementary nature of multimodality.
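A hedged sketch of concatenation-style fusion in the spirit of Reference [260] is given below: two small encoders produce feature maps that are stacked along the channel dimension before a joint head. The encoders, layer sizes, and the RGB-plus-depth setting are simplified assumptions for illustration, not the cited architecture.

```python
import torch
import torch.nn as nn

class ConcatFusionClassifier(nn.Module):
    """Encode RGB and depth separately, concatenate feature maps, classify jointly."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.rgb_encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.depth_encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, rgb, depth):
        # Stack feature maps along the channel dimension; the network learns
        # from data how best to combine the two modalities.
        fused = torch.cat([self.rgb_encoder(rgb), self.depth_encoder(depth)], dim=1)
        return self.head(fused)

model = ConcatFusionClassifier()
rgb = torch.randn(4, 3, 128, 128)
depth = torch.randn(4, 1, 128, 128)
print(model(rgb, depth).shape)         # torch.Size([4, 2])
```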

6 Conclusions

Smart agriculture, and particularly computer vision for controlled-environment agriculture (CV4CEA), is rapidly emerging as an interdisciplinary area of research that could potentially lead to enormous economic, environmental, and social benefits. In this survey, we first provide brief overviews of existing CV technologies that range from image recognition to structured understanding such as segmentation, and from uncertainty quantification to interpretable machine learning. Next, we systematically review existing applications of CV4CEA, including growth monitoring, fruit and flower detection, fruit counting, maturity-level classification, and pest/disease detection. Finally, we highlight a few research directions that could generate high-impact research in the near future.
Like any interdisciplinary area, research progress in CV4CEA requires expertise in both computer vision and agriculture. However, it could take a substantial amount of time for any researcher to acquire an in-depth understanding of both subjects. By reviewing existing applications and available CV technologies and identifying possible future research directions, we aim to provide a quick introduction to CV4CEA for researchers with expertise in agriculture or computer vision alone. It is our hope that this survey will serve as a bridge between researchers from diverse backgrounds and contribute to accelerated innovation in the next decade.

References

[1]
Singapore Food Agency. 2023. 30 by 30. Retrieved 15 August 2023 from https://www.ourfoodfuture.gov.sg/30by30/
[2]
Jennifer Marston. 2021. AeroFarms partners with hortifrut to grow blueberries, caneberries via vertical farming. Retrieved 28 July 2022 from https://thespoon.tech/aerofarms-partners-with-hortifrut-to-grow-blueberries-caneberries-via-vertical-farming/
[3]
n.d. Algorithmic Botany. Retrieved 20 June 2022 from http://www.algorithmicbotany.org/virtual_laboratory/
[4]
2022. All in(doors) on citrus production. Retrieved 28 July 2022 from https://www.hortibiz.com/newsitem/news/all-indoors-on-citrus-production/
[5]
2022. Greenhouse in Shanghai successfully plants bananas on water. Retrieved 28 July 2022 from https://www.hortidaily.com/article/9369964/greenhouse-in-shanghai-successfully-plants-bananas-on-water/
[6]
n.d. Introducing VertiCrop™. Retrieved 24 May 2022 from https://verticrop.com/
[7]
2019. Mango trees cultivation under greenhouse conditions. Retrieved 28 July 2022 from https://horti-generation.com/mango-trees-cultivation-under-greenhouse-conditions/
[8]
n.d. Saturn Bioponics. Retrieved 25 May 2022 from http://www.saturnbioponics.com/
[9]
n.d. Spread-A new way to grow vegetable. Retrieved 24 May 2022 from https://spread.co.jp/en/environment/
[10]
n.d. Tomatoes and cucumbers in a vertical farm without daylight. Retrieved 28 July 2022 from https://www.hortidaily.com/article/9212847/tomatoes-and-cucumbers-in-a-vertical-farm-without-daylight/
[11]
Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U. Rajendra Acharya, Vladimir Makarenkov, and Saeid Nahavandi. 2021. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion 76, (2021), 243–297.
[12]
Edward H. Adelson. 2001. On seeing stuff: The perception of materials by humans and machines. In Human Vision and Electronic Imaging VI, Bernice E. Rogowitz and Thrasyvoulos N. Pappas (Eds.), Vol. 4299. International Society for Optics and Photonics, SPIE, 1–12. DOI:
[13]
Manya Afonso, Hubert Fonteijn, Felipe Schadeck Fiorentin, Dick Lensink, Marcel Mooij, Nanne Faber, Gerrit Polder, and Ron Wehrens. 2020. Tomato fruit detection and counting in greenhouses using deep learning. Front. Plant Sci. 11 (2020), 571299.
[14]
I. Ahern, A. Noack, L. Guzman-Nateras, D. Dou, B. Li, and J. Huan. 2019. NormLime: A new feature importance metric for explaining deep neural networks. arXiv preprint arXiv:1909.04200.
[15]
Latief Ahmad and Firasath Nabi. 2021. Agriculture 5.0: Artificial Intelligence, IoT and Machine Learning. CRC Press.
[16]
Jiwoon Ahn, Sunghyun Cho, and Suha Kwak. 2019. Weakly supervised learning of instance segmentation with inter-pixel relations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2209–2218.
[17]
Görkem Algan and Ilkay Ulusoy. 2020. Meta soft label generation for noisy labels. In Proceedings of the Conference on Computer Vision and Pattern Recognition.
[18]
Ali AlSuwaidi, Bruce Grieve, and Hujun Yin. 2018. Combining spectral and texture features in hyperspectral image analysis for plant monitoring. Measur. Sci. Technol. 29, 10 (2018), 104001.
[19]
Ali AlSuwaidi, Bruce Grieve, and Hujun Yin. 2018. Feature-ensemble-based novelty detection for analyzing plant hyperspectral datasets. IEEE J. Select. Topics Appl. Earth Observ. Rem. Sens. 11, 4 (2018), 1041–1055.
[20]
H. Altaheri, M. Alsulaiman, M. Faisal, and G. Muhammed. 2019. Date fruit dataset for automated harvesting and visual yield estimation. In Proceedings of the IEEE DataPort Conference.
[21]
Hamdi Altaheri, Mansour Alsulaiman, Mohammed Faisal, and Ghulam Muhammed. 2019. Date Fruit Dataset for Automated Harvesting and Visual Yield Estimation. DOI:
[22]
Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. 2017. Towards better understanding of gradient-based attribution methods for deep neural networks. arXiv preprint arXiv:1711.06104 (2017).
[23]
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18).
[24]
Krishnaswamy R. Aravind, Purushothaman Raja, Rajendran Ashiwin, and Konnaiyar V. Mukesh. 2019. Disease classification in Solanum melongena using deep learning. Span. J. Agric. Res. 17, 3 (2019), e0204–e0204.
[25]
Arman Arefi, Asad Modarres Motlagh, Kaveh Mollazade, and Rahman Farrokhi Teimourlou. 2011. Recognition and localization of ripen tomato based on machine vision. Austral. J. Crop Sci. 5, 10 (2011), 1144–1149.
[26]
Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning. PMLR, 214–223.
[27]
Sanjeev Arora, Nadav Cohen, and Elad Hazan. 2018. On the optimization of deep networks: Implicit acceleration by overparameterization. In Proceedings of the International Conference on Machine Learning. PMLR, 244–253.
[28]
S. Aruul Mozhi Varman, Arvind Ram Baskaran, S. Aravindh, and E. Prabhu. 2017. Deep learning and IoT for smart agriculture using WSN. In Proceedings of the IEEE International Conference on Computational Intelligence and Computing Research (ICCIC’17). 1–6. DOI:
[29]
B. J. Bakker and T. M. Heskes. 2003. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research 4 (2003) 83–99.
[30]
Suchet Bargoti and James Underwood. 2017. Deep fruit detection in orchards. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’17). IEEE, 3626–3633.
[31]
Suchet Bargoti and James P. Underwood. 2017. Image segmentation for fruit detection and yield estimation in apple orchards. J. Field Robot. 34, 6 (2017), 1039–1060.
[32]
Elnaz Barshan, Marc-Etienne Brunet, and Gintare Karolina Dziugaite. 2020. Relatif: Identifying explanatory training samples via relative influence. In Proceedings of the International Conference on Artificial Intelligence and Statistics. PMLR, 1899–1909.
[33]
Jasmijn Bastings, Wilker Aziz, and Ivan Titov. 2019. Interpretable neural predictions with differentiable binary variables. arXiv preprint arXiv:1905.08160 (2019).
[34]
David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6541–6549.
[35]
Andrew M. Beacham, Laura H. Vickers, and James M. Monaghan. 2019. Vertical farming: A summary of approaches to growing skywards. J. Horticult. Sci. Biotechnol. 94, 3 (2019), 277–283.
[36]
Avital Bechar and Clément Vigneault. 2016. Agricultural robots for field operations: Concepts and components. Biosyst. Eng. 149 (2016), 94–111.
[37]
Khadija Benis, Christoph Reinhart, and Paulo Ferrão. 2017. Development of a simulation-based decision support workflow for the implementation of Building-Integrated Agriculture (BIA) in urban contexts. J. Clean. Product. 147 (2017), 589–602.
[38]
Kurt Benke and Bruce Tomkins. 2017. Future food-production systems: Vertical farming and controlled-environment agriculture. Sustain.: Sci., Pract. Polic. 13, 1 (2017), 13–26.
[39]
Daniel Berckmans. 2017. General introduction to precision livestock farming. Anim. Front. 7, 1 (2017), 6–11.
[40]
Anuja Bhargava and Atul Bansal. 2021. Fruits and vegetables quality evaluation using computer vision: A review. J. King Saud Univ.-Comput. Inf. Sci. 33, 3 (2021), 243–257.
[41]
Santosh Bhusal, Manoj Karkee, and Qin Zhang. 2019. Apple dataset benchmark from orchard environment in modern fruiting wall. Agricultural Automation and Robotics Lab. http://rightsstatements.org/vocab/InC/1.0/
[42]
Pieter M. Blok, Eldert J. van Henten, Frits K. van Evert, and Gert Kootstra. 2021. Image-based size estimation of broccoli heads under varying degrees of occlusion. Biosyst. Eng. 208 (2021), 213–233. DOI:
[43]
C. H. Bock, P. E. Parker, A. Z. Cook, and T. R. Gottwald. 2008. Visual rating and the use of image analysis for assessing different symptoms of citrus canker on grapefruit leaves. Plant Disease 92, 4 (2008), 530–541.
[44]
Edson Bollis, Helio Pedrini, and Sandra Avila. 2020. Weakly supervised learning guided by activation mapping applied to a novel citrus pest benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 70–71.
[45]
Charles A. Bouman and Michael Shapiro. 1994. A multiscale random field model for Bayesian image segmentation. IEEE Trans. Image Process. 3, 2 (1994), 162–177.
[46]
Jonathan Brophy and Daniel Lowd. 2020. TREX: Tree-Ensemble Representer-point Explanations. arXiv preprint arXiv:2009.05530 (2020).
[47]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models Are Few-Shot Learners. arXiv 2005.14165 (2020).
[48]
Zhaowei Cai and Nuno Vasconcelos. 2018. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18).
[49]
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. 2020. Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 33 (2020), 9912–9924.
[50]
Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6299–6308.
[51]
Rich Caruana. 1997. Multitask learning. Mach. Learn. 28, 1 (1997), 41–75.
[52]
Diogo V. Carvalho, Eduardo M. Pereira, and Jaime S. Cardoso. 2019. Machine learning interpretability: A survey on methods and metrics. Electronics 8, 8 (2019), 832.
[53]
Laury Chaerle, Dik Hagenbeek, Erik De Bruyne, Roland Valcke, and Dominique Van Der Straeten. 2004. Thermal and chlorophyll-fluorescence imaging distinguish plant-pathogen interactions at an early stage. Plant Cell Physiol. 45, 7 (2004), 887–896.
[54]
Laury Chaerle, Ilkka Leinonen, Hamlyn G. Jones, and Dominique Van Der Straeten. 2007. Monitoring and screening plant populations with combined thermal and chlorophyll fluorescence imaging. J. Experim. Botan. 58, 4 (2007), 773–784.
[55]
Supawadee Chaivivatrakul, Jednipat Moonrinta, and Matthew N. Dailey. 2010. Towards automated crop yield estimation-detection and 3D reconstruction of pineapples in video sequences. In VISAPP 1, Citeseer, 180–183.
[56]
Akshay L. Chandra, Sai Vikas Desai, Wei Guo, and Vineeth N. Balasubramanian. 2020. Computer vision with deep learning for plant phenotyping in agriculture: A survey. arXiv preprint arXiv:2006.11391 (2020).
[57]
Fernando Alfredo Auat Cheein and Ricardo Carelli. 2013. Agricultural robotics: Unmanned robotic service units in agricultural tasks. IEEE Industr. Electron. Mag. 7, 3 (2013), 48–58.
[58]
Hila Chefer, Shir Gur, and Lior Wolf. 2021. Transformer interpretability beyond attention visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 782–791.
[59]
Howard Chen, Jacqueline He, Karthik Narasimhan, and Danqi Chen. 2022. Can Rationalization Improve Robustness? arXiv preprint arXiv:2204.11790 (2022).
[60]
Hanjie Chen, Guangtao Zheng, and Yangfeng Ji. 2020. Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection. arXiv preprint arXiv:2004.02015 (2020).
[61]
Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Su, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. 2019. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4974–4983.
[62]
Long Chen, Martin Strauch, and Dorit Merhof. 2019. Instance segmentation of biomedical images with an object-aware embedding learned with local constraints. In Proceedings of the International Conference on Medical Image Computing and Computer-assisted Intervention. DOI:
[63]
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. 2017. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 4 (2017), 834–848.
[64]
Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV’18). 801–818.
[65]
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. arXiv 2002.05709 (2020).
[66]
Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, and Geoffrey Hinton. 2021. Pix2seq: A Language Modeling Framework for Object Detection. arXiv preprint arXiv:2109.10852 (2021).
[67]
Yuanyuan Chen, Boyang Li, Han Yu, Pengcheng Wu, and Chunyan Miao. 2021. Hydra: Hypergradient data relevance analysis for interpreting deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 7081–7089.
[68]
Yi-Ting Chen, Xiaokai Liu, and Ming-Hsuan Yang. 2015. Multi-instance object segmentation with occlusion handling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 3470–3478. DOI:
[69]
Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, and Liang-Chieh Chen. 2020. Panoptic-DeepLab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). 12472–12482. DOI:
[70]
De Cheng, Tongliang Liu, Yixiong Ning, Nannan Wang, Bo Han, Gang Niu, Xinbo Gao, and Masashi Sugiyama. 2022. Instance-dependent label-noise learning with manifold-regularized transition matrix estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16630–16639.
[71]
Jaemin Cho, Jie Lei, Hao Tan, and Mohit Bansal. 2021. Unifying vision-and-language tasks via text generation. In Proceedings of the International Conference on Machine Learning. PMLR, 1931–1942.
[72]
Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, and Chunhua Shen. 2021. Twins: Revisiting the Design of Spatial Attention in Vision Transformers. arXiv 2104.13840 (2021).
[73]
Dan Ciresan, Alessandro Giusti, Luca Gambardella, and Jürgen Schmidhuber. 2012. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems, F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger (Eds.), Vol. 25. Curran Associates, Inc. Retrieved from https://proceedings.neurips.cc/paper/2012/file/459a4ddcb586f24efd9395aa7662bc7c-Paper.pdf
[74]
Sergio Cubero, Won Suk Lee, Nuria Aleixos, Francisco Albert, and Jose Blasco. 2016. Automated systems based on machine vision for inspecting citrus fruits from the field to postharvest—A review. Food Bioprocess Technol. 9 (2016), 1623–1639.
[75]
Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V. Le. 2019. RandAugment: Practical automated data augmentation with a reduced search space. arXiv 1909.13719 (2019).
[76]
Jifeng Dai, Kaiming He, Yi Li, Shaoqing Ren, and Jian Sun. 2016. Instance-sensitive fully convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV’16), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 534–549.
[77]
Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. 2017. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). 764–773.
[78]
Zihang Dai, Hanxiao Liu, Quoc V. Le, and Mingxing Tan. 2021. CoAtNet: Marrying Convolution and Attention for All Data Sizes. arXiv preprint arXiv:2106.04803 (2021).
[79]
Jim Davis, Tong Liang, James Enouen, and Roman Ilin. 2019. Hierarchical semantic labeling with adaptive confidence. In Proceedings of the International Symposium on Visual Computing. Springer, 169–183.
[80]
Bert De Brabandere, Davy Neven, and Luc Van Gool. 2017. Semantic instance segmentation with a discriminative loss function. In Proceedings of the Workshop on Deep Learning for Robotic Vision (CVPR’17). Retrieved from https://arxiv.org/abs/1708.02551
[81]
Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. 2019. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19). 4690–4699.
[82]
Dickson Despommier. 2010. The Vertical Farm: Feeding the World in the 21st Century. Macmillan.
[83]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 1810.04805 (2019).
[84]
Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, and Payel Das. 2018. Explanations based on the missing: Towards contrastive explanations with pertinent negatives. Adv. Neural Inf. Process. Syst. 31 (2018).
[85]
Philipe A. Dias, Amy Tabb, and Henry Medeiros. 2018. Multispecies fruit flower detection using a refined semantic segmentation network. IEEE Robot. Automat. Lett. 3, 4 (2018), 3003–3010.
[86]
Philipe A. Dias, Amy Tabb, and Henry Medeiros. 2018. Multispecies fruit flower detection using a refined semantic segmentation network. IEEE Robot. Automat. Lett. 3, 4 (2018), 3003–3010. DOI:
[87]
Zhipeng Ding, Xu Han, Peirong Liu, and Marc Niethammer. 2021. Local temperature scaling for probability calibration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV’21). 6889–6899.
[88]
Pelin Dogan, Boyang Li, Leonid Sigal, and Markus Gross. 2018. A Neural Multi-sequence Alignment TeCHnique (NeuMATCH). In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR’18).
[89]
Tiago Domingues, Tomás Brandão, and João C. Ferreira. 2022. Machine learning for detection and prediction of crop diseases and pests: A comprehensive survey. Agriculture 12, 9 (2022), 1350.
[90]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations.
[91]
Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. 2019. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV’19).
[92]
Tom Duckett, Simon Pearson, Simon Blackmore, Bruce Grieve, Wen-Hua Chen, Grzegorz Cielniak, Jason Cleaversmith, Jian Dai, Steve Davis, Charles Fox, Pål From, Ioannis Georgilas, Richie Gill, Iain Gould, Marc Hanheide, Alan Hunter, Fumiya Iida, Lyudmila Mihalyova, Samia Nefti-Meziani, Gerhard Neumann, Paolo Paoletti, Tony Pridmore, Dave Ross, Melvyn Smith, Martin Stoelen, Mark Swainson, Sam Wane, Peter Wilson, Isobel Wright, and Guang-Zhong Yang. 2018. Agricultural Robotics: The Future of Robotic Agriculture. arXiv preprint arXiv:1806.06762 (2018).
[93]
Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2009. Visualizing higher-layer features of a deep network. Univ. Montreal 1341, 3 (2009), 1.
[94]
Clement Farabet, Camille Couprie, Laurent Najman, and Yann LeCun. 2013. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 8 (2013), 1915–1929. DOI:
[95]
Konstantinos P. Ferentinos. 2018. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145 (2018), 311–318.
[96]
Roemi Fernández, Carlota Salinas, Héctor Montes, and Javier Sarria. 2014. Multisensory system for fruit harvesting robots. Experimental testing in natural scenarios and with different kinds of crops. Sensors 14, 12 (2014), 23885–23904.
[97]
Ruth Fong and Andrea Vedaldi. 2018. Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8730–8738.
[98]
Alina Förster, Jens Behley, Jan Behmann, and Ribana Roscher. 2019. Hyperspectral plant disease forecasting using generative adversarial networks. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS’19). IEEE, 1793–1796.
[99]
Logan Frank, Christopher Wiegman, Jim Davis, and Scott Shearer. 2021. Confidence-driven hierarchical classification of cultivated plant stresses. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2503–2512.
[100]
Evan D. G. Fraser and Malcolm Campbell. 2019. Agriculture 5.0: Reconciling production with planetary health. One Earth 1, 3 (2019), 278–280.
[101]
Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C. Berg. 2017. DSSD : Deconvolutional Single Shot Detector. arXiv Preprint 1701.06659 (2017).
[102]
LiMin Fu. 1991. Rule learning by searching on adapted nets. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 91. 590–595.
[103]
Alvaro Fuentes, Sook Yoon, Sang Cheol Kim, and Dong Sun Park. 2017. A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors 17, 9 (2017), 2022.
[104]
Pedro Galeano, Esdras Joseph, and Rosa E. Lillo. 2015. The Mahalanobis distance for functional data with applications to classification. Technometrics 57, 2 (2015), 281–291.
[105]
Damien Garreau and Dina Mardaoui. 2021. What does LIME really see in images? In Proceedings of the 38th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 3620–3629. Retrieved from https://proceedings.mlr.press/v139/garreau21a.html
[106]
Damien Garreau and Ulrike von Luxburg. 2020. Explaining the explainer: A first theoretical analysis of LIME. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics(Proceedings of Machine Learning Research, Vol. 108), Silvia Chiappa and Roberto Calandra (Eds.). PMLR, 1287–1296. Retrieved from https://proceedings.mlr.press/v108/garreau20a.html
[107]
Yuanyue Ge, Ya Xiong, Gabriel Lins Tenorio, and Pål Johan From. 2019. Fruit localization and environment perception for strawberry harvesting robots. IEEE Access 7 (2019), 147642–147652. DOI:
[108]
Jordi Gené-Mola, Ricardo Sanz-Cortiella, Joan R. Rosell-Polo, Josep-Ramon Morros, Javier Ruiz-Hidalgo, Verónica Vilaplana, and Eduard Gregorio. 2020. Fruit detection and 3D location using instance segmentation neural networks and structure-from-motion photogrammetry. Comput. Electron. Agric. 169 (2020), 105165.
[109]
Daan de Geus, Panagiotis Meletis, Chenyang Lu, Xiaoxiao Wen, and Gijs Dubbelman. 2021. Part-aware Panoptic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’21). 5481–5490. DOI:
[110]
Ross Girshick. 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15).
[111]
Juliana Freitas Santos Gomes and Fabiana Rodrigues Leta. 2012. Applications of computer vision techniques in the agriculture and food industry: A review. Eur. Food Res. Technol. 235 (2012), 989–1000.
[112]
Andrea Gomez-Zavaglia, Juan Carlos Mejuto, and Jesus Simal-Gandara. 2020. Mitigation of emerging implications of climate change on food production systems. Food Res. Int. 134 (2020), 109256.
[113]
Stephen Gould, Richard Fulton, and Daphne Koller. 2009. Decomposing a scene into geometric and semantically consistent regions. In Proceedings of the IEEE 12th International Conference on Computer Vision. 1–8. DOI:
[114]
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2018. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv 1706.02677 (2018).
[115]
Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. Counterfactual visual explanations. In Proceedings of the International Conference on Machine Learning. PMLR, 2376–2384.
[116]
Riccardo Gozzovelli, Benjamin Franchetti, Malik Bekmurat, and Fiora Pirri. 2021. Tip-burn stress detection of lettuce canopy grown in plant factories. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1259–1268.
[117]
Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning (ICML’17). JMLR.org, 1321–1330.
[118]
Xu Guo, Boyang Li, Han Yu, and Chunyan Miao. 2021. Latent-optimized adversarial neural transfer for sarcasm detection. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT’21).
[119]
Xu Guo, Boyang Li, Han Yu, and Chunyan Miao. 2021. Latent-Optimized Adversarial Neural Transfer for Sarcasm Detection. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT’21). Retrieved from http://www.boyangli.org/paper/XuGuo-NAACL-2021.pdf
[120]
Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, and Derek Hoiem. 2022. Towards general purpose vision systems: An end-to-end task-agnostic vision-language architecture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16399–16409.
[121]
Felix G. Gustafson and Elnore Stoldt. 1936. Some relations between leaf area and fruit size in tomatoes. Plant Physiol. 11, 2 (1936), 445.
[122]
Salvador Gutiérrez, Alexander Wendel, and James Underwood. 2019. Ground based hyperspectral imaging for extensive mango yield estimation. Comput. Electron. Agric. 157 (2019), 126–135.
[123]
Md Tarek Habib, Md Ariful Islam Arif, Sumaita Binte Shorif, Mohammad Shorif Uddin, and Farruk Ahmed. 2021. Machine vision-based fruit and vegetable disease recognition: A review. Comput. Vis. Mach. Learn. Agric. (2021), 143–157.
[124]
Michael Halstead, Christopher McCool, Simon Denman, Tristan Perez, and Clinton Fookes. 2018. Fruit quantity and ripeness estimation using a robotic vision system. IEEE Robot. Automat. Lett. 3, 4 (2018), 2995–3002.
[125]
Tengda Han, Weidi Xie, and Andrew Zisserman. 2020. Self-supervised co-training for video representation learning. Adv. Neural Inf. Process. Syst. 33 (2020), 5679–5690.
[126]
Nicolai Häni, Pravakar Roy, and Volkan Isler. 2020. MinneApple: A benchmark dataset for apple detection and segmentation. IEEE Robot. Automat. Lett. 5, 2 (2020), 852–858.
[127]
X. Hao, X. Guo, J. Zheng, L. Celeste, S. Kholsa, and X. Chen. 2015. Response of greenhouse tomato to different vertical spectra of LED lighting under overhead high pressure sodium and plasma lighting. In Proceedings of the International Symposium on New Technologies and Management for Greenhouses (GreenSys’15). 1003–1110.
[128]
Xiuming Hao and Athanasios P. Papadopoulos. 1999. Effects of supplemental lighting and cover materials on growth, photosynthesis, biomass partitioning, early yield and quality of greenhouse cucumber. Scient. Hortic. 80, 1-2 (1999), 1–18.
[129]
Bharath Hariharan, Pablo Arbeláez, Ross Girshick, and Jitendra Malik. 2014. Simultaneous detection and segmentation. In Proceedings of the European Conference on Computer Vision (ECCV’14), David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham, 297–312.
[130]
Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, and Dustin Tran. 2021. Training independent subnetworks for robust prediction. In Proceedings of the International Conference on Learning Representations (ICLR’21).
[131]
Zeeshan Hayder, Xuming He, and Mathieu Salzmann. 2016. Boundary-aware Instance Segmentation. DOI:
[132]
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17).
[133]
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2018. Mask R-CNN. arXiv 1703.06870 (2018).
[134]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. arXiv 1512.03385 (2015).
[135]
Leilei He, Wentai Fang, Guanao Zhao, Zhenchao Wu, Longsheng Fu, Rui Li, Yaqoob Majeed, and Jaspreet Dhupia. 2022. Fruit yield prediction and estimation in orchards: A state-of-the-art comprehensive review for both direct and indirect methods. Comput. Electron. Agric. 195 (2022), 106812.
[136]
Katherine L. Hermann and Andrew K. Lampinen. 2020. What shapes feature representations? Exploring datasets, architectures, and training. arXiv 2006.12433 (2020).
[137]
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=r1Ddp1-Rb
[138]
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 1704.04861 (2017).
[139]
Chunhua Hu, Xuan Liu, Zhou Pan, and Pingping Li. 2019. Automatic detection of single ripe tomato on plant combining faster R-CNN and intuitionistic fuzzy set. IEEE Access 7 (2019), 154683–154696.
[140]
Xiaoping Huang, Zelin Hu, Xiaorun Wang, Xuanjiang Yang, Jian Zhang, and Daoling Shi. 2019. An improved single shot multibox detector method applied in body condition score for dairy cows. Animals 9, 7 (2019). DOI:
[141]
Yo-Ping Huang, Tzu-Hao Wang, and Haobijam Basanta. 2020. Using fuzzy mask R-CNN model to automatically identify tomato ripeness. IEEE Access 8 (2020), 207672–207682.
[142]
Wallace E. Huffman. 2012. The status of labor-saving mechanization in US fruit and vegetable harvesting. Choices 27, 316-2016-6262 (2012).
[143]
David Hughes, Marcel Salathé, et al. 2015. An Open Access Repository of Images on Plant Health to Enable the Development of Mobile Disease Diagnostics. arXiv preprint arXiv:1511.08060 (2015).
[144]
Eyke Hüllermeier and Willem Waegeman. 2021. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Mach. Learn. 110, 3 (2021), 457–506.
[145]
Sarah Ibrahimi, Arnaud Sors, Rafael Sampaio de Rezende, and Stéphane Clinchant. 2022. Learning with label noise for image retrieval by selecting interactions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2181–2190.
[146]
Zahid Iqbal, Muhammad Attique Khan, Muhammad Sharif, Jamal Hussain Shah, Muhammad Habib ur Rehman, and Kashif Javed. 2018. An automated detection and classification of citrus plant diseases using image processing techniques: A review. Comput. Electronics Agric. 153 (2018), 12–32.
[147]
David Ireri, Eisa Belal, Cedric Okinda, Nelson Makange, and Changying Ji. 2019. A computer vision system for defect discrimination and grading in tomatoes using machine learning and image processing. Artif. Intell. Agric. 2 (2019), 28–37.
[148]
Jörn-Henrik Jacobsen, Arnold Smeulders, and Edouard Oyallon. 2018. i-RevNet: Deep invertible networks. In Proceedings of the International Conference on Learning Representations (ICLR’18).
[149]
Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, and Joao Carreira. 2021. Perceiver IO: A General Architecture for Structured Inputs & Outputs. arXiv preprint arXiv:2107.14795 (2021).
[150]
Lu Jiang, Di Huang, Mason Liu, and Weilong Yang. 2020. Beyond synthetic noise: Deep learning on controlled noisy labels. In Proceedings of the International Conference on Machine Learning.
[151]
Vijay Kakani, Van Huan Nguyen, Basivi Praveen Kumar, Hakil Kim, and Visweswara Rao Pasupuleti. 2020. A critical review on computer vision and artificial intelligence in food industry. J. Agric. Food Res. 2 (2020), 100033.
[152]
Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, and Aniruddha Kembhavi. 2022. Webly Supervised Concept Expansion for General Purpose Vision Models. arXiv preprint arXiv:2202.02317 (2022).
[153]
Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, and Hiroki Arimura. 2020. DACE: Distribution-aware counterfactual explanation by mixed-integer linear optimization. In Proceedings of the International Joint Conference on Artificial Intelligence. 2855–2862.
[154]
Ramesh Kestur, Avadesh Meduri, and Omkar Narasipura. 2019. MangoNet: A deep semantic segmentation architecture for a method to detect and count mangoes in an open orchard. Eng. Applic. Artif. Intell. 77 (2019), 59–69.
[155]
Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollar. 2019. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19).
[156]
Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In Proceedings of the International Conference on Machine Learning. PMLR, 1885–1894.
[157]
Anand Koirala, K. B. Walsh, Zhenglin Wang, and C. McCarthy. 2019. Deep learning for real-time fruit detection and orchard fruit load estimation: Benchmarking of "MangoYOLO." Precis. Agric. 20, 6 (2019), 1107–1135.
[158]
Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Lei Li, and Jianbo Shi. 2020. FoveaBox: Beyound anchor-based object detection. IEEE Trans. Image Process. 29 (2020), 7389–7398. DOI:
[159]
Dean A. Kopsell, Carl E. Sams, and Robert C. Morrow. 2015. Blue wavelengths from LED lighting increase nutritionally important metabolites in specialty crops. HortScience 50, 9 (2015), 1285–1288.
[160]
Maxim S. Kovalev, Lev V. Utkin, and Ernest M. Kasimov. 2020. SurvLIME: A method for explaining machine learning survival models. Knowl.-based Syst. 203 (2020), 106164. DOI:
[161]
R. Krishnamurthy. 2014. Vertical farming: Singapore's solution to feed the local urban population. Permacult. Res. Instit. (2014). Retrieved from https://www.permaculturenews.org/2014/07/25/vertical-farming-singapores-solution-feed-local-urban-population/
[162]
Alex Krizhevsky. 2014. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997 (2014).
[163]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS’12). Curran Associates Inc., Red Hook, NY, 1097–1105.
[164]
Anders Krogh Mortensen, Soren Skovsen, Henrik Karstoft, and Rene Gislum. 2019. The oil radish growth dataset for semantic segmentation and yield estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19) Workshops.
[165]
Ferhat Kurtulmus, Won Suk Lee, and Ali Vardar. 2014. Immature peach detection in colour images acquired in natural illumination conditions using statistical classifiers and neural network. Precis. Agric. 15, 1 (2014), 57–79.
[166]
Keerthy Kusumam, Tomáš Krajník, Simon Pearson, Tom Duckett, and Grzegorz Cielniak. 2017. 3D-vision based detection, localization, and sizing of broccoli heads in the field. J. Field Robot. 34, 8 (2017), 1505–1518.
[167]
L’ubor Ladický, Chris Russell, Pushmeet Kohli, and Philip H. S. Torr. 2009. Associative hierarchical CRFs for object class image segmentation. In Proceedings of the IEEE 12th International Conference on Computer Vision. 739–746. DOI:
[168]
Dana Lahat, Tülay Adali, and Christian Jutten. 2015. Multimodal data fusion: An overview of methods, challenges, and prospects. Proc. IEEE 103, 9 (2015), 1449–1477.
[169]
Peter D. Lancashire, Hermann Bleiholder, T van den Boom, P. Langelüddeke, Reinhold Stauss, Elfriede Weber, and A. Witzenberger. 1991. A uniform decimal code for growth stages of crops and weeds. Ann. Appl. Biol. 119, 3 (1991), 561–601.
[170]
Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. 2017. FractalNet: Ultra-deep neural networks without residuals. In Proceedings of the International Conference on Learning Representations (ICLR’17).
[171]
Hei Law and Jia Deng. 2018. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV’18).
[172]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521 (2015), 436–444.
[173]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324. DOI:
[174]
Joon-Woo Lee, Taewon Moon, and Jung-Eek Son. 2021. Development of growth estimation algorithms for hydroponic bell peppers using recurrent neural networks. Horticulturae 7, 9 (2021). DOI:
[175]
Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. arXiv preprint arXiv:1606.04155 (2016).
[176]
Han Li, Won Suk Lee, and Ku Wang. 2016. Immature green citrus fruit detection and counting based on fast normalized cross correlation (FNCC) using natural outdoor colour images. Precis. Agric. 17, 6 (2016), 678–697.
[177]
Kai Li, Martin Renqiang Min, and Yun Fu. 2019. Rethinking zero-shot learning: A conditional visual classification perspective. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3583–3592.
[178]
Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, and Xingang Wang. 2019. Attention-guided unified network for panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19). 7019–7028. DOI:
[179]
Yuanzhi Li and Yingyu Liang. 2018. Learning overparameterized neural networks via stochastic gradient descent on structured data. In Adv. Neural Inf. Process. Syst., S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. Retrieved from https://proceedings.neurips.cc/paper/2018/file/54fe976ba170c19ebae453679b362263-Paper.pdf
[180]
Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei. 2017. Fully convolutional instance-aware semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).
[181]
Yanfen Li, Hanxiang Wang, L Minh Dang, Abolghasem Sadeghi-Niaraki, and Hyeonjoon Moon. 2020. Crop pest recognition in natural scenes using convolutional neural networks. Comput. Electron. Agric. 169 (2020), 105174.
[182]
Yang Li and Jiachen Yang. 2021. Meta-learning baselines and database for few-shot classification in agriculture. Comput. Electron. Agric. 182 (2021), 106055.
[183]
Zhizhong Li and Derek Hoiem. 2020. Improving confidence estimates for unfamiliar examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). 2683–2692. DOI:
[184]
Guichao Lin, Yunchao Tang, Xiangjun Zou, Juntao Xiong, and Yamei Fang. 2020. Color-, depth-, and shape-based 3D fruit detection. Precis. Agric. 21, 1 (2020), 1–17.
[185]
Guichao Lin, Yunchao Tang, Xiangjun Zou, Juntao Xiong, and Jinhui Li. 2019. Guava detection and pose estimation using a low-cost RGB-D sensor in the field. Sensors 19, 2 (2019), 428.
[186]
Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. 2017. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).
[187]
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17).
[188]
Chang Liu, Han Yu, Boyang Li, Zhiqi Shen, Zhanning Gao, Peiran Ren, Xuansong Xie, Lizhen Cui, and Chunyan Miao. 2021. Noise-resistant deep metric learning with ranking-based instance selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’21). Retrieved from http://www.boyangli.org/paper/ChangLiu-CVPR-2021.pdf
[189]
Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, and Wei Jiang. 2019. An end-to-end network for panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19). 6165–6174. DOI:
[190]
Jialun Liu, Yifan Sun, Chuchu Han, Zhaopeng Dou, and Wenhui Li. 2020. Deep representation learning on long-tailed data: A learnable embedding augmentation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). 2970–2979.
[191]
Qiuhua Liu, Xuejun Liao, and Lawrence Carin. 2007. Semi-supervised multitask learning. In Advances in Neural Information Processing Systems, J. Platt, D. Koller, Y. Singer, and S. Roweis (Eds.), Vol. 20. Curran Associates, Inc. Retrieved from https://proceedings.neurips.cc/paper/2007/file/a34bacf839b923770b2c360eefa26748-Paper.pdf
[192]
Tian-Hu Liu, Reza Ehsani, Arash Toudeshki, Xiang-Jun Zou, and Hong-Jun Wang. 2018. Detection of citrus fruit and tree trunks in natural environments using a multi-elliptical boundary model. Comput. Industr. 99 (2018), 9–16.
[193]
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multiBox detector. In Proceedings of the European Conference on Computer Vision (ECCV’16), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). 21–37.
[194]
Yang Liu, Duolin Wang, Fei He, Juexin Wang, Trupti Joshi, and Dong Xu. 2019. Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean. Front. Genet. 10 (2019), 1091. DOI:
[195]
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 10012–10022.
[196]
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’15). 3431–3440.
[197]
Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, and Aniruddha Kembhavi. 2022. Unified-IO: A Unified Model for Vision, Language, and Multi-modal Tasks. arXiv preprint arXiv:2206.08916 (2022).
[198]
Jie-Yan Lu, Chung-Liang Chang, and Yan-Fu Kuo. 2019. Monitoring growth rate of lettuce using deep convolutional neural networks. In Proceedings of the ASABE Annual International Meeting. American Society of Agricultural and Biological Engineers.
[199]
Shilei Lyu, Yawen Zhao, Ruiyao Li, Zhen Li, Renjie Fan, and Qiafeng Li. 2022. Embedded sensing system for recognizing citrus flowers using cascaded fusion YOLOv4-CF+ FPGA. Sensors 22, 3 (2022), 1255.
[200]
Juncheng Ma, Keming Du, Lingxian Zhang, Feixiang Zheng, Jinxiang Chu, and Zhongfu Sun. 2017. A segmentation method for greenhouse vegetable foliar disease spots images using color information and region growing. Comput. Electron. Agric. 142 (2017), 110–117.
[201]
Juncheng Ma, Keming Du, Feixiang Zheng, Lingxian Zhang, Zhihong Gong, and Zhongfu Sun. 2018. A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput. Electron. Agric. 154 (2018), 18–24.
[202]
Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5188–5196.
[203]
Alireza Mehrtash, William M. Wells, Clare M. Tempany, Purang Abolmaesumi, and Tina Kapur. 2020. Confidence calibration and predictive uncertainty estimation for deep medical image segmentation. IEEE Trans. Medic. Imag. 39, 12 (2020), 3868–3878. DOI:
[204]
Massimo Minervini, Andreas Fischbach, Hanno Scharr, and Sotirios A. Tsaftaris. 2016. Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recog. Lett. 81 (2016), 80–89.
[205]
Lj Miranda. 2021. Towards Data-centric Machine Learning: A Short Review. Retrieved from https://ljvmiranda921.github.io/notebook/2021/07/30/data-centric-ml/
[206]
Christoph Molnar. 2020. Interpretable Machine Learning. Retrieved from https://christophm.github.io/interpretable-ml-book/
[207]
Grégoire Montavon, Alexander Binder, Sebastian Lapuschkin, Wojciech Samek, and Klaus-Robert Müller. 2019. Layer-wise relevance propagation: An overview. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (2019), 193–209.
[208]
Jednipat Moonrinta, Supawadee Chaivivatrakul, Matthew N. Dailey, and Mongkol Ekpanyapong. 2010. Fruit detection, tracking, and 3D reconstruction for crop mapping and yield estimation. In Proceedings of the 11th International Conference on Control Automation Robotics & Vision. IEEE, 1181–1186.
[209]
Alexander Mordvintsev, Christopher Olah, and Mike Tyka. 2015. Inceptionism: Going deeper into neural networks. Google Research Blog (2015). Retrieved from https://blog.research.google/2015/06/inceptionism-going-deeper-into-neural.html?m=1
[210]
Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip Torr, and Puneet Dokania. 2020. Calibrating deep neural networks using focal loss. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 15288–15299. Retrieved from https://proceedings.neurips.cc/paper/2020/file/aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf
[211]
Rafael Müller, Simon Kornblith, and Geoffrey E. Hinton. 2019. When does label smoothing help? In Advances in Neural Information Processing Systems., H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. Retrieved from https://proceedings.neurips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf
[212]
Joanna M. Nassar, Sherjeel M. Khan, Diego Rosas Villalva, Maha M. Nour, Amani S. Almuslem, and Muhammad M. Hussain. 2018. Compliant plant wearables for localized microclimate and plant growth monitoring. npj Flex. Electron. 2, 1 (2018), 1–12.
[213]
Davy Neven, Bert De Brabandere, Marc Proesmans, and Luc Van Gool. 2019. Instance segmentation by jointly optimizing spatial embeddings and clustering bandwidth. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19). 8829–8837. DOI:
[214]
Canh Nguyen, Vasit Sagan, Matthew Maimaitiyiming, Maitiniyazi Maimaitijiang, Sourav Bhadra, and Misha T. Kwasniewski. 2021. Early detection of plant viral disease using hyperspectral imaging and deep learning. Sensors 21, 3 (2021). DOI:
[215]
Xueping Ni, Changying Li, Huanyu Jiang, and Fumiomi Takeda. 2020. Deep learning image segmentation and extraction of blueberry fruit traits associated with harvestability and yield. Hortic. Res. 7 (2020).
[216]
Sai Vidyaranya Nuthalapati and Anirudh Tunga. 2021. Multi-domain few-shot learning and dataset for agricultural applications. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1399–1408.
[217]
Emmanuel Karlo Nyarko, Ivan Vidović, Kristijan Radočaj, and Robert Cupec. 2018. A nearest neighbor approach for fruit recognition in RGB-D images based on detection of convex surfaces. Expert Syst. Applic. 114 (2018), 454–466.
[218]
Maxime Oquab, Léon Bottou, Ivan Laptev, and Josef Sivic. 2015. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 685–694.
[219]
Ahmad Ostovar, Ola Ringdahl, and Thomas Hellström. 2018. Adaptive image thresholding of yellow peppers for a harvesting robot. Robotics 7, 1 (2018), 11.
[220]
Nobuyuki Otsu. 1979. A threshold selection method from gray-level histograms. IEEE Trans. Sys., Man, Cybern. 9, 1 (1979), 62–66.
[221]
Christian Payer, Darko Štern, Thomas Neff, Horst Bischof, and Martin Urschler. 2018. Instance segmentation and tracking with cosine embeddings and recurrent hourglass networks. In Proceedings of the International Conference on Medical Image Computing and Computer-assisted Intervention.
[222]
Tejaswini Pedapati, Avinash Balakrishnan, Karthikeyan Shanmugam, and Amit Dhurandhar. 2020. Learning global transparent models consistent with local contrastive explanations. Adv. Neural Inf. Process. Syst. 33 (2020), 3592–3602.
[223]
Yao Peng, Mary M. Dallas, José T. Ascencio-Ibáñez, J. Steen Hoyer, James Legg, Linda Hanley-Bowdoin, Bruce Grieve, and Hujun Yin. 2022. Early detection of plant virus infection using multispectral imaging and spatial–spectral machine learning. Scient. Rep. 12, 1 (2022), 3113.
[224]
Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, and Geoffrey Hinton. 2017. Regularizing Neural Networks by Penalizing Confident Output Distributions. arXiv 1701.06548 (2017).
[225]
Wilhelm Pfeffer. 1900. The Physiology of Plants: A Treatise upon the Metabolism and Sources of Energy in Plants. Vol. 1. Clarendon Press.
[226]
Pedro O. Pinheiro and Ronan Collobert. 2015. Learning to segment object candidates. In Proceedings of the 28th International Conference on Neural Information Processing Systems. 1990–1998.
[227]
Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, and Piotr Dollár. 2016. Learning to refine object segments. In Proceedings of the European Conference on Computer Vision (ECCV’16), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 75–91.
[228]
Przemyslaw Prusinkiewicz. 2002. Art and science of life: Designing and growing virtual plants with L-systems. In Proceedings of the International Horticultural Congress: Nursery Crops; Development, Evaluation, Production and Use. 15–28.
[229]
Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. 2020. Estimating training data influence by tracing gradient descent. Adv. Neural Inf. Process. Syst. 33 (2020), 19920–19930.
[230]
Zhongang Qi, Saeed Khorram, and Li Fuxin. 2021. Embedding deep networks into visual explanations. Artif. Intell. 292 (2021), 103435.
[231]
Redmond R. Shamshiri, Cornelia Weltzien, Ibrahim A. Hameed, Ian J. Yule, Tony E. Grift, Siva K. Balasundram, Lenka Pitonakova, Desa Ahmad, and Girish Chowdhary. 2018. Research and development in agricultural robotics: A perspective of digital farming. Int. J. Agric. Biol. Eng. 11, 4 (2018).
[232]
K. Ragazou, A. Garefalakis, E. Zafeiriou, and I. Passas. 2022. Agriculture 5.0: A new strategic management mode for a cut cost and an energy efficient agriculture sector. Energies 15 (2022), 3113.
[233]
Parastoo Rahimi, Md Saiful Islam, Phelipe Magalhães Duarte, Sina Salajegheh Tazerji, Md Abdus Sobur, Mohamed E. El Zowalaty, Hossam M. Ashour, and Md Tanvir Rahman. 2022. Impact of the COVID-19 pandemic on food production and animal health. Trends Food Sci. Technol. 121 (2022), 105–113.
[234]
Maryam Rahnemoonfar and Clay Sheppard. 2017. Deep count: Fruit counting based on deep simulated learning. Sensors 17, 4 (2017), 905.
[235]
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-time Object Detection. arXiv 1506.02640 (2016).
[236]
Joseph Redmon and Ali Farhadi. 2017. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).
[237]
Tanzeel U. Rehman, Md Sultan Mahmud, Young K. Chang, Jian Jin, and Jaemyung Shin. 2019. Current and future applications of statistical machine learning algorithms for agricultural machine vision systems. Comput. Electron. Agric. 156 (2019), 585–605.
[238]
Mengye Ren and Richard S. Zemel. 2017. End-to-end instance segmentation with recurrent attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6656–6664.
[239]
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. Curran Associates, Inc.
[240]
A. Reyes-Yanes, P. Martinez, and R. Ahmad. 2020. Real-time growth rate and fresh weight estimation for little gem romaine lettuce in aquaponic grow beds. Comput. Electron. Agric. 179 (2020), 105827. DOI:
[241]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1135–1144.
[242]
Joe M. Roberts, Toby J. A. Bruce, James M. Monaghan, Tom W. Pope, Simon R. Leather, and Andrew M. Beacham. 2020. Vertical farming systems bring new considerations for pest and disease management. Ann. Appl. Biol. 176, 3 (2020), 226–232.
[243]
David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Sasha Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla P. Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, and Yoshua Bengio. 2022. Tackling climate change with machine learning. ACM Comput. Surv. 55, 2, Article 42 (Feb. 2022), 96 pages. DOI:
[244]
Eduardo Romera, José M. Alvarez, Luis M. Bergasa, and Roberto Arroyo. 2017. ERFNet: Efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans. Intell. Transport. Syst. 19, 1 (2017), 263–272.
[245]
Bernardino Romera-Paredes and Philip Hilaire Sean Torr. 2016. Recurrent instance segmentation. In Proceedings of the European Conference on Computer Vision. Springer, 312–329.
[246]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI’15), Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Eds.). Springer International Publishing, Cham, 234–241.
[247]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2015. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 3 (2015), 211–252. DOI:
[248]
Radu Bogdan Rusu. 2010. Semantic 3D object maps for everyday manipulation in human living environments. KI-Künstliche Intelligenz 24, 4 (2010), 345–348.
[249]
Inkyu Sa, Zongyuan Ge, Feras Dayoub, Ben Upcroft, Tristan Perez, and Chris McCool. 2016. DeepFruits: A fruit detection system using deep neural networks. Sensors 16, 8 (2016), 1222.
[250]
Verónica Saiz-Rubio and Francisco Rovira-Más. 2020. From smart farming towards agriculture 5.0: A review on crop data management. Agronomy 10, 2 (2020), 207.
[251]
Amaia Salvador, Miriam Bellver, Victor Campos, Manel Baradad, Ferran Marques, Jordi Torres, and Xavier Giro-i Nieto. 2017. Recurrent Neural Networks for Semantic Instance Segmentation. arXiv Preprint 1712.00617 (2017).
[252]
Hanno Scharr, Massimo Minervini, Andrew P. French, Christian Klukas, David M. Kramer, Xiaoming Liu, Imanol Luengo, Jean-Michel Pape, Gerrit Polder, Danijela Vukadinovic, Xi Yin, and Sotirios A. Tsaftaris. 2016. Leaf segmentation in plant phenotyping: A collation study. Mach. Vis. Applic. 27, 4 (2016), 585–606.
[253]
Lars Schmarje, Monty Santarossa, Simon-Martin Schröder, and Reinhard Koch. 2021. A survey on semi-, self- and unsupervised learning for image classification. IEEE Access 9 (2021), 82146–82168.
[254]
Michael Gomez Selvaraj, Alejandro Vergara, Henry Ruiz, Nancy Safari, Sivalingam Elayabalan, Walter Ocimati, and Guy Blomme. 2019. AI-powered banana diseases and pest detection. Plant Meth. 15, 1 (2019), 1–11.
[255]
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-Cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 618–626.
[256]
Woo Chaw Seng and Seyed Hadi Mirisaee. 2009. A new method for fruits recognition system. In Proceedings of the International Conference on Electrical Engineering and Informatics, Vol. 1. IEEE, 130–134.
[257]
Jayavelu Senthilnath, Akanksha Dokania, Manasa Kandukuri, K. N. Ramesh, Gautham Anand, and S. N. Omkar. 2016. Detection of tomatoes using spectral-spatial methods in remotely sensed RGB images captured by UAV. Biosyst. Eng. 146 (2016), 16–32.
[258]
Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. 2013. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013).
[259]
Lei Sha, Oana-Maria Camburu, and Thomas Lukasiewicz. 2021. Learning from the best: Rationalizing predictions by adversarial information calibration. In Proceedings of the AAAI Conference on Artificial Intelligence. 13771–13779.
[260]
Manish Sharma, Mayur Dhanaraj, Srivallabha Karnam, Dimitris G. Chachlakis, Raymond Ptucha, Panos P. Markopoulos, and Eli Saber. 2020. YOLOrs: Object detection in multimodal remote sensing imagery. IEEE J. Select. Topics Appl. Earth Observ. Rem. Sens. 14 (2020), 1497–1508.
[261]
Li Shen, Zhouchen Lin, and Qingming Huang. 2016. Relay backpropagation for effective learning of deep convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV’16). Springer, 467–482.
[262]
Rui Shi, Tianxing Li, and Yasushi Yamaguchi. 2020. An attribution-based pruning method for real-time mango detection with YOLO network. Comput. Electron. Agric. 169 (2020), 105214.
[263]
Ruifeng Shi, Deming Zhai, Xianming Liu, Junjun Jiang, and Wen Gao. 2020. Rectified Meta-learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis. arXiv preprint arXiv:2003.07603 (2020).
[264]
Shigeharu Shimamura. n.d. Indoor Cultivation for the Future. Retrieved from https://frc.ri.cmu.edu/ssingh/VF/Challenges_in_Vertical_Farming/Schedule_files/SHIMAMURA.pdf
[265]
Vivswan Shitole, Fuxin Li, Minsuk Kahng, Prasad Tadepalli, and Alan Fern. 2021. One explanation is not enough: Structured attention graphs for image classification. Adv. Neural Inf. Process. Syst. 34 (2021), 11352–11363.
[266]
Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, and Anshul Kundaje. 2016. Not just a black box: Interpretable deep learning by propagating activation differences. arXiv preprint arXiv:1605.01713 4 (2016).
[267]
S. S. Chouhan, U. P. Singh, A. Kaul, and S. Jain. 2019. A data repository of leaf images: Practice towards plant conservation with plant pathology. In Proceedings of the 4th International Conference on Information Systems and Computer Networks (ISCON'19). 700–707. DOI:
[268]
Aleksandra Sidor and Piotr Rzymski. 2020. Dietary choices and habits during COVID-19 lockdown: Experience from Poland. Nutrients 12, 6 (2020), 1657.
[269]
Claudênia Ferreira da Silva, Carlos Hidemi Uesugi, Luiz Eduardo Bassay Blum, Abi Soares dos Anjos Marques, and Marisa Álvares da Silva Velloso Ferreira. 2016. Molecular detection of Erwinia psidii in guava plants under greenhouse and field conditions. Ciência Rural 46 (2016), 1528–1534.
[270]
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. 2017. Mastering the game of Go without human knowledge. Nature 550 (2017), 354–359.
[271]
Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations.
[272]
Uday Pratap Singh, Siddharth Singh Chouhan, Sukirty Jain, and Sanjeev Jain. 2019. Multilayer convolution neural network for the classification of mango leaves infected by anthracnose disease. IEEE Access 7 (2019), 43721–43729.
[273]
Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 2017. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017).
[274]
Samuel L. Smith, Benoit Dherin, David G. T. Barrett, and Soham De. 2021. On the Origin of Implicit Regularization in Stochastic Gradient Descent. arXiv preprint arXiv:2101.12176 (2021).
[275]
Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 30 (2017).
[276]
Jie Song, Chengchao Shen, Jie Lei, An-Xiang Zeng, Kairi Ou, Dacheng Tao, and Mingli Song. 2018. Selective zero-shot classification with augmented attributes. In Proceedings of the European Conference on Computer Vision (ECCV’18). 468–483.
[277]
Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, and Mingli Song. 2018. Transductive unbiased embedding for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1024–1033.
[278]
K. Sornalakshmi, G. Sujatha, S. Sindhu, and D. Hemavathi. 2022. A technical survey on deep learning and AI solutions for plant quality and health indicators monitoring in agriculture. In Proceedings of the 3rd International Conference on Smart Electronics and Communication (ICOSEC’22). IEEE, 984–988.
[279]
Edgar P. Spalding and Nathan D. Miller. 2013. Image analysis is driving a renaissance in growth measurement. Curr. Opin. Plant Biol. 16, 1 (2013), 100–104.
[280]
Trevor Standley, Amir Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. 2020. Which tasks should be learned together in multi-task learning? In Proceedings of the International Conference on Machine Learning. PMLR, 9120–9132.
[281]
A. Steiner, A. Kolesnikov, X. Zhai, R. Wightman, J. Uszkoreit, and L. Beyer. 2021. How to train your ViT? Data, augmentation, and regularization in vision transformers. arXiv preprint arXiv:2106.10270 (2021).
[282]
Richard N. Strange and Peter R. Scott. 2005. Plant disease: A threat to global food security. Ann. Rev. Phytopathol. 43, 1 (2005), 83–116.
[283]
Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. 2017. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17).
[284]
Jun Sun, Xiaofei He, Xiao Ge, Xiaohong Wu, Jifeng Shen, and Yingying Song. 2018. Detection of key organs in tomato based on deep migration learning in a complex background. Agriculture 8, 12 (2018), 196.
[285]
Kaiqiong Sun, Xuan Wang, Shoushuai Liu, and ChangHua Liu. 2021. Apple, peach, and pear flower detection using semantic segmentation network and shape constraint level set. Comput. Electron. Agric. 185 (2021), 106150.
[286]
Qianru Sun, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. 2019. Meta-transfer learning for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 403–412.
[287]
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning. PMLR, 3319–3328.
[288]
Nik Susič, Uroš Žibrat, Saša Širca, Polona Strajnar, Jaka Razinger, Matej Knapič, Andrej Vončina, Gregor Urek, and Barbara Gerič Stare. 2018. Discrimination between abiotic and biotic drought stress in tomatoes using hyperspectral imaging. Sensors and Actuators B: Chem. 273 (2018), 842–852.
[289]
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16). 2818–2826. DOI:
[290]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
[291]
Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, and Peter Corke. 2018. The limits and potentials of deep learning for robotics. Int. J. Robot. Res. 37, 4-5 (2018), 405–420.
[292]
Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 6105–6114. Retrieved from https://proceedings.mlr.press/v97/tan19a.html
[293]
Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning.
[294]
Wenzhi Tang, Tingting Yan, Fei Wang, Jingxian Yang, Jian Wu, Jianlong Wang, Tianli Yue, and Zhonghong Li. 2019. Rapid fabrication of wearable carbon nanotube/graphite strain sensor for real-time monitoring of plant growth. Carbon 147 (2019), 295–302.
[295]
Nima Teimouri, Mads Dyrmann, Per Rydahl Nielsen, Solvejg Kopp Mathiassen, Gayle J. Somerville, and Rasmus Nyholm Jørgensen. 2018. Weed growth stage estimator using deep convolutional neural networks. Sensors 18, 5 (2018). Retrieved from http://www.mdpi.com/1424-8220/18/5/1580
[296]
Mercè Teixidó, Davinia Font, Tomàs Pallejà, Marcel Tresanchez, Miquel Nogués, and Jordi Palacín. 2012. Definition of linear color models in the RGB vector color space to detect red peaches in orchard images taken under natural illumination. Sensors 12, 6 (2012), 7701–7718.
[297]
Sunil Thulasidasan, Gopinath Chennupati, Jeff A. Bilmes, Tanmoy Bhattacharya, and Sarah Michalak. 2019. On mixup training: Improved calibration and predictive uncertainty for deep neural networks. In Advances in Neural Information Processing Systems., H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. Retrieved from https://proceedings.neurips.cc/paper/2019/file/36ad8b5f42db492827016448975cc22d-Paper.pdf
[298]
Hongkun Tian, Tianhai Wang, Yadong Liu, Xi Qiao, and Yanzhou Li. 2020. Computer vision technology in agricultural automation–A review. Inf. Process. Agric. 7, 1 (2020), 1–19.
[299]
Mengxiao Tian, Hao Guo, Hong Chen, Qing Wang, Chengjiang Long, and Yuhao Ma. 2019. Automated pig counting using deep learning. Comput. Electron. Agric. 163 (2019), 104840. DOI:
[300]
Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. 2019. FCOS: Fully Convolutional One-Stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV’19).
[301]
Julio Torres-Tello and Seok-Bum Ko. 2022. Optimizing a multispectral-images-based DL model, through feature selection, pruning and quantization. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS’22). IEEE, 1352–1356.
[302]
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, and Hervé Jégou. 2021. Going Deeper With Image Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV’21). 32–42.
[303]
Mukesh Kumar Tripathi and Dhananjay D. Maktedar. 2020. A role of computer vision in fruits and vegetables among various horticulture products of agriculture fields: A survey. Inf. Process. Agric. 7, 2 (2020), 183–203.
[304]
Magdalena Trojak, Ernest Skowron, Tomasz Sobala, Maciej Kocurek, and Jan Pałyga. 2022. Effects of partial replacement of red by green light in the growth spectrum on photomorphogenesis and photosynthesis in tomato plants. Photosynth. Res. 151, 3 (2022), 295–312.
[305]
Jordan Ubbens, Mikolaj Cieslak, Przemyslaw Prusinkiewicz, and Ian Stavness. 2018. The use of plant models in deep learning: An application to leaf counting in rosette plants. Plant Meth. 14, 1 (2018), 1–10.
[306]
Nathalie van Wijkvliet. n.d. No space, no problem. How Singapore is turning into an edible paradise. Retrieved 11th November 2022 from https://sustainableurbandelta.com/singapore-30-by-30-food-system/
[307]
Charles Veys, Fokion Chatziavgerinos, Ali AlSuwaidi, James Hibbert, Mark Hansen, Gytis Bernotas, Melvyn Smith, Hujun Yin, Stephen Rolfe, and Bruce Grieve. 2019. Multispectral imaging for presymptomatic analysis of light leaf spot in oilseed rape. Plant Meth. 15 (2019), 1–12.
[308]
Adar Vit and Guy Shani. 2018. Comparing RGB-D sensors for close range outdoor agricultural phenotyping. Sensors 18, 12 (2018), 4413.
[309]
Fabio Vulpi, Roberto Marani, Antonio Petitti, Giulio Reina, and Annalisa Milella. 2022. An RGB-D multi-view perspective for autonomous agricultural robots. Comput. Electron. Agric. 202 (2022), 107419.
[310]
Dong Wang and Xiaoyang Tan. 2017. Robust distance metric learning via Bayesian inference. IEEE Trans. Image Process. 27, 3 (2017), 1542–1553.
[311]
Dongyi Wang, Robert Vinson, Maxwell Holmes, Gary Seibel, Avital Bechar, Shimon Nof, and Yang Tao. 2019. Early detection of tomato spotted wilt virus by hyperspectral imaging and outlier removal auxiliary classifier generative adversarial nets (OR-AC-GAN). Scient. Rep. 9, 1 (2019), 1–14.
[312]
Deng-Bao Wang, Lei Feng, and Min-Ling Zhang. 2021. Rethinking calibration of deep neural networks: Do not be afraid of overconfidence. In Advances in Neural Information Processing Systems., M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 11809–11820. Retrieved from https://proceedings.neurips.cc/paper/2021/file/61f3a6dbc9120ea78ef75544826c814e-Paper.pdf
[313]
Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. 2018. CosFace: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’18). 5265–5274.
[314]
Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. 2022. Unifying Architectures, Tasks, and Modalities through a Simple Sequence-to-sequence learning Framework. arXiv preprint arXiv:2202.03052 (2022).
[315]
Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, and Qinghua Hu. 2020. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). 11531–11539. DOI:
[316]
Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, and Lei Li. 2020. SOLO: Segmenting objects by locations. In Proceedings of the European Conference on Computer Vision (ECCV’ 2020), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 649–665.
[317]
Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, and Chunhua Shen. 2020. SOLOv2: Dynamic and fast instance segmentation. In Advances in Neural Information Processing Systems., H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 17721–17732. Retrieved from https://proceedings.neurips.cc/paper/2020/file/cd3afef9b8b89558cd56638c3631868a-Paper.pdf
[318]
Yulong Wang, Hang Su, Bo Zhang, and Xiaolin Hu. 2018. Interpret neural networks by identifying critical data routing paths. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8906–8914.
[319]
Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. 53, 3 (2020), 1–34.
[320]
Zhenglin Wang, Kerry Walsh, and Anand Koirala. 2019. Mango fruit load estimation using a video based MangoYOLO—Kalman filter—Hungarian algorithm method. Sensors 19, 12 (2019), 2742.
[321]
Hongxin Wei, Lei Feng, Xiangyu Chen, and Bo An. 2020. Combating noisy labels by agreement: A joint training method with co-regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). 13726–13735.
[322]
Xiangqin Wei, Kun Jia, Jinhui Lan, Yuwei Li, Yiliang Zeng, and Chunmei Wang. 2014. Automatic method of fruit object extraction under complex agricultural background for vision system of fruit picking robot. Optik 125, 19 (2014), 5684–5689.
[323]
Yeming Wen, Dustin Tran, and Jimmy Ba. 2020. BatchEnsemble: An alternative approach to efficient ensemble and lifelong learning. In Proceedings of the International Conference on Learning Representations (ICLR’20).
[324]
Jan Weyler, Federico Magistri, Peter Seitz, Jens Behley, and Cyrill Stachniss. 2022. In-field phenotyping based on crop leaf and plant instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2725–2734.
[325]
Steven Euijong Whang, Yuji Roh, Hwanjun Song, and Jae-Gil Lee. 2021. Data Collection and Quality Challenges in Deep Learning: A Data-centric AI Perspective. (2021). DOI:
[326]
Adrian Wolny, Qin Yu, Constantin Pape, and Anna Kreshuk. 2022. Sparse object-level supervision for instance segmentation with pixel embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’22). 4402–4411.
[327]
Adrian Wolny, Qin Yu, Constantin Pape, and Anna Kreshuk. 2022. Sparse object-level supervision for instance segmentation with pixel embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4402–4411.
[328]
Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le, Yang You, and Sameer Kumar. 2021. Training EfficientNets at supercomputer scale: 83% ImageNet Top-1 accuracy in one hour. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’21). 947–950.
[329]
Jingui Wu, Baohua Zhang, Jun Zhou, Yingjun Xiong, Baoxing Gu, and Xiaolong Yang. 2019. Automatic recognition of ripening tomatoes by combining multi-feature fusion with a bi-layer classification strategy for harvesting robots. Sensors 19, 3 (2019), 612.
[330]
Xiaoping Wu, Chi Zhan, Yu-Kun Lai, Ming-Ming Cheng, and Jufeng Yang. 2019. Ip102: A large-scale benchmark dataset for insect pest recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8787–8796.
[331]
Yuli Wu, Long Chen, and Dorit Merhof. 2020. Improving pixel embedding learning through intermediate distance regression supervision for instance segmentation. In Proceedings of the European Conference on Computer Vision Workshop. Springer, 213–227.
[332]
Yan Wu, Jeff Donahue, David Balduzzi, Karen Simonyan, and Timothy Lillicrap. 2019. LOGAN: Latent Optimisation for Generative Adversarial Networks. arXiv preprint arXiv:1912.00953 (2019).
[333]
Adnelba Vitória Oliveira Xavier, Geovani Soares de Lima, Hans Raj Gheyi, André Alisson Rodrigues da Silva, Lauriane Almeida dos Anjos Soares, and Cassiano Nogueira de Lacerda. 2022. Gas exchange, growth and quality of guava seedlings under salt stress and salicylic acid. Revista Ambiente & Água 17 (2022).
[334]
Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh Nguyen, Matthias Hein, and Bernt Schiele. 2016. Latent embeddings for zero-shot classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 69–77.
[335]
Jingjing Xie, Bing Xu, and Zhang Chuang. 2013. Horizontal and Vertical Ensemble with Deep Representation for Classification. arXiv 1306.2759 (2013).
[336]
Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2016. Aggregated Residual Transformations for Deep Neural Networks. arXiv Preprint 1611.05431 (2016).
[337]
Haotian Yan, Zhe Li, Weijian Li, Changhu Wang, Ming Wu, and Chuang Zhang. 2021. ConTNet: Why Not Use Convolution and Transformer at the Same Time? arXiv Preprint 2104.13497 (2021).
[338]
Biyun Yang and Yong Xu. 2021. Applications of deep-learning approaches in horticultural research: A review. Hortic. Res. 8 (2021).
[339]
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, and Lijuan Wang. 2022. An empirical study of GPT-3 for few-shot knowledge-based VQA. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 3081–3089.
[340]
Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I. Inouye, and Pradeep K. Ravikumar. 2019. On the (in)fidelity and sensitivity of explanations. Adv. Neural Inf. Process. Syst. 32 (2019).
[341]
Chih-Kuan Yeh, Joon Kim, Ian En-Hsu Yen, and Pradeep K. Ravikumar. 2018. Representer point selection for explaining deep neural networks. Adv. Neural Inf. Process. Syst. 31 (2018).
[342]
T. Yeshitela, P. J. Robbertse, and P. J. C. Stassen. 2005. Effects of pruning on flowering, yield and fruit quality in mango (Mangifera indica). Austral. J. Experim. Agric. 45, 10 (2005), 1325–1330.
[343]
Michael Yeung, Leonardo Rundo, Yang Nan, Evis Sala, Carola-Bibiane Schönlieb, and Guang Yang. 2021. Calibrating the Dice Loss to Handle Neural Network Overconfidence for Biomedical Image Segmentation. arXiv preprint arXiv:2111.00528 (2021).
[344]
Hui Ying, Zhaojin Huang, Shu Liu, Tianjia Shao, and Kun Zhou. 2021. EmbedMask: Embedding coupling for instance segmentation. In Proceedings of the International Joint Conference on Artificial Intelligence. 1266–1273.
[345]
Mo Yu, Yang Zhang, Shiyu Chang, and Tommi Jaakkola. 2021. Understanding interlocking dynamics of cooperative rationalization. Adv. Neural Inf. Process. Syst. 34 (2021), 12822–12835.
[346]
Yang Yu, Kailiang Zhang, Hui Liu, Li Yang, and Dongxing Zhang. 2020. Real-time visual localization of the picking points for a ridge-planting strawberry harvesting robot. IEEE Access 8 (2020), 116556–116568. DOI:
[347]
Yang Yu, Kailiang Zhang, Li Yang, and Dongxing Zhang. 2019. Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN. Comput. Electron. Agric. 163 (2019), 104846.
[348]
Zhiwen Yu, Hau-San Wong, and Guihua Wen. 2011. A modified support vector machine and its application to image segmentation. Image Vis. Comput. 29, 1 (2011), 29–40.
[349]
Hongbo Yuan, Jiajun Zhu, Qifan Wang, Man Cheng, and Zhenjiang Cai. 2022. An improved DeepLab v3+ deep learning network applied to the segmentation of grape leaf black rot spots. Front. Plant Sci. 13 (2022).
[350]
Kun Yuan, Shaopeng Guo, Ziwei Liu, Aojun Zhou, Fengwei Yu, and Wei Wu. 2021. Incorporating convolution designs into visual transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV’21). 579–588.
[351]
Ting Yuan, Lin Lv, Fan Zhang, Jun Fu, Jin Gao, Junxiong Zhang, Wei Li, Chunlong Zhang, and Wenqiang Zhang. 2020. Robust cherry tomatoes detection algorithm in greenhouse scene based on SSD. Agriculture 10, 5 (2020), 160.
[352]
Ilaria Zambon, Massimo Cecchini, Gianluca Egidi, Maria Grazia Saporito, and Andrea Colantoni. 2019. Revolution 4.0: Industry vs. agriculture in a future development for SMEs. Processes 7, 1 (2019), 36.
[353]
Baohua Zhang, Yuanxin Xie, Jun Zhou, Kai Wang, and Zhen Zhang. 2020. State-of-the-art robotic grippers, grasping and control strategies, as well as their applications in agricultural robots: A review. Comput. Electron. Agric. 177 (2020), 105694.
[354]
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. Understanding deep learning requires rethinking generalization. arXiv 1611.03530 (2017).
[355]
Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23, 10 (2016), 1499–1503.
[356]
Li Zhang, Guan Gui, Abdul Mateen Khattak, Minjuan Wang, Wanlin Gao, and Jingdun Jia. 2019. Multi-task cascaded convolutional networks based intelligent fruit detection for designing automated robot. IEEE Access 7 (2019), 56028–56038.
[357]
Li Zhang, Jingdun Jia, Guan Gui, Xia Hao, Wanlin Gao, and Minjuan Wang. 2018. Deep learning based improved classification system for designing tomato harvesting robot. IEEE Access 6 (2018), 67940–67950. DOI:
[358]
Li Zhang, Jingdun Jia, Guan Gui, Xia Hao, Wanlin Gao, and Minjuan Wang. 2018. Deep learning based improved classification system for designing tomato harvesting robot. IEEE Access 6 (2018), 67940–67950.
[359]
Lingxian Zhang, Zanyu Xu, Dan Xu, Juncheng Ma, Yingyi Chen, and Zetian Fu. 2020. Growth monitoring of greenhouse lettuce based on a convolutional neural network. Hortic. Res. 7 (2020).
[360]
Shifeng Zhang, Cheng Chi, Yongqiang Yao, Zhen Lei, and Stan Z. Li. 2020. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20).
[361]
Shanwen Zhang, Subing Zhang, Chuanlei Zhang, Xianfeng Wang, and Yun Shi. 2019. Cucumber leaf disease identification with global pooling dilated convolutional neural network. Comput. Electron. Agric. 162 (2019), 422–430.
[362]
Wendong Zhang. 2021. The case for healthy US-China agricultural trade relations despite deglobalization pressures. Appl. Econ. Perspect. Polic. 43, 1 (2021), 225–247.
[363]
Wenwei Zhang, Jiangmiao Pang, Kai Chen, and Chen Change Loy. 2021. K-Net: Towards unified image segmentation. In Proceedings of the NeurIPS Conference.
[364]
Xiao Zhang and Ximing Cai. 2011. Climate change impacts on global agricultural land availability. Environ. Res. Lett. 6, 1 (2011), 014014.
[365]
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2017. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv Preprint 1707.01083 (2017).
[366]
Y. Zhang, P. Tiňo, A. Leonardis, and K. Tang. 2021. A survey on neural network interpretability. IEEE Trans. Emerg. Topics Comput. Intell. 5, 5 (2021), 726–742. DOI:
[367]
Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, and Haibin Ling. 2018. M2Det: A Single-Shot Object Detector Based on Multi-level Feature Pyramid Network. arXiv Preprint 1811.04533 (2018).
[368]
Guoqing Zheng, Ahmed Hassan Awadallah, and Susan Dumais. 2021. Meta label correction for noisy label learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35.
[369]
Yaoyao Zhong, Weihong Deng, Mei Wang, Jiani Hu, Jianteng Peng, Xunqiang Tao, and Yaohai Huang. 2019. Unequal-training for deep face recognition with long-tailed noisy data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19). 7812–7821.
[370]
B. Zhou, Q. Cui, X. S. Wei, and Z. M. Chen. 2020. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9719–9728.
[371]
Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Qibin Hou, and Jiashi Feng. 2021. DeepViT: Towards Deeper Vision Transformer. arXiv preprint arXiv:2103.11886 (2021).
[372]
Xingyi Zhou, Jiacheng Zhuo, and Philipp Krähenbühl. 2019. Bottom-up object detection by grouping extreme and center points. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19).
[373]
Zhi-Hua Zhou. 2018. A brief introduction to weakly supervised learning. Nat. Sci. Rev. 5, 1 (2018), 44–53.
[374]
Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Hongsheng Li, Xiaohua Wang, and Jifeng Dai. 2022. Uni-perceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16804–16815.

Published In

ACM Computing Surveys, Volume 56, Issue 5 (May 2024). EISSN: 1557-7341.
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Received: 25 September 2022
Revised: 01 September 2023
Accepted: 18 September 2023
Online AM: 03 October 2023
Published: 27 November 2023

Author Tags

  1. Agriculture 5.0
  2. controlled-environment agriculture
  3. multimodality
  4. pest and disease detection
  5. growth monitoring
  6. flower and fruit detection

Qualifiers

  • Survey

Funding Sources

  • WeBank-NTU Joint Research Center
  • China-Singapore International Joint Research Institute
