Shelf Auditing Based on Image Classification Using Semi-Supervised Deep Learning to Increase On-Shelf Availability in Grocery Stores
List of Figures

Figure 1. History of one-stage detectors.
Figure 2. The general structure of the proposed SOSA approach.
Figure 3. The main screen of SOSA XAI.
Figure 4. Train screen of SOSA XAI.
Figure 5. Test screen of SOSA XAI.
Figure 6. Metrics screen of SOSA XAI.
Figure 7. (a) Distribution of labeled products; (b) distribution of labeled shelves' areas.
Figure 8. Structure of the Pascal visual object classes (VOC) format for a sample file.
Figure 9. RetinaNet sample file structure and conversion details.
Figure 10. YOLOv3 and YOLOv4 sample file structure and conversion details.
Figure 11. Comparison of loss value changes during the training phase.
Figure 12. Success rates when different ratios of labeled data are considered.
Figure 13. Sample object detection results of the breakfast section, using models trained with (a) 20%, (b) 40%, (c) 60%, and (d) 80% of labeled images.
Figure 14. Sample object detection results of the beverage section, using models trained with (a) 20%, (b) 40%, (c) 60%, and (d) 80% of labeled images.
Figure 15. Sample object detection results of the rice and pasta section, using models trained with (a) 20%, (b) 40%, (c) 60%, and (d) 80% of labeled images.
Abstract
1. Introduction
2. Related Works
- First, some of the previous studies [5,9,10,11,12,16] used traditional techniques such as image processing (IP) to monitor OSA, whereas we used deep learning techniques. IP-based approaches have to store a huge number of reference images to match against the target image, and whenever a product is updated, the reference images must be updated manually. In this context, an important advantage of our method is that it requires no reference images and therefore needs no manual updating when products change. Moreover, thanks to deep learning, it automatically extracts features from an input image.
- Second, the previous deep learning-based studies used different network structures, such as CaffeNet-based [6,13] and CIFAR-10-based [6,13] networks, whereas we designed a novel architecture that combines the RetinaNet, YOLOv3, and YOLOv4 detectors. The advantage of our approach is that it builds three different models and selects the best one; hence, satisfactory results can be achieved through best-model selection.
- Third, the previous deep learning-based studies used two-stage detectors. In contrast, this study benefits from one-stage detectors because of their speed and because they achieve satisfactory accuracy for OSA monitoring.
- Fourth, our study differs from the rest in that we adopted the semi-supervised learning concept, and therefore we benefited from both labeled and unlabeled data. The main advantage here is that our method reduces the need to label images, which is an expensive, tedious, difficult, and time-consuming process since it requires human labor. Satisfactory results can be achieved using a small number of labeled images. Moreover, the proposed method will expand the application field of machine learning in grocery stores, since a large amount of the OSA data generated in real life is unlabeled.
- Finally, unlike the previous studies, we introduced the explainable AI (XAI) concept into OSA monitoring. The newly developed SOSA XAI software application allows users to manage, understand, and trust the model when monitoring OSA.
3. Materials and Methods
3.1. Deep Learning for Object Detection
3.2. Proposed Approach: Semi-Supervised Learning on OSA (SOSA)
3.2.1. The General Structure of the Proposed SOSA Method
3.2.2. The Formal Definition of the Proposed SOSA Method
Algorithm 1 SOSA: Semi-Supervised Learning on OSA

Inputs:
    D: Labeled dataset D = {(x1, y1), (x2, y2), …, (xn, yn)} with n instances
    U: Unlabeled dataset U = {xn+1, xn+2, …, xn+s} with s instances
    Z: One-stage detectors Z = {z1, z2, …, zk} with k detectors
    Y: Class labels Y = {c1, c2, …, cm} with m classes
    SI: Sample image
Outputs:
    TM: Trained model
    ŷ: Predicted class labels for the products included in the sample image
Begin:
    DTrain = Split(D, n * percentage)
    DTest = Split(D, n − (n * percentage))
    // Step 1: Training with labeled data
    for c = 1 to k do
        foreach epoch
            foreach (xi, yi) in DTrain
                zc = Train(xi, yi)
            end foreach
        end foreach
        Z = Z ∪ zc
    end for
    // Step 2: Testing one-stage detectors and selecting the best one
    for c = 1 to k do
        foreach (xi, yi) in DTest
            Prediction = zc(xi)
            PredictionResultc = PredictionResultc ∪ Prediction
        end foreach
    end for
    SD = MAX(PredictionResultc)    // SD: selected detector
    // Step 3: Labeling unlabeled image data and generating pseudo-labels
    foreach xi in U
        ŷ = SD(xi)
        D.Add(xi, ŷ)
    end foreach
    // Step 4: Re-training the model with pseudo-labeled data
    TM = Train(D)
    // Step 5: Classifying a sample image
    ŷ = TM(SI)
End Algorithm
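For concreteness, the following is a minimal Python sketch of the control flow in Algorithm 1. The detector objects and their `evaluate`, `predict`, and `retrain` calls are hypothetical placeholders standing in for the RetinaNet/YOLOv3/YOLOv4 framework code, not real library APIs; only the five-step structure follows the algorithm above.

```python
# A minimal sketch of Algorithm 1 (SOSA). The detector wrappers and their
# methods are hypothetical placeholders for the RetinaNet/YOLOv3/YOLOv4
# training and inference code; only the control flow mirrors the paper.

def sosa(labeled, unlabeled, detector_factories, split=0.8):
    # Split the labeled set into training and held-out test parts.
    n_train = int(len(labeled) * split)
    d_train, d_test = labeled[:n_train], labeled[n_train:]

    # Step 1: train every one-stage detector on the labeled training data.
    detectors = [make_detector(d_train) for make_detector in detector_factories]

    # Step 2: score each detector on the test split and keep the best one.
    selected = max(detectors, key=lambda det: det.evaluate(d_test))

    # Step 3: pseudo-label the unlabeled images with the selected detector.
    pseudo_labeled = [(image, selected.predict(image)) for image in unlabeled]

    # Step 4: re-train on the union of labeled and pseudo-labeled data.
    trained_model = selected.retrain(labeled + pseudo_labeled)
    return trained_model

# Step 5 is then a single inference call on a sample shelf image,
# e.g. trained_model.predict(sample_image).
```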
3.2.3. The Advantages of the Proposed SOSA Method
- Traditional OSA applications are limited to using only labeled image data to build a model. However, labeling shelf images is a time-consuming, tedious, expensive, and difficult job because a single shelf image contains so many products, and for this reason considerable human labor is needed. This is especially true for OSA applications that involve learning from a large number of class labels and distinguishing similar classes. The main advantage of the SOSA method is that it solves OSA problems using a small number of labeled shelf images. The existing labeled dataset is extended with unlabeled data that receive automatically assigned labels, and hence high accuracy is achieved in an efficient way.
- Another advantage is that SOSA includes three different one-stage detector models, and the model with the highest accuracy is selected at the beginning of the semi-supervised learning process. Hence, satisfactory results can be achieved by selecting the best model.
- The SOSA approach uses three deep learning techniques (RetinaNet, YOLOv3, and YOLOv4) without any modification or extension of the methods. Therefore, SOSA is easy to implement: it can be realized in Python using the open-source code available in the related machine learning libraries (a sketch of one such implementation step follows this list).
- The main idea behind the SOSA method is to take advantage of a huge amount of unlabeled image data when building a classifier. In addition to labeled data, the SOSA method also exploits unlabeled data to improve classification performance. Thanks to the SOSA method, the unlabeled data instances provide additional knowledge, and they can be successfully used to improve the generalization ability of the learning system.
- Another advantage is that the SOSA method can be applied to any OSA image data without any prior information about the given dataset. It does not make any specific assumptions for the given data.
- Since a large amount of the OSA data generated in real life is unlabeled, the SOSA method will expand the application field of machine learning in grocery stores.
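As one concrete example of the implementation effort involved, consider the annotation-format step used in this pipeline: LabelImg produces Pascal VOC XML boxes, while the Darknet-based YOLO frameworks expect one normalized line per object (cf. Figures 8, 10 and 11 on file structures and conversion details). A minimal sketch, with illustrative function and argument names:

```python
# A minimal sketch of the Pascal VOC -> YOLO label conversion.
# VOC stores absolute corner coordinates (xmin, ymin, xmax, ymax);
# Darknet/YOLO expects "class_id x_center y_center width height"
# with all four geometry values normalized to [0, 1].
# The function name and arguments are illustrative, not from the paper.
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id):
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a 100x220 px box on a 640x480 image for class 2:
# voc_box_to_yolo(120, 80, 220, 300, 640, 480, 2)
# -> "2 0.265625 0.395833 0.156250 0.458333"
```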
3.3. Explainable AI for SOSA
4. Experimental Studies and Results
5. Discussion
- (1) It was observed that semi-supervised learning provides many advantages for monitoring OSA, including improving efficiency, reducing labeling cost, providing the additional knowledge present in unlabeled data, and increasing the applicability of machine learning in the retail sector.
- (2) The combination of three deep learning techniques (RetinaNet, YOLOv3, and YOLOv4) improves accuracy when monitoring OSA.
- (3) Explainable AI is a powerful tool in monitoring OSA since it provides users with explanations of individual decisions and enables users to manage, understand, and trust the OSA model.
- (4) The proposed SOSA method has the potential to expand the application of machine learning in grocery stores, thanks to its advantages.
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Corsten, D.; Gruen, T. Desperately seeking shelf availability: An examination of the extent, the causes, and the efforts to address retail out-of-stocks. Int. J. Retail Distrib. Manag. 2003, 31, 605–617.
- Musalem, A.; Olivares, M.; Bradlow, E.T.; Terwiesch, C.; Corsten, D. Structural estimation of the effect of out-of-stocks. Manag. Sci. 2010, 56, 1180–1197.
- Bottani, E.; Bertolini, M.; Rizzi, A.; Romagnoli, G. Monitoring on-shelf availability, out-of-stock and product freshness through RFID in the fresh food supply chain. Int. J. RF Technol. Res. Appl. 2017, 8, 33–55.
- Michael, K.; McCathie, L. The pros and cons of RFID in supply chain management. In Proceedings of the 4th International Conference on Mobile Business (ICMB), Sydney, NSW, Australia, 11–13 July 2005; pp. 623–629.
- Moorthy, R.; Behera, S.; Verma, S.; Bhargave, S.; Ramanathan, P. Applying image processing for detecting on-shelf availability and product positioning in retail stores. In Proceedings of the ACM International Conference Proceedings Series, Kochi, India, 10–13 August 2015; pp. 451–457.
- Higa, K.; Iwamoto, K. Robust estimation of product amount on store shelves from a surveillance camera for improving on-shelf availability. In Proceedings of the IEEE International Conference on Imaging Systems and Techniques (IST 2018), Kraków, Poland, 16–18 October 2018; pp. 1–6.
- Zhu, X. Semi-Supervised Learning. In Encyclopedia of Machine Learning and Data Mining; Springer: Berlin/Heidelberg, Germany, 2017; Volume 3; ISBN 9781489976871.
- Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
- Moorthy, R.; Behera, S.; Verma, S. On-shelf availability in retailing. Int. J. Comput. Appl. 2015, 115, 47–51.
- Satapathy, R.; Prahlad, S.; Kaulgud, V. Smart Shelfie: Internet of shelves, for higher on-shelf availability. In Proceedings of the 2015 IEEE Region 10 Symposium (TENSYMP), Ahmedabad, India, 13–15 May 2015; pp. 70–73.
- Rosado, L.; Goncalves, J.; Costa, J.; Ribeiro, D.; Soares, F. Supervised learning for out-of-stock detection in panoramas of retail shelves. In Proceedings of the IEEE International Conference on Imaging Systems and Techniques (IST 2016), Chania, Greece, 4–6 October 2016; pp. 406–411.
- Kejriwal, N.; Garg, S.; Kumar, S. Product counting using images with application to robot-based retail stock assessment. In Proceedings of the IEEE Conference on Technologies for Practical Robot Applications, Woburn, MA, USA, 11–12 May 2015; pp. 1–6.
- Higa, K.; Iwamoto, K. Robust shelf monitoring using supervised learning for improving on-shelf availability in retail stores. Sensors 2019, 19, 2722.
- Liu, S.; Tian, H. Planogram compliance checking using recurring patterns. In Proceedings of the 2015 IEEE International Symposium on Multimedia (ISM), Miami, FL, USA, 14–16 December 2015; pp. 27–32.
- Liu, S.; Li, W.; Davis, S.; Ritz, C.; Tian, H. Planogram compliance checking based on detection of recurring patterns. IEEE Multimed. 2016, 23, 54–63.
- Saran, A.; Hassan, E.; Maurya, A.K. Robust visual analysis for planogram compliance problem. In Proceedings of the 14th IAPR International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 18–22 May 2015; pp. 576–579.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 318–327.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Emmert-Streib, F.; Yli-Harja, O.; Dehmer, M. Explainable artificial intelligence and machine learning: A reality rooted perspective. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, 1–8.
- Confalonieri, R.; Coba, L.; Wagner, B.; Besold, T.R. A historical perspective of explainable Artificial Intelligence. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 1–21.
- Kajabad, E.N.; Ivanov, S.V. People detection and finding attractive areas by the use of movement detection analysis and deep learning approach. Procedia Comput. Sci. 2019, 156, 327–337.
- WebMarket: An image database built for computer vision research. Available online: http://yuhang.rsise.anu.edu.au (accessed on 11 November 2020).
- Santra, B.; Mukherjee, D.P. A comprehensive survey on computer vision based approaches for automatic identification of products in retail store. Image Vis. Comput. 2019, 86, 45–63.
- Zhang, Y.; Wang, L.; Hartley, R.; Li, H. Where’s the Weet-Bix? In Proceedings of the 8th Asian Conference on Computer Vision (ACCV 2007), Tokyo, Japan, 18–22 November 2007; Part I, pp. 800–810.
- Zhang, Y.; Wang, L.; Hartley, R.; Li, H. Handling significant scale difference for object retrieval in a supermarket. In Proceedings of Digital Image Computing: Techniques and Applications (DICTA 2009), 2009; pp. 468–475.
- LabelImg. Available online: https://github.com/tzutalin/labelImg (accessed on 11 November 2020).
- Keras RetinaNet. Available online: https://github.com/fizyr/keras-retinanet (accessed on 11 November 2020).
- YOLOv3 framework. Available online: https://pjreddie.com/darknet/yolo (accessed on 11 November 2020).
- YOLOv4 framework. Available online: https://github.com/AlexeyAB/darknet (accessed on 11 November 2020).
| Reference No | Year | Traditional Detection | Deep Learning (Two-Stage) | Deep Learning (One-Stage) | Supervised Learning | Semi-Supervised Learning | XAI | Methods |
|---|---|---|---|---|---|---|---|---|
| [5] | 2015 | ✔ | | | ✔ | | ✕ | SURF |
| [9] | 2015 | ✔ | | | ✔ | | ✕ | Image Processing |
| [10] | 2015 | ✔ | | | ✔ | | ✕ | Image Processing |
| [12] | 2015 | ✔ | | | ✔ | | ✕ | k-d Tree, RANSAC |
| [16] | 2015 | ✔ | | | ✔ | | ✕ | Hausdorff Map, Euclidean Distance, Binary Distance Map |
| [11] | 2016 | ✔ | | | ✔ | | ✕ | AKAZE Feature Detector, Cascade Classifier |
| [6] | 2018 | | ✔ | | ✔ | | ✕ | CaffeNet-based Network, CIFAR-10-based Network |
| [13] | 2019 | | ✔ | | ✔ | | ✕ | CaffeNet-based Network, CIFAR-10-based Network, Hungarian |
| Proposed Method | | | | ✔ | | ✔ | ✔ | RetinaNet, YOLOv3, YOLOv4 |
| Class / Metric | RetinaNet (Backbone: ResNet50) | RetinaNet (Backbone: ResNet101) | YOLOv3 (Backbone: Darknet53) | YOLOv4 (Backbone: CSPDarkNet53) |
|---|---|---|---|---|
| Beverage Product (AP) | 0.9469 | 0.9625 | 0.8636 | 0.9808 |
| Breakfast Product (AP) | 0.6975 | 0.7245 | 0.8003 | 0.9616 |
| Food Product (AP) | 0.8886 | 0.8697 | 0.6321 | 0.9252 |
| Empty Shelf (AP) | 0.8481 | 0.8387 | 0.8189 | 0.9136 |
| Almost Empty Shelf (AP) | 0.2386 | 0.1561 | 0.5884 | 0.8125 |
| mAP | 0.7239 | 0.7103 | 0.7406 | 0.9187 |
| F1-score | 0.7333 | 0.7430 | 0.6600 | 0.9100 |
| Recall | 0.8105 | 0.8228 | 0.6600 | 0.9600 |
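As a sanity check on the aggregate rows, the mAP values in the table above are the arithmetic means of the five per-class AP values. Assuming the standard definition F1 = 2PR/(P + R), the precision implied by the reported F1 and recall (precision itself is not reported) can also be backed out; a short sketch for the YOLOv4 column:

```python
# Sanity check on the aggregate rows of the table above (YOLOv4 column).
# mAP is taken here as the arithmetic mean of the per-class AP values;
# backing out precision from F1 = 2PR / (P + R) is an assumption, since
# precision itself is not reported in the table.
ap = [0.9808, 0.9616, 0.9252, 0.9136, 0.8125]  # per-class AP, YOLOv4
map_score = sum(ap) / len(ap)
print(round(map_score, 4))            # 0.9187, matching the mAP row

f1, recall = 0.91, 0.96
precision = f1 * recall / (2 * recall - f1)
print(round(precision, 3))            # ~0.865 implied precision
```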
| Dataset ID | Percentage of Labeled Images | Number of Labeled Images | Number of Unlabeled Images | Number of Total Images |
|---|---|---|---|---|
| D1 | 20% | 300 | 1200 | 1500 |
| D2 | 40% | 300 | 450 | 750 |
| D3 | 60% | 300 | 200 | 500 |
| D4 | 80% | 300 | 75 | 375 |
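The unlabeled counts in the table above follow directly from the fixed pool of 300 labeled images and the labeled-image percentage; a quick check:

```python
# Quick check of the dataset table above: the labeled pool is fixed at
# 300 images, so each labeled percentage determines the total and the
# unlabeled count. Integer arithmetic keeps the results exact.
LABELED = 300
for pct in (20, 40, 60, 80):
    total = LABELED * 100 // pct
    unlabeled = total - LABELED
    print(f"{pct}% labeled -> {unlabeled} unlabeled, {total} total")
# 20% labeled -> 1200 unlabeled, 1500 total ... 80% -> 75 unlabeled, 375 total
```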
| Class / Metric | RetinaNet (Backbone: ResNet50) | RetinaNet (Backbone: ResNet101) | YOLOv3 (Backbone: Darknet53) | SOSA (80% Labeled Images) | SOSA (60% Labeled Images) | SOSA (40% Labeled Images) | SOSA (20% Labeled Images) |
|---|---|---|---|---|---|---|---|
| Beverage Product (AP) | 0.9469 | 0.9625 | 0.8636 | 0.9586 | 0.8797 | 0.7477 | 0.8224 |
| Breakfast Product (AP) | 0.6975 | 0.7245 | 0.8003 | 0.9467 | 0.9626 | 0.9253 | 0.8414 |
| Food Product (AP) | 0.8886 | 0.8697 | 0.6321 | 0.9303 | 0.8680 | 0.7308 | 0.8718 |
| Empty Shelf (AP) | 0.8481 | 0.8387 | 0.8189 | 0.8410 | 0.8736 | 0.7601 | 0.7216 |
| Almost Empty Shelf (AP) | 0.2386 | 0.1561 | 0.5884 | 0.7866 | 0.7269 | 0.4646 | 0.3471 |
| mAP | 0.7239 | 0.7103 | 0.7406 | 0.8927 | 0.8622 | 0.7257 | 0.7209 |
| F1-score | 0.7333 | 0.7430 | 0.6600 | 0.9000 | 0.8700 | 0.7800 | 0.8100 |
| Recall | 0.8105 | 0.8228 | 0.6600 | 0.9300 | 0.9000 | 0.7800 | 0.8300 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).