Abstract
Image-based food classification poses challenges for mainstream computer vision algorithms: the spatial distribution of food is not fixed, and ingredients are often occluded. Moreover, most current approaches classify food and ingredients by directly extracting abstract features of the entire image through a convolutional neural network (CNN), ignoring both the relationship between food and ingredients and the ingredient occlusion problem. To address these issues, we propose FoodNet, a multi-task network for joint food and ingredient recognition, comprising a multi-scale relationship learning module (MSRL) and a label dependency learning module (LDL). Since ingredients normally co-occur in an image, LDL exploits ingredient label dependencies to alleviate the ingredient occlusion problem. MSRL aggregates multi-scale information of food and ingredients, then uses two relational matrices to model the food-ingredient matching relationship and obtain richer feature representations. Experimental results show that FoodNet achieves strong performance on the Vireo Food-172 and UEC Food-100 datasets; notably, it reaches state-of-the-art ingredient recognition on both. The source code will be made available at https://github.com/visipaper/FoodNet.
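The two ideas in the abstract can be illustrated with a minimal numerical sketch. All names, shapes, and matrices below are illustrative assumptions, not the authors' implementation: two relation matrices exchange information between the food and ingredient branches (in the spirit of MSRL), and ingredient scores are propagated through a row-normalized co-occurrence matrix so that frequently co-occurring but occluded ingredients reinforce each other (in the spirit of LDL).

```python
import numpy as np

rng = np.random.default_rng(0)
d_food, d_ing, n_ing = 8, 8, 5

# Pooled multi-scale features from the two branches (hypothetical).
food_feat = rng.standard_normal(d_food)
ing_feat = rng.standard_normal(d_ing)

# MSRL-style fusion: two relational matrices model the food-ingredient
# matching relationship, each branch absorbing information from the other.
R_fi = rng.standard_normal((d_food, d_ing)) * 0.1  # ingredient -> food
R_if = rng.standard_normal((d_ing, d_food)) * 0.1  # food -> ingredient
food_fused = food_feat + R_fi @ ing_feat
ing_fused = ing_feat + R_if @ food_feat

# LDL-style dependency: a toy ingredient co-occurrence matrix (counts),
# row-normalized into a propagation matrix.
cooc = np.array([[5, 3, 0, 0, 0],
                 [3, 5, 1, 0, 0],
                 [0, 1, 5, 2, 0],
                 [0, 0, 2, 5, 1],
                 [0, 0, 0, 1, 5]], dtype=float)
P = cooc / cooc.sum(axis=1, keepdims=True)

# Only ingredient 0 is visibly detected; propagation raises the score of
# ingredient 1, which frequently co-occurs with it but may be occluded.
logits = np.array([4.0, 0.0, 0.0, 0.0, 0.0])
refined = P @ logits
```

Here the refined score of ingredient 1 rises above zero purely because of its co-occurrence with the detected ingredient 0, which is the intuition behind using label dependencies to recover occluded ingredients.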
Data Availability
The datasets VireoFood-172 and UEC Food-100 used to train and evaluate the neural networks are publicly available at http://vireo.cs.cityu.edu.hk/VireoFood172/ and http://foodcam.mobi/dataset.html.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shuang, F., Lu, Z., Li, Y. et al. Foodnet: multi-scale and label dependency learning-based multi-task network for food and ingredient recognition. Neural Comput & Applic 36, 4485–4501 (2024). https://doi.org/10.1007/s00521-023-09349-4