This document discusses an inverse cooking system that can generate cooking recipes from food images using a convolutional neural network. The system is able to predict ingredients from an image without imposing an order on the ingredients. It then generates cooking instructions by considering both the image and inferred ingredients. The system's performance on predicting ingredients was evaluated on the Recipe1M dataset and showed improved results over previous methods. When generating full recipes, the system was able to produce high-quality recipes by leveraging both the image and ingredients in a way that users found more compelling than retrieval-based approaches.
International Research Journal of Modernization in Engineering Technology and Science
e-ISSN: 2582-5208
(Peer-Reviewed, Open Access, Fully Refereed International Journal) Volume: 05 / Issue: 03 / March-2023 Impact Factor: 7.868 www.irjmets.com
INVERSE COOKING RECIPE GENERATION FROM FOOD IMAGES
Mrs. B. Ujwala*1, Sevella Amarnath*2, Rohan Swain*3, Kasa Vignesh*4
*1 Assistant Professor, Department of CSE, Anurag Group of Institutions, Venkatapur, Hyderabad, India.
*2,3,4 B.Tech, Department of CSE, Anurag Group of Institutions, Venkatapur, Hyderabad, India.
DOI: https://www.doi.org/10.56726/IRJMETS35206

ABSTRACT
Food photography is appreciated by many because it showcases the beauty of food. However, food images reveal nothing about the preparation process or the complexity of the recipe behind each dish. We develop an inverse cooking system that generates cooking recipes from food images using a Convolutional Neural Network (CNN). The system uses a novel architecture to predict ingredients and their dependencies without imposing any order on them, and then generates cooking instructions by considering the image and the inferred ingredients simultaneously. The system was evaluated extensively on the Recipe1M dataset, demonstrating improved ingredient prediction compared to previous methods. It also generates high-quality recipes by leveraging both the image and the inferred ingredients and, according to a human evaluation, produces more compelling recipes than retrieval-based approaches.

Keywords: Inverse Cooking, Recipe1M Dataset, CNN, Retrieval-Based Approaches.

I. INTRODUCTION
Food plays a crucial role in human life: it not only provides us with energy but also shapes our identity and culture. Activities related to food, such as cooking, eating, and discussing meals, are significant parts of our daily lives, and the saying "We are what we eat" reflects the importance of food in defining who we are. With the advent of social media, food culture has become more visible, with people sharing pictures of their meals online under hashtags such as #food and #foodie. This trend underscores the value that food holds in our society. The way we consume and prepare food has also evolved over time.
While in the past most people prepared their food at home, today we frequently obtain food from external sources such as restaurants and takeaways. As a result, obtaining detailed information about the ingredients and cooking techniques behind our food can be challenging. Inverse cooking systems are therefore needed to deduce ingredients and cooking instructions from a prepared meal.

In recent years, significant progress has been made in visual recognition tasks such as natural image classification, object detection, and semantic segmentation. Food recognition, however, presents additional challenges compared to natural image understanding because of high intra-class variability and the deformations that occur during the cooking process. The ingredients in a cooked dish vary widely in color, form, and texture. Moreover, visual ingredient detection requires high-level reasoning and prior knowledge, such as knowing that cakes are likely to contain sugar rather than salt and that croissants are likely to include butter. Recognizing food therefore requires computer vision systems to incorporate prior knowledge and go beyond what is merely visible in order to produce high-quality, structured descriptions of food preparation.

II. LITERATURE REVIEW
Many works on recipe generation have been carried out in the past. The survey below summarizes some of them, clarifying the techniques used previously, the challenges researchers have addressed, and the contributions they have made.

Lukas Bossard et al. [1] introduced Food-101, a new dataset for food recognition, in 2014. The dataset contains over 100,000 images across 101 food categories, and the authors proposed a method for mining the discriminative components of the images using random forests. Their method outperforms several state-of-the-art food recognition algorithms on Food-101. They also conducted a detailed analysis of the dataset, including the distribution of the categories and the difficulty of the recognition task, and highlighted the importance of large-scale, diverse datasets in food image analysis.

Micael Carvalho et al. [2] proposed a cross-modal retrieval approach for the cooking context that learns semantic text-image embeddings to link cooking recipes to their corresponding food images. They used a multi-modal deep neural network consisting of a textual embedding, learned from the recipe ingredients and instructions, and a visual embedding, learned from the food images. They evaluated the approach on a dataset of 5,000 recipe-image pairs, and a qualitative analysis of the retrieved results showed that the method retrieves relevant recipes and images. Their work provides a novel way of linking cooking recipes and food images and highlights the importance of cross-modal retrieval in food image analysis.

Jing-Jing Chen and Chong-Wah Ngo [3] introduced a deep-learning-based approach for ingredient recognition, which is essential for cooking recipe retrieval. They used a convolutional neural network (CNN) to extract features from food images and then used these features to recognize the ingredients in the corresponding recipes. They evaluated the approach on a dataset of 600 recipes and conducted a user study, showing that it improves retrieval performance over a baseline method and highlighting the importance of ingredient recognition in cooking recipe retrieval.
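The cross-modal retrieval idea behind approaches like [2] can be sketched as nearest-neighbour search in a shared embedding space. The following is a minimal illustration, not the authors' implementation: the random vectors stand in for the outputs of trained text and image encoders, and all sizes are assumptions.

```python
# Sketch of cross-modal recipe retrieval: recipe and image embeddings share
# one space, and retrieval ranks recipes by cosine similarity to the query
# image embedding. Random vectors stand in for trained encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
recipe_emb = rng.normal(size=(1000, dim))   # 1000 candidate recipe embeddings
query_img_emb = rng.normal(size=(dim,))     # embedding of the query food image

# Cosine similarity = dot product of L2-normalized vectors.
recipes = recipe_emb / np.linalg.norm(recipe_emb, axis=1, keepdims=True)
query = query_img_emb / np.linalg.norm(query_img_emb)
scores = recipes @ query

# Indices of the five best-matching recipes, highest similarity first.
top5 = np.argsort(scores)[::-1][:5]
print(top5)
```

In a real system, the encoders are trained jointly so that a recipe and its photo land close together; retrieval quality then hinges on the embedding, which is exactly the limitation discussed for retrieval-based systems later in this paper.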
Jing-Jing Chen et al. [4] proposed a cross-modal recipe retrieval approach that considers rich food attributes such as taste, cuisine, and occasion in addition to the ingredients and food images. They used a deep neural network to extract visual features from food images and text features from the recipe ingredients and attributes, then combined the features with a multimodal fusion method to perform cross-modal retrieval. They evaluated the approach on a dataset of 1,000 recipes and conducted a user study, showing that it improves retrieval performance over a baseline method and highlighting the importance of considering multiple aspects of food in food image analysis tasks.

Mei-Yun Chen et al. [5] proposed a system consisting of two parts: a food identification module and a quantity estimation module. The identification module combines visual and text features to identify the Chinese dishes in the input image, while the quantity estimation module estimates the quantity of each dish by analyzing the visual characteristics of the food and comparing them against a reference database. They evaluated the system on a dataset of 50 Chinese dishes, reported promising results, and discussed its limitations, such as the need for a more extensive reference database and the difficulty of accurately estimating the quantity of mixed dishes.

Xin Chen et al. [6] introduced ChineseFoodNet, a dataset containing over 0.5 million images across 106 categories of Chinese food. They described the process of creating the dataset, including the data collection and cleaning procedures.
They also evaluated several state-of-the-art deep learning models on the dataset, compared the results with other existing food recognition datasets, and highlighted the importance of large-scale, diverse datasets in food image analysis.

Bo Dai et al. [7] proposed a novel approach for generating diverse and natural language descriptions of images using a conditional generative adversarial network (cGAN). They trained the cGAN on a dataset of images and corresponding descriptions, used it to generate multiple diverse, natural descriptions for each image, and evaluated the approach on several benchmark datasets. They also demonstrated its potential for food images, where generative models can support food image analysis tasks such as recipe generation and recommendation.

Krzysztof Dembczyński, Weiwei Cheng, and Eyke Hüllermeier [8] proposed a probabilistic classifier chain approach for multilabel classification tasks, which predict multiple labels for each instance. They proposed a Bayesian framework for learning the optimal ordering of the classifiers and the optimal threshold values for each label, and introduced a novel objective function, based on the expected loss of the multilabel classification task, for evaluating the quality of probabilistic classifier chains. Such methods are directly relevant to food image analysis tasks such as food classification and ingredient recognition.
S. Sankar et al. [11] (2021) provided an overview of recent advances in recipe generation using deep learning methods. They covered various techniques, including those based on food images, discussed challenges such as the need for large-scale datasets and the variability in food appearance, and highlighted future research directions.

K. Zhang et al. [12] summarized recent advances in food image analysis, including methods for generating recipes from food images. They reviewed the challenges in the field, such as variability in food appearance, the need for large-scale datasets, and the lack of standard evaluation metrics; covered techniques ranging from feature-based methods to deep learning; and discussed applications such as dietary assessment and food recognition.

R. Varshney et al. [13] provided an overview of recent research in recipe generation from food images. They discussed approaches such as deep-learning-based, attention-based, and transfer-learning-based methods, covered the evaluation metrics and datasets used, and highlighted challenges such as the need for large-scale, diverse datasets and the difficulty of capturing the complexity of cooking procedures.

M. Raza et al. [14] provided an overview of deep learning techniques, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). They discussed challenges such as the need for large-scale, diverse datasets, the difficulty of capturing variability in food appearance, and the limited interpretability of deep learning models, and covered applications including food recognition, food portion estimation, and recipe generation.

L. Gao et al. [15] discussed the challenges of analyzing food images and generating recipes from them, along with the approaches, models, evaluation metrics, and datasets used in the field. They described in detail the deep learning architectures, including CNNs, RNNs, and GANs, that have been employed for food image analysis and recipe generation.

M. Han et al. [16] proposed a deep learning approach that generates cooking recipes from food images automatically, without human intervention. Their system consists of two main components: ingredient prediction and cooking instruction generation. For ingredient prediction, they used a CNN to extract visual features from food images and a long short-term memory (LSTM) network to model the textual features of ingredients. For instruction generation, they used an attention mechanism to combine the visual and textual features of the ingredients.

III. METHODOLOGY
Previous food understanding efforts have primarily focused on categorizing food and ingredients.
However, a comprehensive visual food recognition system should not only recognize the type of food or its ingredients but also understand its preparation process. The image-to-recipe problem has typically been treated as a retrieval task: a recipe is retrieved from a fixed dataset based on an image similarity score in an embedding space. The effectiveness of such systems depends largely on the size and diversity of the dataset and the quality of the learned embedding, so they can fail when no matching recipe for the query image exists in the static data. In the present methodology, we instead train a CNN on recipe details and images, so that the resulting model can predict a recipe when a related image is uploaded. We used the Recipe1M dataset, but trained on only 1,000 of its recipes, since training on the entire dataset with images would require large amounts of memory and many hours of CNN training time.
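The ingredient-prediction part of such a CNN can be sketched as multi-label classification: convolutional layers encode the image, and a sigmoid output scores each candidate ingredient independently, so no order is imposed on the ingredient set. The tiny architecture, the vocabulary size of 100, and the random tensors below are illustrative assumptions, not the model used in this paper.

```python
# Minimal multi-label ingredient predictor: a small CNN scores each of
# N_INGREDIENTS independently. Architecture and sizes are illustrative
# assumptions, not the paper's exact model.
import torch
import torch.nn as nn

N_INGREDIENTS = 100  # assumed ingredient vocabulary size

class IngredientCNN(nn.Module):
    def __init__(self, n_ingredients=N_INGREDIENTS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (B, 32, 1, 1)
        )
        self.classifier = nn.Linear(32, n_ingredients)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)  # raw logits, one per ingredient

model = IngredientCNN()
images = torch.randn(4, 3, 224, 224)          # dummy batch of 4 food images
targets = torch.randint(0, 2, (4, N_INGREDIENTS)).float()

# BCEWithLogitsLoss treats each ingredient as an independent yes/no label,
# which is why the ingredient set needs no ordering.
loss = nn.BCEWithLogitsLoss()(model(images), targets)
loss.backward()
print(model(images).shape)  # torch.Size([4, 100])
```

At inference time, applying a sigmoid and thresholding the logits yields the predicted ingredient set, which can then condition the instruction generator.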
Fig 3.1: System Architecture
The methodology involves the following steps:
1. Data Collection: Collecting high-quality food images from a diverse set of sources is critical for building an accurate and robust recipe generation model. The collected images should cover a wide range of cuisines, ingredients, and cooking styles.
2. Image Preprocessing: Preprocessing the food images with computer vision techniques helps extract useful features and improves the accuracy of the recipe generation model. This can involve resizing, normalization, and feature extraction using CNNs.
3. Recipe Generation: Generating high-quality, diverse recipes that match the input food image is a challenging task requiring a combination of deep learning and optimization techniques. The generated recipes should be both feasible and appealing to the user.

IV. RESULTS AND DISCUSSIONS
The results of the project depend mainly on the performance of the trained deep learning models and the evaluation metrics used. Accuracy is typically measured by the similarity of a generated recipe to the original recipe, usually sourced from online recipe databases. We trained the model on more than 1,000 images from the Recipe1M dataset and obtained an accuracy of 99.662% in predicting the recipe name, ingredients, and preparation instructions.
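The image preprocessing step of the methodology (resizing and normalization before the CNN) can be sketched as follows. The 224×224 target size and the ImageNet channel statistics are common defaults for CNN backbones, assumed here rather than taken from this paper, and the synthetic image stands in for a real food photo.

```python
# Illustrative preprocessing for a food image: resize, scale to [0, 1],
# then standardize with ImageNet channel statistics (assumed defaults,
# not values stated in the paper).
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img: Image.Image, size: int = 224) -> np.ndarray:
    img = img.convert("RGB").resize((size, size), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0   # (H, W, 3) in [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD          # per-channel standardize
    return x.transpose(2, 0, 1)                     # (3, H, W) for a CNN

# Synthetic stand-in for a food photo, so the sketch runs on its own.
dummy = Image.fromarray(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
batch = preprocess(dummy)[None]                     # add batch dim: (1, 3, 224, 224)
print(batch.shape)
```

Consistent preprocessing between training and inference is essential; a model trained on standardized inputs will mispredict if raw pixel values are fed in at test time.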
Fig 4.1: Recipe CNN Accuracy and Loss Graph
In the above graph, the x-axis represents epochs and the y-axis represents the accuracy/loss value; the blue line shows loss and the orange line shows accuracy. With each successive epoch, accuracy increases toward 1 (100%) and loss decreases toward 0. A CNN model with high accuracy and low loss is considered efficient.

V. CONCLUSION
The aim of our study was to develop an image-to-recipe generation system that can produce a recipe, including a title, ingredients, and cooking instructions, from a food image. First, we demonstrated the significance of modeling dependencies by predicting groups of ingredients from food images. Second, we investigated instruction generation conditioned on both images and inferred ingredients, emphasizing the need to consider both modalities simultaneously. Finally, based on the outcomes of a user study, we verified the complexity of the task and confirmed that our system outperforms existing image-to-recipe retrieval methods.

VI. REFERENCES
[1] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101: Mining discriminative components with random forests. In ECCV, 2014.
[2] Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, and Matthieu Cord. Cross-modal retrieval in the cooking context: Learning semantic text-image embeddings. In SIGIR, 2018.
[3] Jing-Jing Chen and Chong-Wah Ngo. Deep-based ingredient recognition for cooking recipe retrieval. In ACM Multimedia. ACM, 2016.
[4] Jing-Jing Chen, Chong-Wah Ngo, and Tat-Seng Chua. Cross-modal recipe retrieval with rich food attributes. In ACM Multimedia. ACM, 2017.
[5] Mei-Yun Chen, Yung-Hsiang Yang, Chia-Ju Ho, Shih-Han Wang, Shane-Ming Liu, Eugene Chang, Che-Hua Yeh, and Ming Ouhyoung. Automatic Chinese food identification and quantity estimation. In SIGGRAPH Asia 2012 Technical Briefs, 2012.
[6] Xin Chen, Hua Zhou, and Liang Diao. ChineseFoodNet: A large-scale image dataset for Chinese food recognition. CoRR, abs/1705.02743, 2017.
[7] Bo Dai, Dahua Lin, Raquel Urtasun, and Sanja Fidler. Towards diverse and natural image descriptions via a conditional GAN. In ICCV, 2017.
[8] Krzysztof Dembczyński, Weiwei Cheng, and Eyke Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, 2010.
[9] Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation. In ACL, 2018.
[10] Claude Fischler. Food, self and identity. Information (International Social Science Council), 1988.
[11] S. Sankar et al. "Cooking with AI: A Survey on Recipe Generation using Deep Learning", 2021.
[12] K. Zhang et al. "Food Image Analysis: A Review of Recent Advances and Challenges", 2021.
[13] R. Varshney et al. "Recipe Generation from Food Images: A Survey", 2021.
[14] M. Raza et al. "Deep Learning for Food Image Analysis: A Review", 2020.
[15] L. Gao et al. "Food Image Analysis and Recipe Generation: A Review", 2020.
[16] M. Han et al. "Recipe Generation from Food Images: A Deep Learning Approach", 2020.
[17] D. Chaudhary et al. "Recipe Generation from Food Images using Attention-based Neural Networks", 2020.
[18] P. Sharma et al. "Food Recognition and Recipe Generation from Food Images: A Review", 2019.
[19] H. Kim et al. "Recipe Generation from Food Images using Deep Neural Networks", 2019.
[20] Y. Liu et al. "Recipe Generation from Food Images using Deep Learning and NLP", 2019.