- Research
- Open access
Enhanced content-based fashion recommendation system through deep ensemble classifier with transfer learning
Fashion and Textiles volume 11, Article number: 24 (2024)
Abstract
With the rise of online shopping due to the COVID-19 pandemic, Recommender Systems have become increasingly important in providing personalized product recommendations. Recommender Systems face the challenge of efficiently extracting relevant items from vast data. Numerous deep learning methods have been developed to classify fashion images. However, those methods rely on a single model, which may or may not be reliable. We propose a deep ensemble classifier that takes the probabilities obtained from five pre-trained models: MobileNet, DenseNet, Xception, VGG16, and VGG19. These probabilities are passed as inputs to the deep ensemble classifier, which predicts the class of the given item. Several similarity measures are studied in this work, and the cosine similarity metric is used to recommend products for the class predicted by the deep ensemble classifier. The proposed method is trained and validated on benchmark datasets, namely the Fashion product images dataset and the Shoe dataset, demonstrating superior accuracy compared to existing models. The results highlight the potential of leveraging transfer learning and deep ensemble techniques to enhance fashion recommendation systems. The proposed model achieves 96% accuracy, outperforming the existing models.
Introduction
Several sectors and technologies, including Recommender Systems (RS), have been profoundly affected by the COVID-19 pandemic. RS comprise algorithms that offer users tailored recommendations according to their preferences, actions, and other pertinent information. RS are vital in facilitating user exploration and discovery of similar items. In the fashion business, fashion recommendation has emerged as a prominent research area, focusing on discerning individual fashion preferences. To address this, recent advancements in visual search have empowered users to search for items using images captured by the camera or retrieved from the gallery (Dagan et al., 2023). Photographs are an integral component of online purchasing. Previous studies predominantly relied on human evaluations to establish metrics for analyzing the impact of images on consumer behavior, which limits the range of variables and research samples that can be investigated (Wang et al., 2021). Visual perception plays a fundamental role in human comprehension of the world, encompassing aesthetic elements that influence consumer behavior, managerial decisions, employee actions, and investor choices. Visual information can be derived from various sources, including the physical environment, videos, and images. Product images are significant in online shopping as they provide buyers with essential visual information. However, internet shoppers face the challenge of making judgments without being able to touch and feel the actual products.
Product images remain the primary method of presenting products and influencing purchases, even though methods like virtual changing rooms are being developed (Chaudhary et al., 2019). Multimedia artifacts, including photographs, movies, and audio/speech, have multiplied dramatically over the past two decades due to the phenomenal expansion of the web and multimedia technology. Because of this large volume of data, image retrieval has become a challenging problem in the multimedia sector. As a result, search algorithms are becoming more complex in order to return the photos most pertinent to user queries, yet results from multimedia search engines still fall short of what users want. Digital images and other visual materials are now more widely available, particularly on the Web, the most significant image database. However, the value of multimedia content depends on how easily it can be searched and managed. The need for effective image indexing, storage, and retrieval is therefore growing, especially on the Web (Tekli, 2022). RS have been used in a variety of fields, including music (Deldjoo et al., 2020; Hansen et al., 2020; He et al., 2018; Wundervald, 2021), movies (Davidson et al., 2010; Du et al., 2018; Ma et al., 2022), fashion products (Heinrich et al., 2021; Sun et al., 2022; Sysko-Romańczuk et al., 2022; Zeng et al., 2019), novels, and medicine. The main objective of these systems is to help consumers find what they need rapidly and effectively. RS act like filters: they filter information to assist consumers in discovering better goods, financial strategies, and other related information by customizing the suggestions. Several online customer service providers, including social networking and e-commerce websites, use a recommender system as a vital feature to boost their audience and revenue.
RS are fundamental components in various domains and commonly rely on three primary approaches: content-based, collaborative filtering, and hybrid models. The content-based approach leverages item properties to make recommendations. In contrast, collaborative filtering focuses on establishing connections between items and users. A hybrid model combines content-based and collaborative filtering techniques (Suvarna & Padmaja, 2019). Notably, a content-based approach does not depend on user ratings. Instead, it can extract similarities by analyzing the features of new products. In this approach, a single user search query can be sufficient to generate item recommendations.
The collaborative filtering approach finds similar users and recommends items based on their choices. For example, if user A selects items X, Y, and Z and user B selects items W, Y, and Z, then we can recommend W to A and X to B. This filtering technique first finds similar users and then suggests items based on what the most similar user likes. Different similarity measures can be used, such as Jaccard, Cosine, and Centered Cosine or Pearson correlation (Gomez-Uribem & Hunt, 2015; Xu et al., 2021). The weakness of this approach is its dependence on user interactions; when these are inadequate, it suffers from the cold-start and data sparsity problems (Zhang et al., 2019). The cold-start problem arises when few details are available about a new user or a new item, which makes recommending items or finding similar users difficult (Wang et al., 2018). Sparsity arises when the number of ratings obtained is significantly smaller than the number needed (Elahi et al., 2016), which also makes it hard to predict a user's rating for a particular product (Gupta & Gadge, 2015; Koohi & Kiani, 2016). A hybrid technique combines the two strategies mentioned above, and most RS use a hybrid approach (Sivaramakrishnan et al., 2021).
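The A/B example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `jaccard` and `recommend` are hypothetical, Jaccard similarity is picked from the measures listed above, and only the single most similar user is consulted.

```python
def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, selections):
    """Suggest items chosen by the most similar other user but not yet by `target`."""
    others = {user: items for user, items in selections.items() if user != target}
    nearest = max(others, key=lambda user: jaccard(selections[target], others[user]))
    return sorted(others[nearest] - selections[target])


# Selections taken from the example in the text.
selections = {"A": {"X", "Y", "Z"}, "B": {"W", "Y", "Z"}}
print(recommend("A", selections))  # ['W']
print(recommend("B", selections))  # ['X']
```

With only two users the nearest neighbour is trivial; with a new user who has selected nothing, every similarity is zero, which is the cold-start situation described above.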
Considerable research has been done on RS. Initially, machine learning models were utilized in these systems (Balaji et al., 2021; Portugal et al., 2018). Nowadays, the majority of image classification techniques use supervised machine learning, in which models are trained to predict the class of the query image at a single class level. Traditional computational models are one type of computer vision technology in which intricate, manually designed algorithms are used to extract important features from particular regions of an image. Yamamoto and Nakazawa (2019) used a Support Vector Machine and concatenated the features obtained from multiple CNNs to experiment with FashionStyle14. Sejal et al. (2016a; 2016b) proposed a Markov-related model to suggest images based on the search history. They performed clustering to group the images and applied cosine similarity to recommend similar users. Sejal et al. (2017) proposed an ANOVA cosine similarity for image recommendation based on the search. They assume that the user is online and that the system must give recommendations based on the search query. They conducted experiments on the Myntra dataset. Jayalakshmi et al. (2022) used machine learning algorithms such as clustering and Principal Component Analysis (PCA) for movie classification and recommendation.
However, as the field progressed, Convolutional Neural Networks (CNNs) and other deep learning models emerged as powerful tools for improving recommendation accuracy (Tahmasebi et al., 2021). These models learn to extract spatial characteristics automatically from a large number of images by optimizing their parameters through a combination of convolutional and pooling layers. The last layer of the network predicts the class label from the obtained features. Supervised learning algorithms optimize the model parameters over several epochs using labeled data until an acceptable accuracy is reached. Sheikh Fathollahi and Razzazi (2021) proposed two CNN models: one for extracting features and another for classifying the music genre. They used two measures, Euclidean distance and Cosine similarity, to recommend the music, and did not use any collaborative filtering. Indira et al. (2022) proposed a model that uses a CNN for feature extraction. The features are passed to a residual network to get the recommendations. Nocentini et al. (2022) evaluated different CNN models on three datasets. Ullah et al. (2019) used a deep learning approach with a group of five convolution layers for feature extraction. The extracted features are passed to a Random Forest classifier to get the class label. They performed the task in two phases: first with a direct Random Forest classifier for prediction, and later with a deep learning model to improve the accuracy.
However, all the above-discussed models are based on a single model. Secondly, these models often recommended items without accurately identifying the specific product. In this research, we focused on developing a content-based recommendation system that eliminates the need for users to rate specific products. Instead, we aimed to classify new items based on the decisions obtained from multiple models and retrieve similar images within the same product category. To achieve this, we employed various pre-trained models leveraging Transfer Learning techniques. Furthermore, we introduced a novel deep ensemble classifier designed to classify fashion images. The retrieval of similar products was accomplished using cosine similarity.
The main contributions of this study are outlined as follows:
- A new approach that combines multiple pre-trained models to create an ensemble classifier. This ensemble classifier improves the accuracy and robustness of the classification process.
- A range of pre-trained models is employed with unique features and characteristics. By leveraging Transfer Learning, we extracted knowledge from these models and assigned appropriate weights to enhance the overall classification performance.
- To assess the effectiveness of our approach, Fashion product images and Shoe datasets are used for the experimentation.
- By testing our deep ensemble classifier on these datasets, we obtained empirical evidence of its performance and demonstrated its potential for practical applications in the fashion domain.
This research introduces a novel content-based recommendation system for fashion products, featuring a deep ensemble classifier and leveraging Transfer Learning techniques. Our approach demonstrates promising results through experiments on benchmark datasets, indicating its potential for real-world fashion recommendation scenarios.
The rest of the sections delve into the methods and techniques employed thus far in developing the recommendation system. The related work on recommendation systems is presented in Section “Literature review”. Section “Methods” presents our novel approach to enhancing the recommendation system. Section “Results” reports the experimental results and findings from evaluating our recommendation system. In the final section of this research paper, we provide a conclusive summary of our work and its contributions to the field of recommendation systems.
Literature review
CNN related works for recommendation
CNNs and other deep learning models have emerged as powerful tools for improving recommendation accuracy. Hiriyannaiah et al. (2022) proposed a convolutional autoencoder for classification and combined different similarity metrics using a boosting approach to recommend similar products. The Cosine, Manhattan, Euclidean, Pearson Correlation, and Tanimoto coefficient similarity metrics are used and combined through boosting. They implemented the proposed model on four different datasets. Gharaei et al. (2021) created a DNN model for the gender and item classification of a given image. First, the gender of the given image is classified, and then the item is classified. Recommendations are provided based on the gender and item category, and the cosine similarity metric is used to recommend products for a given image. Jo et al. (2020) proposed and developed a deep learning model for searching fashion products on the Amazon product dataset. Tuinhof et al. (2019) designed and developed a neural network for product classification and applied the model to a fashion product dataset. Elleuch et al. (2021) created a deep CNN model for image classification and conducted experiments on clothing datasets.
Suvarna and Balakrishna (2022a) have proposed and designed a novel deep learning-based ensemble classifier for recommending fashion products. Results from testing the algorithm on a dataset of fashion products are promising. This allows for product recommendations with an accuracy of 88.32%.
Suvarna and Balakrishna (2022b) have designed and implemented a deep CNN model that is both effective and efficient in categorizing the product in query. The results of the evaluation of the proposed model utilizing a dataset consisting of fashion products turned out to be satisfactory. Because of this, it is possible to give recommendations for the items that are both accurate and reliable, with an accuracy percentage of 89.09%.
Transfer learning
The process of using knowledge obtained for one task for a related task is known as transfer learning. Asiroglu et al. (2019) created two Inception-based models, one for prediction and the other for recommendation. Jang et al. (2019) extracted gender and clothing features from a pretrained ResNet50 backbone network and pulled another set of features from a pretrained VGG16 that had its final three fully connected layers removed. Zhang et al. (2023) created a shallow neural network that takes the inputs obtained from four different pre-trained models: AlexNet, InceptionV3, ResNet50, and VGG16. Wakita et al. (2016) created a Deep Neural Network (DNN) for a fashion brand recommendation system. Choudhary et al. (2023) use a deep learning-based recommendation system that uses backpropagation neural networks with a variety of nodes and numerous hidden layers to enable quick learning. A small number of representative deep learning architectures with varying numbers of hidden layers are included in that paper to enhance the model's learning capacity. Ay et al. (2019) and Seo and Shin (2019) created a Hierarchical Convolutional Neural Network (HCNN) for apparel classification. In that study, hierarchical categorization of clothing is applied using a CNN and a knowledge-embedded classifier that outputs hierarchical information. Additionally, condition-CNN learns the correlation between various class levels as conditional probabilities, which are then utilized to estimate class predictions in the scoring process. Condition-CNN requires fewer trainable parameters than the baseline CNN models but achieves a higher prediction accuracy by feeding the estimated higher-level class predictions as priors to the lower-level class prediction (Kolisnik et al., 2021). With this model, the article type classification accuracy is 91% on the Fashion product images dataset.
However, our analysis of the existing literature revealed several limitations that need to be addressed. Firstly, all the models discussed above predominantly relied on machine learning and deep learning approaches and primarily used limited datasets. Secondly, these models often recommended items without accurately identifying the specific product. Lastly, the results obtained from these studies were based on the performance of a single model, neglecting the potential benefits of ensemble methods. To overcome these limitations, we proposed a novel two-stage content-based recommendation system that leverages deep learning techniques. This system aims to enhance the accuracy and effectiveness of fashion image classification and recommendation, addressing the above-mentioned issues. With this approach, the system gives only recommendations that correspond to the given query image. In this research, we comprehensively investigated the application of Transfer Learning in various pre-trained models. We focused on developing an advanced deep ensemble classifier designed explicitly for fashion image classification. The retrieval of similar products was examined using various similarity measures, and cosine similarity was used for the final retrieval.
Methods
This section discusses the problem definition, the proposed model, the candidate models, the Sparse DNN model, and the deep ensemble classifier. Products can be recommended in different ways. One is collaborative filtering, in which products liked by one user are recommended to similar users; the other is content-based filtering. Research on content-based filtering follows two directions: one suggests the k most similar items from a cluster, and the other recommends items from the product class (Deldjoo et al., 2018; Schedl et al., 2018; Wundervald, 2021). We adopted the second direction, recommending k items from the product class, and our model works in two phases: one for classification and the other for extracting similar images.
Problem definition
In preparation for addressing the research problem, we conducted experiments utilizing a Fashion product images dataset. Our approach involved extracting features from the data using various pre-trained models. To decrease the dimensionality of the features that were extracted, we employed PCA. Additionally, we employed several similarity measures to facilitate the retrieval of similar images for the recommendation. We used DenseNet201 as the feature extractor, and PCA was subsequently applied (Fig. 1).
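The dimensionality-reduction step described above can be illustrated with a small NumPy sketch of PCA via the singular value decomposition. Everything concrete here is an assumption made for illustration: the random matrix stands in for pooled DenseNet201 features (1920 values per image), the number of components is arbitrary, and `pca_reduce` is a hypothetical helper rather than the implementation used in the study.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 1920))  # stand-in for extracted image features
reduced = pca_reduce(features, n_components=16)
print(reduced.shape)  # (50, 16)
```

The reduced vectors, rather than the raw features, would then be compared with a similarity measure to retrieve candidate images.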
Figure 2(a) is used as input. The similar images retrieved for the test product using different similarity metrics like cosine, Manhattan, and Euclidean are presented in Fig. 2(b), (c) and (d), respectively. From Fig. 2(b–d), it is observed that the images unrelated to the test product are retrieved. To avoid these unnecessary retrieved results, we propose to perform the classification for the given image. From the given user input image and image database, the main objective is to recommend similar items of the same product. The flow of the work is explained in Fig. 3.
This study proposes a cutting-edge deep ensemble method for categorizing products that learns from the predictions of candidate CNN models in order to categorize products more accurately. For this purpose, the transfer learning models DenseNet, Xception, MobileNet, VGG16, and VGG19 are fine-tuned on the fashion images. The probabilities obtained by the different models are passed as inputs to the deep ensemble classifier, which produces the final prediction; items from the predicted class are then recommended using the cosine similarity measure. The architecture of the proposed model is explained in Fig. 4.
Pre-processing module
After loading the data, divide it into two groups: the training and testing datasets. Extract the features using all the pre-trained models and store them separately. Load each image at the target size of (224, 224). Convert each image to an array and pre-process it. After that, reshape the images into the form (number of images, 224, 224, 3). The pre-trained models serve as the base of the candidate models. This module's main goal is to take dataset images and derive semantic spatial representations from them. This study uses the models DenseNet, Xception, MobileNet, VGG16, and VGG19. Pass the features obtained by a pre-trained model to train the Sparse DNN, and repeat the process for all the pre-trained models. Save the fine-tuned models. Use the fine-tuned models to get the probabilities. Concatenate the prediction probabilities obtained by each model and pass them to the deep ensemble classifier to get the final probabilities, which are used to assess the model performance. Use the ensemble model to predict the class label of the given test product. First, we classify the given product using the deep ensemble classifier, and then we extract the top k comparable images for recommendation purposes.
Sparse DNN architecture
This study used the transfer learning models VGG16, VGG19, DenseNet, Xception, and MobileNet. In addition, the Sparse DNN architecture is used to get the probabilities for each model, as shown in Fig. 5. Images from the dataset first pass through a pre-processing module that normalizes the features, and the resulting features are reshaped as required by the pre-trained models. Semantic feature maps are then extracted by passing these features through a frozen convolutional base with ImageNet weights. Rather than being provided directly to the proposed classification head, the high-dimensional spatial feature maps are compressed using a Global Average Pooling (GAP) layer.
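To make the compression performed by GAP concrete, the sketch below averages each channel over its spatial grid with NumPy. The input shape (4, 7, 7, 512) is an assumption chosen for illustration (it matches what a VGG16 convolutional base yields for 224 × 224 inputs), and `global_average_pool` is a hypothetical helper name.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each channel over its spatial grid: (N, H, W, C) -> (N, C)."""
    return feature_maps.mean(axis=(1, 2))


# Assumed batch of 4 feature maps with 7 x 7 spatial positions and 512 channels.
maps = np.ones((4, 7, 7, 512), dtype=np.float32)
pooled = global_average_pool(maps)
print(pooled.shape)  # (4, 512)
```

Each image is reduced from 7 × 7 × 512 = 25,088 values to 512, which is what keeps the dense classification head small.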
Here we take three dense layers followed by dropout layers with 512, 256, and 128 neurons, respectively. We use a 0.2 dropout rate to avoid the model overfitting. The final layer is dense with softmax. In our scenario, one hundred and forty-three neurons have been employed because there are 143 classes for the large dataset. A set of 12 neurons for the apparel dataset with 12 classes and 6 neurons for the Shoe dataset is used. The suggested ensemble classifier is fed with the probabilities generated by the softmax layer.
Given a 2-D image I as input, the mathematical expression involved in applying the convolution operation using 2D-kernel K is in Eq. (1)
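As a reference point, the operation is commonly written as \(S(i, j) = \sum_{m}\sum_{n} I(i + m, j + n)\,K(m, n)\), the cross-correlation form that deep learning libraries typically implement; the index convention here is an assumption and may differ from the one in Eq. (1). A minimal NumPy sketch, without padding or stride and with the hypothetical helper `conv2d_valid`:

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation form)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
result = conv2d_valid(image, kernel)
print(result)  # every entry is -5.0 for this image and kernel
```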
ReLU is used as the activation function between the layers, as given in Eq. (2)
The function returns x when x exceeds zero; otherwise, it returns 0.
Batch Normalization is applied before the dense layer with softmax. It is used to normalize the inputs to each layer, and its equation is given in Eq. (3)
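The two operations can be sketched in NumPy as follows. ReLU is \(\max(0, x)\); for batch normalization the sketch uses the usual training-time form \(\gamma\,(x - \mu)/\sqrt{\sigma^{2} + \epsilon} + \beta\) with batch statistics, where the default values of `gamma`, `beta`, and `eps` are assumptions. At inference time, frameworks normally replace the batch statistics with running averages, which is not shown here.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


x = np.array([[-2.0, 1.0], [0.0, 3.0], [2.0, 5.0]])
print(relu(x))                      # negative entries become 0
print(batch_norm(x).mean(axis=0))   # approximately 0 for each feature
```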
The weights are optimized using the update rule shown in Eq. (4), in which the learning rate is divided by the history term.
where \(\hat{m}_{t}\) and the history update are given in Eqs. (5) and (6)
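The description matches the Adam optimizer, which the ensemble classifier is also stated to use. A minimal sketch of one Adam step is given below; the hyperparameter defaults (`lr`, `beta1`, `beta2`, `eps`) are the commonly used ones and are assumptions here, as is the mapping of `m_hat` and `v_hat` onto the quantities in Eqs. (5) and (6).

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `v` is the running history of squared gradients."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (history) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected history
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta = np.array([1.0, -1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 4):
    grad = 2 * theta  # gradient of sum(theta ** 2)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # both entries have moved slightly toward 0
```

Dividing by the square root of the history gives each weight its own effective step size, so weights with consistently large gradients take smaller steps.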
Categorical cross-entropy is used as a loss function as the data has multiple classes to classify. The mathematical formula for the categorical cross-entropy is in Eq. (7)
The probabilities connected to a multinoulli distribution are frequently predicted using the softmax function shown in Eq. (8)
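Both functions can be sketched in NumPy. The softmax is \(\exp(z_{c}) / \sum_{j}\exp(z_{j})\) and the categorical cross-entropy is \(-\sum_{c} y_{c}\log \hat{y}_{c}\), averaged over the batch; the max-subtraction and the small `eps` are numerical-stability choices added for this sketch, not details taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over the batch of -sum_c y_true[c] * log(y_pred[c])."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=-1)))


logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
probs = softmax(logits)
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # one-hot labels
print(probs.sum(axis=1))  # each row sums to 1
print(categorical_cross_entropy(y_true, probs))
```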
The proposed model is explained step by step in Algorithm 1.
Deep ensemble classifier
Ensemble methods are mainly categorized into sequential ensemble techniques and parallel ensemble techniques. Ensemble classifiers improve performance by learning from multiple models rather than one: they merge various models into a single model, which gains accuracy because it learns from the other models. Stacking, bagging, and boosting are the commonly used ensemble approaches. These methods are suitable for classification and regression tasks, as they increase accuracy and reduce bias and variance. Their drawback is that they ignore the confidence associated with each label and take into account only the final predicted label. We therefore use a deep ensemble classifier that takes as inputs the confidences produced by the multiple candidate models.
In this work, predicted probabilities from five different candidate models are obtained and passed to the proposed ensemble classifier, which has five input layers. These five input layers are followed by a fusion layer that gets the best features from the input layers. Three fully connected layers follow, with 800, 500, and 150 neurons, respectively. Each is followed by a dropout of 0.2, and all the components of the network are regularized to avoid overfitting. The final dense layer uses softmax, with as many neurons as there are classes (k). The model uses Adam as the optimizer, and sparse categorical cross-entropy is used as the loss function. We used early stopping to prevent the model from overfitting.
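The data flow through the ensemble can be traced with a NumPy forward pass. This is a shape-level sketch under several assumptions: the candidate probabilities are random stand-ins, the fusion layer is modeled as plain concatenation, the weights are untrained random values, and dropout, regularization, and training (Adam, early stopping) are left out. Only the layer widths 800, 500, and 150 come from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, n_models = 8, 12, 5


def softmax(z):
    exp = np.exp(z - z.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def relu(z):
    return np.maximum(0.0, z)


def random_probabilities(n, k):
    """Stand-in for the softmax output of one fine-tuned candidate model."""
    raw = rng.random((n, k))
    return raw / raw.sum(axis=1, keepdims=True)


def dense(x, out_dim, activation):
    """Fully connected layer with untrained random weights (illustration only)."""
    w = rng.normal(scale=0.05, size=(x.shape[1], out_dim))
    return activation(x @ w)


candidate_probs = [random_probabilities(n_samples, n_classes) for _ in range(n_models)]

# Fusion modeled as concatenation: one row of 5 * k confidences per image.
fused = np.concatenate(candidate_probs, axis=1)

hidden = fused
for width in (800, 500, 150):
    hidden = dense(hidden, width, relu)
ensemble_probs = dense(hidden, n_classes, softmax)

print(fused.shape, ensemble_probs.shape)  # (8, 60) (8, 12)
```

Because the inputs are full probability vectors rather than hard labels, the classifier can weigh how confident each candidate model was, which is the point made about label-only ensembles above.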
Results
In this section, we describe the datasets employed to evaluate the model, the experimental environment in which the evaluations were conducted, and the evaluation metrics used to assess the model's performance, and we present a comprehensive analysis of the obtained results.
Datasets
Three datasets are used to observe the performance of the model. The first is the Fashion product images (Large) dataset, the second is the Fashion product images (Apparel) dataset, which contains only 12 classes, and the third is the Shoe dataset taken from Kaggle (Aggarwal, 2019; Yogesh, 2021). The summary of the datasets is given in Table 1. The Fashion product images (Large) dataset contains 44 k images with 143 classes. The Apparel dataset extracted from the large dataset has 12 classes. The Shoe dataset, downloaded from Kaggle, has 6 classes, each containing 249 images. The details of the Fashion product images (Apparel) dataset and the Shoe dataset are provided in Table 2.
Experimental environment
Experimental studies use the Windows operating system, specifically Windows 10, version 21H2. The hardware setup includes an Intel(R) Xeon(R) Silver 4114 CPU with a clock speed of 2.20 GHz and an NVIDIA Quadro RTX 5000 graphics card. The experiments primarily utilize the CPU's processing power and the GPU's computational capabilities. To facilitate the execution of the experiments, scripts are developed and written in the Python programming language, which allows for efficient implementation and control of various experimental procedures and data analysis. This combination of hardware and software components provides a robust and versatile platform for conducting experiments in a controlled and efficient manner.
Evaluation metrics
In our work, the main objective is to perform classification and similarity measurement tasks using a specific model. To assess the effectiveness and accuracy of this model, we evaluate its performance using classification metrics. These metrics provide a quantitative analysis of how well the model can classify different instances or data points into predefined categories or classes. By evaluating the model's performance using classification metrics, we can gain insights into its ability to identify and assign instances to the correct categories accurately. This evaluation allows us to measure important aspects such as precision, recall, accuracy, and F1 score, which provide a comprehensive understanding of the model's classification capabilities.
In addition to classification, our work also focuses on similarity measurement. Similarity measurement quantifies the likeness or resemblance between different instances or data points. By incorporating similarity measurement into our evaluation, we can determine how well the model captures and represents the similarities between different instances, which is crucial in various domains such as information retrieval, recommendation systems, and clustering.
Classification metrics
The metrics employed for the analysis of the proposed method, in addition to classification accuracy, are precision, recall, and F1-score. Equations for accuracy, precision, recall, and F1-score are given in Eqs. (9–12).
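For reference, the standard per-class definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall), with accuracy being the fraction of correct predictions. The sketch below computes them from a confusion matrix; the helper name `classification_metrics` and the toy labels are assumptions made for illustration.

```python
import numpy as np


def classification_metrics(y_true, y_pred, n_classes):
    """Return accuracy and per-class precision, recall, and F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)             # TP + FP per class
    actual = cm.sum(axis=1)                # TP + FN per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred, n_classes=3)
print(round(accuracy, 4))  # 0.6667
print(precision, recall, f1)
```

A class with very few test samples, such as a class with only three test images, can swing between 0 and 1 on these per-class metrics with a single misclassification.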
Similarity measures
Typically, a recommendation system requires a similarity matrix to recommend products to the user. There are many ways to measure the similarity between two products: Euclidean distance, Manhattan distance, Minkowski distance, Hamming distance, etc. Many works have studied similarity metrics, and most authors used Cosine similarity and Euclidean distance. Scikit-learn is a Python library that provides a cosine similarity function. This function yields a matrix of similarity scores between each item and the others. The scores are sorted, and the items or products with the highest similarity scores are recommended. Cosine similarity, Manhattan distance, and Euclidean distance are used in this experimentation, and the equations for Cosine similarity, Manhattan distance, and Euclidean distance between points A and B are shown in Eqs. (13–15), respectively.
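The three measures and the sort-and-pick step can be sketched with NumPy. The standard forms are \(\cos(A, B) = A \cdot B / (\lVert A\rVert\,\lVert B\rVert)\), \(\sum_{i}|A_{i} - B_{i}|\) for Manhattan, and \(\sqrt{\sum_{i}(A_{i} - B_{i})^{2}}\) for Euclidean. The sketch uses plain NumPy rather than the scikit-learn function mentioned above, and the tiny two-dimensional catalog and the helper `top_k_similar` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def manhattan_distance(a, b):
    return float(np.abs(a - b).sum())


def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))


def top_k_similar(query, catalog, k):
    """Indices of the k catalog rows with the highest cosine similarity to `query`."""
    scores = catalog @ query / (np.linalg.norm(catalog, axis=1) * np.linalg.norm(query))
    return np.argsort(-scores)[:k]


a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
catalog = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]])

print(cosine_similarity(a, b), manhattan_distance(a, b), euclidean_distance(a, b))
print(top_k_similar(a, catalog, k=2))  # [0 2]
```

Note that cosine similarity is ranked in descending order, whereas the two distances would be ranked in ascending order.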
Result analysis on Fashion product images (Apparel) dataset
This section discusses the results obtained with various pre-trained models on the Apparel dataset, which contains 12 classes drawn from the large dataset of 143 classes. The initial dataset, including the attributes, has size 44,000 × 11. The 12 apparel categories are listed in Table 3 under the class column. Using the article type, the image data for these 12 classes were extracted and combined. A new data frame of size 14,795 × 2 was created, containing the name and type of each image file.
For our initial trials, we have considered the Fashion product images dataset. In this work, predicted probabilities from 5 different candidate models are obtained and passed to the proposed ensemble classifier with five input layers. These five input layers follow the fusion layer to get the best features from the input layers. The proposed ensemble model receives initial predictions from the regularised classification head, which are then used as inputs. After splitting the data, the number of samples used for training and testing purposes is given in Table 3. These details are very much required to understand the performance analysis of different models. Performance analysis of different candidate models on the fashion product dataset is described in Tables 4 and 5.
The performance of VGG19, Xception, MobileNet, VGG16, and DenseNet is given in Tables 4 and 5. In addition, the performance of the various models in terms of precision, recall, and F1-score is illustrated in Figs. 6, 7, and 8, respectively. From the obtained results, we can observe that the items in class 0 are not correctly classified, as there are few images of that class in the data. There are 15 images in that class, and only three of them are in the test data. MobileNet, VGG16, and VGG19 misclassify at least one item in that class. Class 1 is predicted fully (100%) by MobileNet and DenseNet, whereas VGG19 and Xception obtain the same value and VGG16 is low. Class 2 is fully classified by DenseNet, followed by VGG16 and VGG19. Class 3 has the highest accuracy with VGG19, followed by VGG16 and DenseNet. However, lower precision is obtained with Xception and MobileNet. Class 4, Track pants, is classified well by DenseNet, while class 5, Dresses, is classified fully by VGG16. DenseNet has the highest accuracy on class 6, Trousers, with 105 images to classify. Class 6, Shorts, is classified with almost 99% by VGG16, followed by Xception. MobileNet classified the Jeans category with 95% accuracy. Tops are classified with 89% by VGG16; the lowest classifier is Xception with 75%. Shirts have the highest classification with 98% accuracy, and VGG16 and DenseNet predicted them with 97% accuracy. T-shirts are classified with 97% by VGG16 and MobileNet.
Result analysis on Fashion product images (Large) dataset and Shoe dataset
The proposed model is applied to the Fashion product images (Large) dataset, which contains 143 classes. Because class-wise analysis is impractical with so many classes, the overall performance of each model, along with that of the deep ensemble classifier, is reported in Table 6. We also include the outcomes of several alternative models in Table 6 for comparison. The table therefore gives an overview of the results of applying the different candidate models to the Fashion product images (Large) and Shoe datasets, and shows the relative effectiveness of the models on these datasets.
Figure 9 illustrates the performance of the various models on the fashion and shoe datasets. MobileNet, DenseNet, and VGG16 performed well compared with VGG19 and Xception. VGG19 had the lowest classification accuracy (87.15%) and MobileNet the highest (88.9%). MobileNet performs better than the other models in terms of the classification measures, and its classification accuracy is also high on the Shoe data, followed by DenseNet and VGG16. VGG19 and Xception behave similarly on both datasets.
We extend our experiments to the Shoe dataset to assess the efficiency of the suggested ensemble strategy on small datasets. The total number of samples and the number of samples in each class are given in Table 7. The results of training the baseline candidate models on the Shoe dataset are shown in Tables 8 and 9.
Tables 8 and 9 show the performance of MobileNet, Xception, VGG16, DenseNet, and VGG19. In addition, Figs. 10, 11, and 12 present the precision, recall, and F1-score, respectively, of the various models on the Shoe dataset. Class 0 (Flip_Flops) is well classified by VGG16 and DenseNet, but the remaining models do not perform well on this class. VGG19 and DenseNet perform well on class 1 (Soccer Shoes). MobileNet and VGG19 give the best performance for class 2 (Boots). Class 3 (Sneakers) is predicted correctly by VGG19 and MobileNet. Class 4 (Loafers) is predicted accurately by MobileNet but wrongly by VGG19. Class 5 (Sandals) is predicted correctly by DenseNet. Overall, MobileNet and DenseNet perform well on the Shoe dataset. On this dataset, MobileNet had the highest accuracy (81.1%), whereas all other models achieved less than 80%. Xception, DenseNet, and VGG16 produce more accurate results than VGG19. According to these findings, the MobileNet architecture is better suited to a small dataset, giving lower misclassification rates for product recommendation.
Results obtained with deep ensemble classifier
The results obtained with the deep ensemble models are analyzed in Table 10. For the Fashion product images (Apparel) dataset, Shirts, T-shirts, Shorts, Jeans, and Track Pants are classified accurately. Trousers, Dresses, Jackets, and Skirts are classified reasonably well, but Waistcoat, Stockings, and Tops are not predicted appropriately. For the Shoe dataset, Sandals are correctly categorized, and the remaining products are classified reasonably well.
We also verified which combination of models gives the best accuracy. As a first step, we combined MobileNet and DenseNet, which gives an accuracy of 94.09%; MobileNet combined with VGG19 gives the same accuracy. When DenseNet is combined with MobileNet and VGG19, accuracy improves, and when four models are combined, there is no further improvement. Finally, the accuracy reaches around 96% when all the models are combined.
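The search over model combinations described above can be sketched as follows. As a simplifying assumption, each subset is scored with soft voting (averaging the class probabilities) in place of the trained deep ensemble classifier; the model names `m_a`, `m_b`, `m_c` and all probabilities are hypothetical toy values.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Probs = List[List[float]]  # one class-probability vector per test sample


def soft_vote_accuracy(models: Sequence[Probs], y_true: List[int]) -> float:
    """Accuracy of the averaged-probability (soft voting) prediction."""
    correct = 0
    for i, label in enumerate(y_true):
        n_classes = len(models[0][i])
        avg = [
            sum(m[i][c] for m in models) / len(models)
            for c in range(n_classes)
        ]
        predicted = max(range(n_classes), key=avg.__getitem__)
        correct += int(predicted == label)
    return correct / len(y_true)


def rank_subsets(
    preds: Dict[str, Probs], y_true: List[int], min_size: int = 2
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every subset of models; best accuracy first, smaller subsets first on ties."""
    results = []
    names = sorted(preds)
    for k in range(min_size, len(names) + 1):
        for subset in combinations(names, k):
            acc = soft_vote_accuracy([preds[n] for n in subset], y_true)
            results.append((acc, subset))
    return sorted(results, key=lambda r: (-r[0], len(r[1]), r[1]))


# Toy predictions for three hypothetical models on three test samples.
y_true = [0, 1, 1]
preds = {
    "m_a": [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]],
    "m_b": [[0.7, 0.3], [0.8, 0.2], [0.2, 0.8]],
    "m_c": [[0.4, 0.6], [0.1, 0.9], [0.3, 0.7]],
}
ranked = rank_subsets(preds, y_true)
for acc, subset in ranked:
    print(f"{acc:.3f}", subset)
```

Sorting ties in favour of smaller subsets reflects the observation that adding a model does not always improve accuracy, so the cheapest combination with the best score is listed first.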
Table 11 compares the outcome of the proposed model with existing works on the fashion dataset. The accuracy of the proposed model is 36% higher than that of the model of Gharaei et al. (2021), 30% higher than that of Nocentini et al. (2022), 10% higher than that of Suvarna and Balakrishna (2022a), 7% higher than that of Suvarna and Balakrishna (2022b), and 5% higher than that of Kolisnik et al. (2021). We tested the proposed model on three different datasets: Fashion product images (Apparel), Fashion product images (Large), and the Shoe data. The model continues to perform well even with a large amount of data. A comparison of the proposed model with other works is also shown in Fig. 13. The input images a, b, c, and d used as queries are shown in Fig. 14. The products recommended for each test image using different similarity measures are presented in Figs. 15–26. After classifying the given product, we retrieved similar images by applying the cosine similarity measure to the features already obtained during the pre-processing phase. Here we used the features obtained from the pre-trained models with the highest accuracy.
From the above results, one can observe that the retrieved results match the original product. With the Manhattan and Euclidean measures, less than 80% of the retrieved results match, whereas with cosine similarity more than 96% match. We also observed that the results retrieved for the query image in Fig. 14a are based mainly on its color and pattern, as can be seen in Figs. 15, 16, and 17. In Fig. 14b the query image is a top with a check pattern, and the recommendation system returns tops with checks and also matches the gesture of the person. Images of the same color are presented first; the system then looks for the pattern and gesture of the image according to the similarity measure. The same behavior can be seen in Figs. 18a–d, 19a, d, and 20b, c. In Fig. 14c the query image is a green bag. The results produced by the different similarity measures are similar, as can be observed in Figs. 21a, b, 22d, and 23c, d. In Fig. 14d the query image is a white shoe with yellow and black stripes, and the system recommends similar shoes in Figs. 24d, 25a, c, and 26b, d. We also grouped the obtained results based on the pattern and color of the query images.
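The three similarity measures compared above have simple definitions, shown in the minimal sketch below together with a top-k retrieval by cosine similarity. The feature vectors and item identifiers are toy values chosen for illustration; in the paper the vectors would be features extracted by the pre-trained models.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1); smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def top_k_cosine(
    query: Sequence[float], catalogue: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k catalogue items most similar to the query by cosine similarity."""
    scored = [(item, cosine_similarity(query, vec)) for item, vec in catalogue.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


# Toy feature vectors (hypothetical, not extracted from any real image).
query = [1.0, 0.0, 1.0]
catalogue = {
    "item_a": [2.0, 0.0, 2.0],  # same direction as the query, larger magnitude
    "item_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "item_c": [1.0, 1.0, 1.0],
}
recommendations = top_k_cosine(query, catalogue, k=2)
print(recommendations)
```

In this toy example cosine similarity ranks `item_a` first because it ignores vector magnitude, whereas the Euclidean and Manhattan distances both place `item_c` closer to the query. Differences of this kind are one reason the measures can return different recommendations for the same query.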
User Evaluation: We selected 100 users for manual testing. Each user was asked to select, for a given image, 10 images of interest from the database. In parallel, results were retrieved using the different similarity measures. The items selected by the users and the results retrieved by the proposed model match in 90% of cases. This shows that the proposed model recommends products for a given image with 90% agreement with the users' choices.
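One plausible way to compute such a match percentage is the fraction of a user's selected items that also appear among the retrieved items, averaged over users. The text does not specify the exact formula, so the sketch below is an assumption; the function names and item identifiers are hypothetical.

```python
from typing import List, Sequence, Tuple


def match_rate(user_selected: Sequence[str], retrieved: Sequence[str]) -> float:
    """Fraction of the user's selected items that the system also retrieved."""
    chosen = set(user_selected)
    if not chosen:
        raise ValueError("user_selected must not be empty")
    return len(chosen & set(retrieved)) / len(chosen)


def mean_match_rate(sessions: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Average match rate over (user_selected, retrieved) pairs, one per user."""
    rates = [match_rate(selected, retrieved) for selected, retrieved in sessions]
    return sum(rates) / len(rates)


# Toy session: the user picks 10 items and 9 of them are also retrieved.
user_picks = [f"item_{i}" for i in range(10)]         # item_0 .. item_9
system_results = [f"item_{i}" for i in range(1, 11)]  # item_1 .. item_10
rate = match_rate(user_picks, system_results)
print(rate)  # 0.9
```

Averaging per-user rates gives each user equal weight regardless of how many items they selected, which suits a protocol where every user picks the same number of images.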
Conclusions
In the context of the COVID-19 pandemic, during which the online retail industry has grown substantially, our research contributes to the advancement of RS for fashion. By accurately categorizing fashion images and providing personalized recommendations, our approach can cater to the evolving needs and preferences of consumers. In this study, we introduce an approach for predicting the category of fashion images using a deep ensemble classifier. The proposed model leverages multiple candidate models, namely DenseNet, Xception, MobileNet, and two VGG variants (VGG16 and VGG19), which are fine-tuned on the fashion images. Using transfer learning, the deep ensemble model is trained to classify fashion products and subsequently retrieve similar items from a comprehensive database, thus enabling the development of a fashion recommendation system. To evaluate the proposed model, three datasets are used: Fashion Products Images (Apparel), Fashion Products Images (Large), and a Shoe dataset. Through comparative analysis, we demonstrate that the proposed method significantly improves predictive accuracy compared to existing approaches. The deep ensemble classifier effectively captures the complex patterns and features of fashion images, allowing for more accurate and reliable categorization.
Our findings highlight the potential of deep ensemble models in fashion image classification and recommendation systems. Integrating multiple candidate models enhances the overall predictive power, enabling more robust and accurate classification results. Furthermore, the successful retrieval of similar fashion items from the database demonstrates our proposed approach's practical applicability and potential utility in real-world scenarios. The results of this study contribute to the growing body of research in computer vision and fashion recommendation systems. The demonstrated improvement in performance underscores the effectiveness of our deep ensemble classifier in tackling the challenges of fashion image classification and recommendation. Future research directions may include exploring additional candidate models and evaluating the proposed approach on more extensive and diverse datasets to validate its effectiveness and generalizability.
Availability of data and materials
Data in this research paper will be shared upon request with the corresponding author.
References
Aggarwal, P. (2019). Fashion product images dataset. Retrieved April 1, 2023, from https://www.kaggle.com/paramaggarwal/fashion-product-images-dataset
Asiroglu, B., Atalay, M. I., Balkaya, A., Tüzünkan, E., Dağtekin, M., & Ensari, T. (2019). Smart clothing recommendation system with deep learning. In 2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT) (pp. 1–4). IEEE. https://doi.org/10.1109/ISMSIT.2019.8932738
Ay, B., Aydın, G., Koyun, Z., & Demir, M. (2019). A visual similarity recommendation system using generative adversarial networks. In 2019 international conference on deep learning and machine learning in emerging applications (Deep-ML) (pp. 44–48). IEEE. https://doi.org/10.1109/Deep-ML.2019.00017
Balaji, T. K., Annavarapu, C. S. R., & Bablani, A. (2021a). Machine learning algorithms for social media analysis: A survey. Computer Science Review, 40, Article 100395. https://doi.org/10.1016/j.cosrev.2021.100395
Chaudhary, C., Goyal, P., Tuli, S., Banthia, S., Goyal, N., & Chen, Y. P. P. (2019). A novel multimodal clustering framework for images with diverse associated text. Multimedia Tools and Applications, 78, 17623–17652. https://doi.org/10.1007/s11042-018-7131-x
Choudhary, C., Singh, I., & Kumar, M. (2023). SARWAS: Deep ensemble learning techniques for sentiment based recommendation system. Expert Systems with Applications, 216, Article 119420. https://doi.org/10.1016/j.eswa.2022.119420
Dagan, A., Guy, I., & Novgorodov, S. (2023). Shop by image: Characterizing visual search in e-commerce. Information Retrieval Journal, 26(1), 2. https://doi.org/10.1007/s10791-023-09418-1
Davidson, J., Liebald, B., Liu, J., Nandy, P., Van Vleet, T., Gargi, U., & Sampath, D. (2010, September). The YouTube video recommendation system. In Proceedings of the fourth ACM conference on Recommender systems (pp. 293–296). https://doi.org/10.1145/1864708.1864770
Deldjoo, Y., Elahi, M., Quadrana, M., & Cremonesi, P. (2018). Using visual features based on MPEG-7 and deep learning for movie recommendation. International Journal of Multimedia Information Retrieval, 7, 207–219. https://doi.org/10.1007/s13735-018-0155-1
Deldjoo, Y., Schedl, M., Cremonesi, P., & Pasi, G. (2020). Recommender systems leveraging multimedia content. ACM Computing Surveys (CSUR), 53(5), 1–38. https://doi.org/10.1145/3407190
Du, X., Yin, H., Chen, L., Wang, Y., Yang, Y., & Zhou, X. (2018). Personalized video recommendation using rich content from videos. IEEE Transactions on Knowledge and Data Engineering, 32(3), 492–505. https://doi.org/10.1109/TKDE.2018.2885520
Elahi, M., Ricci, F., & Rubens, N. (2016). A survey of active learning in collaborative filtering recommender systems. Computer Science Review, 20, 29–50. https://doi.org/10.1016/j.cosrev.2016.05.002
Elleuch, M., Mezghani, A., Khemakhem, M., & Kherallah, M. (2021). Clothing classification using deep CNN architecture based on transfer learning. In A. Abraham, S. Shandilya, L. Garcia-Hernandez, & M. Varela (Eds.), Hybrid intelligent systems. HIS 2019. Advances in intelligent systems and computing. (Vol. 1179). Springer. https://doi.org/10.1007/978-3-030-49336-3_24
Gharaei, N. Y., Dadkhah, C., & Daryoush, L. (2021). Content-based clothing recommender system using deep neural network. In 2021 26th International Computer Conference, Computer Society of Iran (CSICC) (pp. 1–6). IEEE. https://doi.org/10.1109/CSICC52343.2021.9420544
Gomez-Uribe, C. A., & Hunt, N. (2015). The netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems (TMIS), 6(4), 1–19. https://doi.org/10.1145/2843948
Gupta, J., & Gadge, J. (2015, January). Performance analysis of recommendation system based on collaborative filtering and demographics. In 2015 international conference on communication, information & computing technology (iccict) (pp. 1–6). IEEE. https://doi.org/10.1109/ICCICT.2015.7045675
Hansen, C., Hansen, C., Maystre, L., Mehrotra, R., Brost, B., Tomasi, F., & Lalmas, M. (2020, September). Contextual and sequential user embeddings for large-scale music recommendation. In Proceedings of the 14th ACM Conference on Recommender Systems (pp. 53–62). https://doi.org/10.1145/3383313.3412248
He, X., He, Z., Song, J., Liu, Z., Jiang, Y. G., & Chua, T. S. (2018). NAIS: Neural attentive item similarity model for a recommendation. IEEE Transactions on Knowledge and Data Engineering, 30(12), 2354–2366. https://doi.org/10.1109/TKDE.2018.2831682
Heinrich, B., Hopf, M., Lohninger, D., Schiller, A., & Szubartowicz, M. (2021). Data quality in recommender systems: The impact of completeness of item content data on prediction accuracy of recommender systems. Electronic Markets, 31, 389–409. https://doi.org/10.1007/s12525-019-00366-7
Hiriyannaiah, S., Siddesh, G. M., & Srinivasa, K. G. (2022). Deep visual ensemble similarity (DVESM) approach for visually aware recommendation and search in smart community. Journal of King Saud University-Computer and Information Sciences, 34(6), 2562–2573. https://doi.org/10.1016/j.jksuci.2020.03.009
Indira, D. N., Markapudi, B. R., Chaduvula, K., & Chaduvula, R. J. (2022). Visual and buying sequence features-based product image recommendation using optimization based deep residual network. Gene Expression Patterns, 45, Article 119261. https://doi.org/10.1016/j.gep.2022.119261
Jang, Y. H., Park, S. C., & Kim, H. (2019). Design and implementation of social content recommendation system based on influential ranking algorithm management. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-019-01275-5
Jayalakshmi, S., Ganesh, N., Čep, R., & Senthil, M. J. (2022). Movie recommender systems: Concepts, methods, challenges, and future directions. Sensors, 22(13), Article 4904. https://doi.org/10.3390/s22134904
Jo, J., Lee, S., Lee, C., Lee, D., & Lim, H. (2020). Development of fashion product retrieval and recommendations model based on deep learning. Electronics, 9(3), Article 508. https://doi.org/10.3390/electronics9030508
Kolisnik, B., Hogan, I., & Zulkernine, F. (2021). Condition-CNN: A hierarchical multi-label fashion image classification model. Expert Systems with Applications, 182, Article 115195. https://doi.org/10.1016/j.eswa.2021.115195
Koohi, H., & Kiani, K. (2016). User-based collaborative filtering using fuzzy C-means. Measurement, 91, 134–139. https://doi.org/10.1016/j.measurement.2016.05.058
Ma, X., Sun, Y., Guo, X., Lai, K. H., & Vogel, D. (2022). Understanding users’ negative responses to recommendation algorithms in short-video platforms: a perspective based on the Stressor-Strain-Outcome (SSO) framework. Electronic Markets. https://doi.org/10.1007/s12525-021-00488-x
Nocentini, O., Kim, J., Bashir, M. Z., & Cavallo, F. (2022). Image classification using multiple convolutional neural networks on the fashion-MNIST dataset. Sensors, 22(23), Article 9544. https://doi.org/10.3390/s22239544
Portugal, I., Alencar, P., & Cowan, D. (2018). The use of machine learning algorithms in recommender systems: A systematic review. Expert Systems with Applications, 97, 205–227. https://doi.org/10.1016/j.eswa.2017.12.020
Schedl, M., Zamani, H., Chen, C. W., Deldjoo, Y., & Elahi, M. (2018). Current challenges and visions in music recommender systems research. International Journal of Multimedia Information Retrieval, 7, 95–116. https://doi.org/10.1007/s13735-018-0154-2
Sejal, D., Abhishek, D., Venugopal, K. R., Iyengar, S. S., & Patnaik, L. M. (2016b). IR_URFS_VF: Image recommendation with user relevance feedback session and visual features in vertical image search. International Journal of Multimedia Information Retrieval, 5, 255–264. https://doi.org/10.1007/s13735-016-0111-x
Sejal, D., Ganeshsingh, T., Venugopal, K. R., Iyengar, S. S., & Patnaik, L. M. (2017). ACSIR: ANOVA cosine similarity image recommendation in vertical search. International Journal of Multimedia Information Retrieval, 6, 143–154. https://doi.org/10.1007/s13735-017-0124-0
Sejal, D., Rashmi, V., Venugopal, K. R., Iyengar, S. S., & Patnaik, L. M. (2016a). Image recommendation based on keyword relevance using absorbing Markov chain and image features. International Journal of Multimedia Information Retrieval, 5, 185–199. https://doi.org/10.1007/s13735-016-0104-9
Seo, Y., & Shin, K.-S. (2019). Hierarchical convolutional neural networks for fashion image classification. Expert Systems with Applications, 116, 328–339. https://doi.org/10.1016/j.eswa.2018.09.022
Sheikh Fathollahi, M., & Razzazi, F. (2021). Music similarity measurement and recommendation system using convolutional neural networks. International Journal of Multimedia Information Retrieval, 10, 43–53. https://doi.org/10.1007/s13735-021-00206-5
Sivaramakrishnan, N., Subramaniyaswamy, V., Viloria, A., Vijayakumar, V., & Senthilselvan, N. (2021). A deep learning-based hybrid model for recommendation generation and ranking. Neural Computing and Applications, 33, 10719–10736. https://doi.org/10.1007/s00521-020-04844-4
Sun, J., Song, J., Jiang, Y., Liu, Y., & Li, J. (2022). Prick the filter bubble: A novel cross-domain recommendation model with adaptive diversity regularization. Electronic Markets. https://doi.org/10.1007/s12525-021-00492-1
Suvarna, B., & Balakrishna, S. (2022a). A Novel deep ensemble classifier for recommending fashion products. In 2022 3rd International Conference on Communication, Computing and Industry 4.0 (C2I4) (pp. 1–6). IEEE. https://doi.org/10.1109/C2I456876.2022.10051256
Suvarna, B., & Balakrishna, S. (2022b). An efficient fashion recommendation system using a deep CNN model. In 2022 International Conference on Automation, Computing and Renewable Systems (ICACRS) (pp. 1179–1183). IEEE. https://doi.org/10.1109/ICACRS55517.2022.10029063
Suvarna, B., & Padmaja, M. (2019). A Recommender system for the proactive sharing of architectural knowledge. AMA_B., 62, 1–10. https://doi.org/10.18280/ama_b.620101
Sysko-Romańczuk, S., Zaborek, P., Wróblewska, A., Dąbrowski, J., & Tkachuk, S. (2022). Data modalities, consumer attributes, and recommendation performance in the fashion industry. Electronic Markets, 32(3), 1279–1292. https://doi.org/10.1007/s12525-022-00579-3
Tahmasebi, H., Ravanmehr, R., & Mohamadrezaei, R. (2021). Social movie recommender system based on deep autoencoder network using Twitter data. Neural Computing and Applications, 33, 1607–1623. https://doi.org/10.1007/s00521-020-05085-1
Tekli, J. (2022). An overview of cluster-based image search result organization: Background, techniques, and ongoing challenges. Knowledge and Information Systems, 64(3), 589–642. https://doi.org/10.1007/s10115-021-01650-9
Tuinhof, H., Pirker, C., & Haltmeier, M. (2019). Image-based fashion product recommendation with deep learning. In G. Nicosia, P. Pardalos, G. Giuffrida, R. Umeton, & V. Sciacca (Eds.), Machine learning, optimization, and data science. LOD 2018. Lecture Notes in Computer Science. (Vol. 11331). Springer.
Ullah, F., Zhang, B., & Khan, R. U. (2019). Image-based service recommendation system: A JPEG-coefficient RFs approach. IEEE Access, 8, 3308–3318. https://doi.org/10.1109/ACCESS.2019.2962315
Wakita, Y., Oku, K., & Kawagoe, K. (2016). Toward fashion-brand recommendation systems using deep-learning: Preliminary analysis. International Journal of Knowledge Engineering. https://doi.org/10.18178/ijke.2016.2.3.066
Wang, D., Liang, Y., Xu, D., Feng, X., & Guan, R. (2018). A content-based recommender system for computer science publications. Knowledge-Based Systems, 157, 1–9. https://doi.org/10.1016/j.knosys.2018.05.001
Wang, M., Li, X., & Chau, P. Y. (2021). Leveraging image-processing techniques for empirical research: Feasibility and reliability in online shopping context. Information Systems Frontiers, 23, 607–626. https://doi.org/10.1007/s10796-020-09981-8
Wundervald, B. (2021). Cluster-based quotas for fairness improvements in music recommendation systems. International Journal of Multimedia Information Retrieval, 10(1), 25–32. https://doi.org/10.1007/s13735-020-00203-0
Xu, Y., Wu, Y., Gao, H., Song, S., Yin, Y., & Xiao, X. (2021). Collaborative APIs recommendation for artificial intelligence of things with information fusion. Future Generation Computer Systems, 125, 471–479. https://doi.org/10.1016/j.future.2021.07.004
Yamamoto, T., & Nakazawa, A. (2019). Fashion style recognition using component-dependent convolutional neural networks. In 2019 IEEE International Conference on Image Processing (ICIP) (pp. 3397–3401). IEEE. https://doi.org/10.1109/ICIP.2019.8803622
Yogesh, S. (2021). Shoe type classification data. Retrieved April 1, 2023, from https://www.kaggle.com/datasets/noobyogi0100/shoe-dataset
Zeng, M., Cao, H., Chen, M., & Li, Y. (2019). User behavior modeling, recommendations, and purchase prediction during shopping festivals. Electronic Markets, 29, 263–274. https://doi.org/10.1007/s12525-018-0311-8
Zhang, S., Yao, L., Sun, A., & Tay, Y. (2019). Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys (CSUR), 52(1), 1–38. https://doi.org/10.1145/3285029
Zhang, Y., He, K., & Song, R. (2023). Image multi-feature fusion for clothing style classification. IEEE Access, 11, 107843–107854. https://doi.org/10.1109/ACCESS.2023.3320270
Acknowledgements
I am very thankful to VFSTR Deemed to be University, for providing the research infrastructure for my research work.
Funding
Not applicable.
Author information
Contributions
Conceptualization, BS; methodology, BS and SB; validation, BS and SB; formal analysis, BS and SB; writing – original draft preparation, BS; writing – review and editing, BS and SB; supervision, SB and BS. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Suvarna, B., Balakrishna, S. Enhanced content-based fashion recommendation system through deep ensemble classifier with transfer learning. Fash Text 11, 24 (2024). https://doi.org/10.1186/s40691-024-00382-y
DOI: https://doi.org/10.1186/s40691-024-00382-y