OutfitNet: Fashion outfit recommendation with attention-based multiple instance learning
Recommending fashion outfits to users presents several challenges. First, an outfit consists of multiple fashion items, and each user emphasizes different parts of an outfit when deciding whether they like it. Second, a user’s liking for an outfit depends not only on the aesthetics of each item but also on the compatibility among the items. Third, the observed relationships between users and fashion outfits are often sparse. Moreover, the feedback is positive-only: we can observe what users like, but not what they dislike.
To address these challenges, we formulate fashion outfit recommendation as a multiple-instance learning (MIL) problem. We propose OutfitNet, a two-stage fashion outfit recommendation framework. The first stage, the Fashion Item Relevancy network (FIR), learns the compatibility between fashion items and generates a relevancy embedding for each item. The second stage, the Outfit Preference network (OP), learns users’ tastes for fashion outfits from visual information. OutfitNet takes the multiple fashion items in an outfit as input and learns the compatibility among the items, each user’s taste toward each item, and, through an attention mechanism, each user’s attention on the different items in the outfit.
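The attention-based MIL idea can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: item scores are taken as dot products between a user preference vector and item embeddings, attention weights come from a softmax over a second per-user vector, and the names `outfit_score`, `user_pref`, and `user_attn` are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def outfit_score(
    user_pref: Sequence[float],
    user_attn: Sequence[float],
    item_embs: Sequence[Sequence[float]],
) -> Tuple[float, List[float]]:
    """Score an outfit (a bag of item embeddings) for one user.

    Assumed form, not the paper's exact model:
      item score  = <user_pref, item embedding>
      attention   = softmax(<user_attn, item embedding>) over the outfit
      outfit score = attention-weighted sum of item scores
    """
    item_scores = [dot(user_pref, e) for e in item_embs]
    attn = softmax([dot(user_attn, e) for e in item_embs])
    score = sum(a * s for a, s in zip(attn, item_scores))
    return score, attn


# Example: a two-item outfit; this user attends almost entirely to item 0.
items = [[1.0, 0.0], [0.0, 1.0]]
score, attn = outfit_score([2.0, -1.0], [10.0, 0.0], items)
```

Because the softmax normalizes over whatever items are present, the same function handles outfits of any size, which is the property that makes the bag-of-instances (MIL) view a natural fit for outfits.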
Quantitatively, our experiments show that OutfitNet outperforms state-of-the-art models on two tasks: fill-in-the-blank (FITB) and personalized outfit recommendation. Qualitatively, we demonstrate that the learned personalized item scores and attention scores capture users’ fashion tastes well, and that the learned fashion item embeddings capture the compatibility relationships among fashion items well. We also leverage the learned fashion item embeddings to propose a simple fashion outfit generation framework, which is shown to produce high-quality outfit combinations.
ACM Digital Library