1 Introduction
An increasing number of commercial recommender applications present multiple recommendation lists in a single interface [
34]. So-called “Multi-list Recommender Interfaces” present item lists stacked on top of each other, accompanying each list with an explanation of what the items in the list represent [
22,
68]. The algorithms underlying these lists are typically either based on a variety of recommendation approaches (e.g., using different similarity measures [
22,
34]), or employ a single personalization algorithm that is optimized differently across different lists, by constraining the presented items to a certain tag [
58], or by re-ranking the top-k set on a specific attribute (cf. [
70]).
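To make the two single-algorithm strategies above concrete, the following is a minimal sketch, not the method of any cited system. It assumes a hypothetical catalog in which each item already carries a relevance score from one personalization algorithm; the names `Item`, `tag_constrained_list`, and `reranked_list`, the `fiber_g` attribute, and all toy values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    score: float      # hypothetical relevance score from a single algorithm
    tags: frozenset   # hypothetical tags, e.g. {"vegan"}
    fiber_g: float    # hypothetical attribute used for re-ranking


def top_k(items, k):
    """Return the k items with the highest relevance score."""
    return sorted(items, key=lambda it: it.score, reverse=True)[:k]


def tag_constrained_list(items, tag, k):
    """Strategy 1: keep only items carrying `tag`, then rank by relevance."""
    return top_k([it for it in items if tag in it.tags], k)


def reranked_list(items, k, n, key, reverse=True):
    """Strategy 2: take the top-k by relevance, re-rank on an attribute, show n."""
    candidates = top_k(items, k)
    return sorted(candidates, key=key, reverse=reverse)[:n]


# Toy catalog (made-up values, for illustration only).
CATALOG = [
    Item("lentil soup", 0.90, frozenset({"vegan"}), 8.0),
    Item("cheeseburger", 0.85, frozenset(), 2.0),
    Item("bean chili", 0.70, frozenset({"vegan"}), 12.0),
    Item("mac and cheese", 0.60, frozenset({"vegetarian"}), 3.0),
    Item("kale salad", 0.40, frozenset({"vegan"}), 5.0),
]

# One interface, several lists derived from the same scored candidates.
multi_list = {
    "Recommended for you": top_k(CATALOG, 2),
    "Vegan picks": tag_constrained_list(CATALOG, "vegan", 2),
    "More fiber": reranked_list(CATALOG, k=4, n=2, key=lambda it: it.fiber_g),
}
```

In this sketch, the re-ranked list can only surface items that already made the top-k, so the choice of `k` controls how far a list may drift from the user's predicted preferences.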
Commercial examples include video streaming services, such as Disney+ and Netflix. They present movie and TV series recommendations in an explainable multi-list interface [
22], typically providing multiple lists that each relate to a user’s preferences, but which are limited to or optimized for a specific attribute, tag, or genre. For example, lists in Netflix would be explained as “Drama TV Series” (genre constraint), “Oscar-winning movies” (movies with a specific tag), or “Recommended for you” (Collaborative Filtering with no constraints). The “sub-lists” presented within a multi-list recommender interface can be extensive: Netflix presents approx. 40 different lists on a user’s page with up to 75 recommendations per list [
22].
Multi-list interfaces may also promote items that are not personalized. For example, e-commerce platforms such as Amazon may display lists that prioritize items based on their overall popularity, for instance because they have been purchased often in the past 24 hours or because many users have added them to their wish lists. Such lists are inferred without any user history, yet can still lead to changes in user preferences by presenting a larger number of items in an organized manner (cf. [58]).
An illustrative example of the differences between single-list and multi-list interfaces is depicted in Figure 1 [34]. The main distinction between them in this study is the use of multiple recommender algorithms per multi-list interface, which is also how the landing pages of some commercial applications are designed [
22]. It should be noted that what we describe as single-list interfaces is at times referred to as “grid user interfaces” or “grid UIs” in other domains [
39], while other studies use the concept of a “single list” or “single-list interface” for lists in which only a single option is presented per line [
9], as is common in search box applications (cf. [
70]).
The application of multi-list interfaces has particularly expanded in commercial domains. Whereas their use in online retail and on video streaming platforms has become more prevalent [22, 59], research on their use in domains where users have specific behavioral goals is missing [
20,
68]. Food is such a domain, where multi-list interfaces have the potential to steer user preferences toward a specific eating goal. Whereas in the movie domain, the explanations may signal a particular genre or mood that may be appealing to a user [
22], food choices often face a tradeoff between health and popularity (or taste) [
64]. Explanations may help to mitigate this ambiguity, in an attempt to align recommendations with user goals.
The promotion of healthy food choices is typically not examined in food recommender studies [
60], as many approaches are popularity-based and lead to unhealthy outcomes [
18,
76]. Since a user’s profile becomes less relevant when she wishes to change her current eating habits [
1,
63], it is often hard to generate relevant recommendations when, for example, a user takes up a new weight-loss goal or adopts a vegetarian diet. While providing more control could be one way to circumvent unhealthy recommendations (e.g., in the medical domain [
47]), other studies have shown that increasing recommendation diversity could better serve a user’s interests [
59]. This can be provided by multi-list interfaces that optimize for dietary restrictions (e.g., lactose-free and vegan) or nutrient intake (e.g., less fat or more fiber). Moreover, providing appropriate, personalized explanations or justifications might further steer users toward healthier options [
50].
The commercial multi-list “benchmark” has yet to be evaluated in a user-centric approach [
34]. Whereas its merits are clear in terms of user retention and click-through rates [
22], much less is known about how users perceive the various aspects of multi-list recommender systems and how this is related to their choices. For example, do users understand the recommendation lists presented to them, and does this affect the list from which they choose an item? Are multi-list interfaces only evaluated more favorably, or do they also lead to healthier choices and choices that match a user’s eating goals? And does this depend on the type of explanation presented?
This article presents two novel studies with multi-list food recommender interfaces, which are each evaluated through a user-centric approach. We employ the user experience recommender framework [
43,
44], to assess whether the use of multiple lists in a single interface, along with explanations, leads to changes in user choices and whether these are linked to changes in how users perceive and experience the multi-list interface. To date, only a few studies have examined the relation between user evaluation aspects and multi-list interfaces. Pu and Chen [
58] compare a single-list interface with simple explanations to a category-based interface in the personal computer domain, accompanying each list with an explanation on its contents. They show that a multi-list interface is perceived as more helpful, as users could compare items more easily, even if time spent deciding was equal across both interfaces. Moreover, a related study by Nanou et al. [
52] shows that a genre-grouped movie recommender interface is evaluated as easier to use, due to a reduced cognitive load.
The premise of earlier work on multi-list recommender interfaces is to increase diversity while reducing choice overload [
30]. However, it is to date unclear how specifically choice difficulty and satisfaction are affected by the presented recommender interface, akin to work of Bollen et al. [
4]. For the food domain, we expect that a multi-list recommender system can overcome human biases toward unhealthy foods through its justifications, and lead to more satisfactory choice outcomes by increasing the diversity of the presented recipes. Since many people lack sufficient nutritional knowledge to make healthy food choices [31], the introduction of list-specific explanations is expected to boost the interface’s understandability, also given earlier findings on the reduction of cognitive load [
52,
59]. Moreover, attribute framing theory [
2] suggests that nutrition-based explanations could make users pay more attention to healthy or nutrition-related aspects when choosing a recipe, particularly if they are related to personal characteristics (cf. [
50,
71]). There has, however, been little attention to how explanations in such an interface should be designed and to what extent they can persuade users to consider other content, such as healthier recipes.
We expect that a multi-list interface, bringing forth a more diverse recommendation set, is more likely to cater toward eating goals that are not yet part of the user’s profile. In terms of the interface, the most important contribution is that we can highlight different nutrient-specific eating goals, by presenting lists that optimize for recipes with fewer calories, less fat, or more fiber. Moreover, based on earlier work that shows how personally-relevant explanations (e.g., “healthy recipes that are in line with your healthy eating habits”) rather than item-focused explanations (e.g., “healthy recipes that meet dietary intake guidelines”) positively affect a user’s evaluation of a recommender system [
46], we examine the merits of using personalized explanations in a multi-list interface. For the user-centric evaluation of our multi-list food recommender system and whether it can support healthy food choice and eating goals, we propose the following research questions:
[RQ1]: To what extent is a multi-list recommender interface with (personalized) explanations evaluated more favorably in the context of the user experience recommender framework, compared to a single-list interface without (personalized) explanations?
[RQ2]: To what extent can a multi-list recommender interface with (personalized) explanations support different user goals and healthy food choices, compared to an interface without (personalized) explanations and single-list interfaces?
We present two studies. In Study 1, we examine both research questions in a recommender system based on similar-item retrieval, using a US-based dataset from the recipe website AllRecipes.com [
75]. We compare single-list (5 items) and multi-list (5 \(\times\) 5 items) interfaces, either accompanied by list-based explanations or not, by assessing a user’s evaluation and choices through Structural Equation Modelling. We find that multi-list interfaces are evaluated more favorably in terms of diversity and choice satisfaction, but also lead to a higher choice difficulty. In addition, they support healthy eating goals for specific users. In Study 2, we expand our findings for both research questions by developing a knowledge-based recommender system, using an Italian-based dataset from the recipe website GialloZafferano.it. We compare single-list and multi-list interfaces of similar length (25 items), which are accompanied by either non-personalized or personalized explanations. We find that multi-list interfaces increase the perceived diversity and understandability, but slightly decrease choice satisfaction and the healthiness of chosen recipes. In contrast, choice difficulty is unaffected across single-list and multi-list interfaces with similar set sizes.
5 Discussion
In this article, we have examined an emerging topic in the context of recommender systems. Multi-list interfaces are being used in an increasing number of commercial applications [
22]. Nonetheless, studies on how users evaluate such interfaces typically do not involve a user-centric evaluation [
34]. That is, research has yet to examine how changes in a multi-list interface relate to choice data and perception and experience aspects. Moreover, their use is limited to specific domains [
22,
34,
58], mostly consumer and leisure domains (e.g., e-commerce, movies), that do not correspond to domains where behavioral change plays a role. In fact, the interplay between multi-list interfaces and user goals, such as healthy eating, has not yet been examined empirically [
68].
This research involves one of the first empirical examinations of multi-list interfaces in the food domain. Moreover, it is also the first to have investigated to what extent a multi-list recommender interface is evaluated more favorably than a single-list interface, in the context of the user experience recommender framework of Knijnenburg and Willemsen [
43]. In performing such a user-centric evaluation, we have examined whether a multi-list interface can support healthier recipe choices and user food goals (Study 1), by designing nutrient-specific recommendation lists. Whereas other studies are based on single-item evaluations [
34] or analyses in which latent aspects are evaluated separately [
58,
59], we have linked different latent evaluation aspects in a path model. For Study 2, we could not infer a path model due to fit validity issues but have instead presented analyses with which we show how interaction data and user perceptions are related to user experience aspects.
5.1 User Evaluation (RQ1)
Regarding [RQ1], both studies reveal for most of the inquired user experience aspects that users evaluate multi-list recommenders more favorably than single-list interfaces. However, there are a few contrasting results between Study 1 and Study 2 regarding a user’s choice experience, which may have arisen due to a few differences in their respective designs.
In Study 1, we have found that users are more satisfied with the recipes they have chosen from a long multi-list interface, compared to a short single-list interface. Moreover, they also report higher levels of perceived diversity. At the same time, we find that users experience higher levels of choice difficulty when using a multi-list interface, compared to a shorter list that does not trigger choice overload (cf. [
4,
62]). These findings are consistent with earlier studies on choice overload [
32], in which people evaluate larger choice sets more favorably, but also find it more difficult to make a decision, which can also lead to choice deferral [
11]. On top of that, we have found that the addition of explanations to an unlabeled multi-list interface does not reduce this experienced choice overload, nor does it significantly increase choice satisfaction. This partially contrasts with earlier findings that an “organized view” of multiple item lists reduces the perceived cognitive effort or load [
52,
58]. It is possible that the addition of explanations does not have an impact if numerous other modalities are presented in the interface, such as a recipe’s title, photo, and description.
The main limitation of Study 1 is the difference in set sizes across single-list and multi-list interfaces. While the set size in the multi-list conditions was 25, comprising 5 algorithms in a single interface, we only showed five recipes in the single-list conditions that stemmed from a single algorithm. This seems to have also contributed to the higher levels of choice difficulty and choice satisfaction in the multi-list interface conditions, as such a list length effect is also suggested in Bollen et al. [4]. In Study 2, we have aimed to mitigate this limitation by presenting a fairer comparison, examining the user evaluation across different interfaces that each present 25 recipes. There, we no longer find differences in choice difficulty between single-list and multi-list interfaces, while the perceived diversity and understandability are higher for the multi-list conditions.
Contrary to Study 1, we have found choice satisfaction to be lower for our multi-list interfaces in Study 2. On top of that, choice difficulty is found to increase in Study 1, while it is not affected in Study 2 across the single-list and multi-list conditions. This finding is difficult to explain using the evaluation aspects considered in this study. For one, we have observed positive relations between understandability and choice satisfaction, and between diversity and choice satisfaction. Although the recommender approaches differ across both studies (similar-item retrieval vs knowledge-based), they have been consistent within studies, and are unlikely to have led to this contradictory result. Moreover, a decrease in satisfaction due to users taking more time to find an appropriate recipe, as is shown in Jannach et al. [
34], should have also been reflected in higher levels of choice difficulty, but this was found to not differ across conditions. Instead, a factor may be that the multi-list interface, which presented at times rather extensive explanations, is more appropriate to use for experienced users, for we have observed positive relations between a user’s levels of health consciousness and cooking experience, and choice satisfaction. Earlier research has suggested, albeit in the context of preference elicitation methods, that how users interact with an interface may be moderated by that user’s domain knowledge [
41]. Future research should reveal whether this also applies to our findings.
While the experience aspects show mixed results, the perception aspects are more consistent across both studies. We find higher levels of diversity for the multi-list interfaces, even when controlling for set size. Moreover, understandability has been found to be higher for multi-list interfaces in both studies, with the clearest results in Study 2. This indicates that multi-list food recommender systems contribute positively to the user’s perception of a system, even if the experience may not improve, compared to a single-list approach.
5.2 Recipe Healthiness (RQ2)
With regard to the healthiness of chosen recipes (RQ2), we are faced with mixed findings. What stands out across both studies is that the average healthiness of recipes chosen decreases when using a multi-list interface, which is not further affected by explanations. What could underpin this finding is that the multi-list interface design has empowered users with a lower level of health-related interests to seek out unhealthy foods. Another reason may be that the explanations used make salient the system’s persuasive intent to steer users toward healthier choices [
19], which might have led to reactance among users [
15]. Particularly in Study 2, some of the explanations are explicit about the relation between nutrition and the user’s characteristics, which could have also led to negative feelings about the self [
55]. It has been argued in the context of personalized nudging that being aware of this can cause resistance among users [
82], particularly in the context of health promotion [
13,
17]. A third and more practical explanation is that multi-list interfaces have made it too easy to locate foods that a user likes, and that taste-based preferences tend to trump nutrition-based preferences among most users [
49], also given the popularity of unhealthy recipes on the internet [
75]. The latter would be consistent with the lack of changes in choice difficulty.
We do not find clear advantages of the use of explanations in multi-list interfaces regarding recipe healthiness. In fact, in Study 2, personalized explanations actually led to unhealthier choices. It could be argued that the current study only put forth recommendation sets with up to 25 items. This is much smaller than the number of items presented in multi-list interfaces in the movie domain, where a page can comprise approx. 40 lists with up to 75 items each [22]. Such a recommendation set size seems to better lend itself to a well-explained multi-list interface. In smaller “large sets”, however, explanations may only increase user trust as in previous studies [
58,
72], but might not significantly affect choice-related outcomes.
In Study 1, we have teased apart user choices for specific lists. We have observed a variety of choices from non-similar, nutrition-focused lists, suggesting that different users seek out different types of recipes. Although food choices in our multi-list interface were unhealthier than in single lists, we also found that the number of unhealthy “similar recipe” choices in the multi-list conditions was significantly reduced due to the use of explanations, as many users had chosen fiber-rich recipes. We argue that the increase in recipe diversity in the multi-list condition enabled users to find the recipes they were looking for. Moreover, we found that healthy recipe choices were associated with users making more choices that match their eating or recipe goals. These findings suggest that the availability of unhealthy foods will lead to relatively unhealthy choices by users who do not have any healthy eating goals, but will nonetheless support users with healthy eating goals. Moreover, one observed shift in user choices was from recipes that were optimized for similarity, to fiber-rich recipes that had a relatively high FSA score. Future studies should attempt to pin this down more precisely, by incorporating explicit user goals in a recommendation approach, such as through a critiquing approach (cf. [
8]).
5.3 Limitations and Future Work
A question that arises from the findings in this article is to what extent the results are generalizable. Two aspects might limit the ecological validity of the performed studies. First, we used crowd workers from two different platforms (MTurk and Prolific), who did not necessarily need to consume the chosen recipes. The use of crowd workers, particularly from Amazon MTurk, has received scrutiny for, at times, leading to data generation and research results with a high variability compared to other platforms, which may depend on extraneous factors (e.g., a person’s mood [86]). However, it also offers some advantages, such as a participant pool that is more diverse than most recruitment pools, a relatively fast recruitment of participants, and higher-quality data compared with panel providers [80]. Although we have rewarded participants in line with platform norms, ruled out “speeding”, selected on approval rate, and found no evidence of reduced motivation, we have not tracked whether they have consumed the chosen recipes. The latter would be an important step for future studies to include, in line with, for example, work from the energy domain [
65].
A second limitation to the ecological validity is the small set sizes that have been used for each recommender system. Whereas many contemporary multi-list systems include numerous items per list [
22], our systems have only included 5 per list. Although this may have led to a less realistic scenario, it is consistent with previous scientific work on multi-list studies [
34]. Moreover, the “reduced” interface design has allowed us to specifically address our research questions. Hence, both studies have sacrificed some ecological validity and generalizability to improve internal validity and experimental control.
It could be argued that the use of a recommendation approach that is not user-personalized in Study 1 is a limitation. However, various recipe websites and recommender systems use similar-item recommendation approaches that are much like our study design [
78]. Moreover, the findings from our similar-item approach are useful for domains where personalization is harder to apply, for instance on platforms where most users do not have an interaction history or user account, such as news and recipe websites attracting users from general search engines (e.g., Google).
A limitation of Study 2 is that the analyses could not be compiled in a single path model, as in Study 1. This is attributed to cross-correlations across multiple questionnaire items that emerged when organizing the user experience aspects in a structural equation model, which were absent in our principal component factor analysis. This led to divergent validity issues and overall model fit issues. Nonetheless, the reported analyses, in which interaction, perception, and experience aspects are examined separately instead, do indicate to what extent our research design has affected them. Although this approach has prevented us from performing model-driven optimization, it has illuminated how interaction and perception aspects and user characteristics affect experience aspects. Moreover, linear regression is consistent with how each edge is formed between two nodes in a structural equation model.
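The correspondence between regression and a path-model edge can be illustrated with a minimal sketch, assuming the simplest case of two observed (not latent) variables and a single predictor, where the unstandardized path coefficient coincides with the ordinary least squares slope. The function name `ols_fit` and the toy ratings are hypothetical and are not data from either study.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of the predictor.
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up 5-point ratings for five participants (illustration only).
understandability = [1, 2, 3, 4, 5]
choice_satisfaction = [2, 3, 3, 5, 5]

# The slope plays the role of the edge "understandability -> choice satisfaction".
intercept, slope = ols_fit(understandability, choice_satisfaction)
```

A full structural equation model additionally estimates latent factors and all edges jointly, which is what could not be achieved with acceptable fit in Study 2; the sketch covers only the single-edge case.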
A possible limitation for both studies is that we have not controlled for image attractiveness. Two recent studies show that users are more likely to choose recipes that are accompanied by attractive photos [
16], which can even lead to healthier choices [
70]. Due to our controlled research designs, however, we do not expect this to have affected our results in terms of user evaluation aspects and aggregate choice metrics. Nonetheless, by unpacking an image into its underlying attributes (e.g., contrast, colorfulness) [
70], image attractiveness can be added as an additional feature to a recipe database and used to further personalize recommendations, as is also done in industry applications [
22].
In a similar vein, the systems used in either study have not been evaluated on usability. Since part of the “appeal” of such multi-list systems is the ease of locating information, this could have been a useful addition. Although the evaluation could have been more thorough, we emphasize that our interfaces are based on state-of-the-art design. Moreover, other studies have shown increased levels of perceived ease-of-use for multi-list systems [67], while further studies have provided indirect evidence that such systems have a high usability [
22,
34].
Regarding the explanations used in Study 2, a limiting factor is that they have not been pre-tested. The rationale is based on earlier work on knowledge-based recipe recommender systems [
50,
51], where natural language processing is used to concatenate user characteristics and recipe features to justify healthy recipe recommendations. The lists generated in the current study were rule-based, in the sense that they were pre-defined. We acknowledge that it would have been desirable to have pre-validated their understandability and whether they support user goals. Nonetheless, we have found higher levels of perceived understandability for the detailed, personalized explanations, which were less based on earlier work than the feature-based explanations. This seems to suggest that the explanations have been satisfactory, at least on a comparative level. Still, we wholeheartedly recommend a qualitative study design regarding explanation design in a multi-list context.
Future studies should test our findings in a more naturalistic setting. For one, the number of recipes recommended should not necessarily be limited to 25. Moreover, the role of explanations could be unpacked further in this specific domain, for they can also be considered a cognitively-oriented nudge [
5], while the vertical organization of lists can be regarded as a behaviorally-oriented nudge. In this sense, a multi-list recommender interface is a diverse, yet complex decision-making environment, in which both personalization and digital nudges can steer user choices (cf. [
36]). The final choices made by users are likely to be a function of both the content and the choice architecture (cf. [
38]).
Future research could also examine the problem of recommendations and interface aspects more generally. For example, it would be interesting to tease apart the influence of recommendation algorithms and how items are organized on a user’s final choice. The top presented items might simply attract many of the choices for a specific group of users [
70], such as those who found the decision-making to be difficult [
11], as in our Study 2. Moreover, we propose to also address the challenge of healthy food or recipe recommendation with a more longitudinal perspective, by examining which recipes fall within the user’s “comfort zone” to try in the short term and whether this can improve the healthiness of a user’s diet in the long term.
The takeaways may vary for different readers of this article. While we encourage scholars to study the effects of multi-list recommender interfaces in more detail, also using qualitative methods and in other domains, we recommend that practitioners do not adopt a multi-list homepage without further testing. For practitioners in the food domain, a similar-item retrieval list with explanations at the bottom of a recipe page should be beneficial, taking the results of Study 1 into account. Moreover, understandable explanations that link user characteristics to recipe features seem to be perceived positively.
5.4 Overall Conclusion
This article has presented two empirical studies on multi-list recipe recommender systems. Being among the first to do so, we have shown that the extent to which multi-list interfaces have behavioral and evaluative benefits seems to depend on the interface design aspects. Compared to single-list interfaces, we have found that multi-list recommender interfaces have evaluative benefits in terms of their perceived understandability and diversity. The results for choice satisfaction and choice difficulty are mixed and warrant research in a more naturalistic environment, which includes longer lists. Based on our findings, we argue that “more lists are not always better”, when it comes to a person’s choice process and choice satisfaction, for a similar list of the same length in a grid format can yield higher levels of choice satisfaction.
Regarding user choices, the main benefits of multi-list interfaces seem to lie in helping users with specific goals to locate relevant content. This finding generalizes to domains that are similar to the food recommender domain. It seems that multi-list food recommenders are suitable for a “more like this” scenario across the board (design of Study 1), but that their usefulness for homepages (design of Study 2) requires further examination.