- • A step-decay function is used for the learning rate annealing schedule. The decay factor is tuned (Duchi et al., 2011).
- Another version of this loss function, which uses the sum of score differences rather than the maximum, was proposed by Weston et al. (1999) and is defined as: $d_{\text{CH}}(\mathbf{y}, \mathbf{s}) = \sum_{j \in I : y_j = 1} \; \sum_{i \in I \setminus \{j\}} \max(1 + s_i - s_j,\, 0)$.
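A minimal sketch of this sum-of-differences variant, assuming the binary choice vector y and the score vector s used throughout this appendix; the function name is ours, not from the paper's code:

```python
def categorical_hinge_sum(y_true, scores):
    """Sum-of-differences categorical hinge loss (illustrative sketch)."""
    loss = 0.0
    for j, y_j in enumerate(y_true):
        if y_j != 1:
            continue                      # only chosen objects (y_j = 1)
        for i, s_i in enumerate(scores):
            if i != j:                    # all other objects i in I \ {j}
                loss += max(1.0 + s_i - scores[j], 0.0)
    return loss
```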
- Arrow, K. J. Social Choice and Individual Values. John Wiley & Sons, 1951.
- Powers, D. M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation. Journal of Machine Learning Technologies, 2(1):37–63, 2011.
- Bader, J. M. Hypervolume-Based Search for Multiobjective Optimization: Theory and Methods. CreateSpace, Paramount, CA, 2010.
- Ben-Akiva, M. E. and Lerman, S. R. Discrete choice analysis: theory and application to travel demand, volume 9. MIT Press, 1985.
- Benson, A. R., Kumar, R., and Tomkins, A. A Discrete Choice Model for Subset Selection. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM ’18, pp. 37–45. ACM, 2018.
Bettman, J. R., Luce, M. F., and Payne, J. W. Constructive consumer choice processes. Journal of consumer research, 25(3):187–217, 1998.
- Bringmann, K. and Friedrich, T. Approximating the least hypervolume contributor: NP-hard in general, but fast in practice. Theoretical Computer Science, 425:104–116, 2012.
- Burges, C. J. C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G. N. Learning to rank using gradient descent. In Raedt, L. D. and Wrobel, S. (eds.), Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005), Bonn, Germany, August 7-11, 2005, volume 119 of ACM International Conference Proceeding Series, pp. 89–96. ACM, 2005.
- Bringmann, K. and Friedrich, T. Approximating the volume of unions and intersections of high-dimensional geometric objects. Computational Geometry, 43(6):601 – 610, 2010.
- Categorical Hinge Loss: This loss function is inspired by a variation of the hinge loss proposed for multi-class classification (Dogan et al., 2016; Moore & DeNero, 2011) and is used only in the discrete choice setting. It upper bounds the categorical 0/1-loss and is defined as: $d_{\text{CH}}(\mathbf{y}, \mathbf{s}) = \max\big(1 + \max_{(i,j \in I):\, y_j = 1,\, y_i = 0} (s_i - s_j),\ 0\big)$. This loss takes the maximum difference between the score $s_j$ of the chosen object ($y_j = 1$) and the score $s_i$ of the other objects $i \in I \setminus \{j\}$ in Q. So, if the score of any object that is not chosen exceeds the score of the chosen object ($s_i > s_j$), the loss is high, as shown in Figure 5. We use this loss function instead of categorical cross-entropy because it not only penalizes a low predicted score for the chosen object, but also accounts for the margin to the scores of the other objects in the given choice task Q.
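A small sketch of this max-margin variant under the same conventions (binary vector y, score vector s); the function name and the NumPy dependency are our own choices:

```python
import numpy as np

def categorical_hinge_max(y_true, scores):
    """Max-margin categorical hinge loss for a single choice task (sketch)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    chosen, others = scores[y_true], scores[~y_true]
    if chosen.size == 0 or others.size == 0:
        return 0.0
    # max over all pairs (i, j) with y_i = 0, y_j = 1 of (s_i - s_j)
    return max(1.0 + others.max() - chosen.min(), 0.0)
```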
- Cheng, W., Hühn, J. C., and Hüllermeier, E. Decision tree and instance-based learning for label ranking. In Danyluk, A. P., Bottou, L., and Littman, M. L. (eds.), Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, volume 382 of ACM International Conference Proceeding Series, pp. 161–168. ACM, 2009.
- Cohen, W., Schapire, R., and Singer, Y. Learning to order things. Journal of Artificial Intelligence Research, 10(1):243–270, 1999.
- Debreu, G. Review of Individual Choice Behavior by R. Luce. The American Economic Review, 50(1):186–188, 1960.
- Comparison approaches: In order to compare our proposed neural network based choice models FATE-NET and FETA-NET (code available at https://github.com/kiudee/cs-ranking) to an independent latent scoring model, we adapt the ranking algorithm RANKNET, which was proposed for solving the task of object ranking using the underlying pairwise preferences (Burges et al., 2005; Tesauro, 1989).
- Dogan, Ü., Glasmachers, T., and Igel, C. A unified view on multi-class support vector classification. Journal of Machine Learning Research, 17(45):1–32, 2016.
- Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
Evgeniou, T., Boussios, C., and Zacharia, G. Generalized robust conjoint estimation. Marketing Science, 24(3):415–429, August 2005.
Fürnkranz, J. and Hüllermeier, E. (eds.). Preference Learning. Springer-Verlag, Berlin, Heidelberg, 2010.
- Figure 4b shows the attraction effect. In this case, an asymmetrically dominated object C is added to the existing set of objects {A, B}, such that B slightly dominates C; the relative utility share of object B then increases relative to A. The primary psychological reason is that consumers prefer the dominating products out of a set (Huber & Puto, 1983). Overall, the consumer's choice might change from A to B upon adding another alternative to the set.
- For the choice setting, the metrics are computed by comparing the ground-truth choice set c(Q) for a given choice task Q = {x1, . . . , xn}, represented as a binary vector y, with the predicted choice set ĉ(Q), represented as a binary vector ŷ; each metric has the form d(y, ŷ) with |Q| = |y| = n. To define the metrics, we first need four quantities analogous to those of the confusion matrix in binary classification, i.e., true positives, true negatives, false positives, and false negatives (Koyejo et al., 2015). Formally they are defined as: $d_{\text{TP}}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n}\sum_{i=1}^{n} [\![\, y_i = 1, \hat{y}_i = 1 \,]\!]$, $d_{\text{TN}}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n}\sum_{i=1}^{n} [\![\, y_i = 0, \hat{y}_i = 0 \,]\!]$, $d_{\text{FP}}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n}\sum_{i=1}^{n} [\![\, y_i = 0, \hat{y}_i = 1 \,]\!]$, $d_{\text{FN}}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n}\sum_{i=1}^{n} [\![\, y_i = 1, \hat{y}_i = 0 \,]\!]$.
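A minimal sketch of these four quantities for a single choice task, assuming y and ŷ are binary vectors of length n = |Q| (the function name is ours):

```python
import numpy as np

def confusion_quantities(y_true, y_pred):
    """Normalized confusion-matrix quantities d_TP, d_TN, d_FP, d_FN."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    n = len(y_true)
    d_tp = np.sum(y_true & y_pred) / n     # y_i = 1 and ŷ_i = 1
    d_tn = np.sum(~y_true & ~y_pred) / n   # y_i = 0 and ŷ_i = 0
    d_fp = np.sum(~y_true & y_pred) / n    # y_i = 0 but ŷ_i = 1
    d_fn = np.sum(y_true & ~y_pred) / n    # y_i = 1 but ŷ_i = 0
    return d_tp, d_tn, d_fp, d_fn
```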
- Fürnkranz, J., Hüllermeier, E., and Vanderlooy, S. Binary decomposition methods for multipartite ranking. In Buntine, W. L., Grobelnik, M., Mladenic, D., and Shawe-Taylor, J. (eds.), Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I, volume 5781 of Lecture Notes in Computer Science, pp. 359–374. Springer, 2009.
- Geilen, M., Basten, T., Theelen, B., and Otten, R. An algebra of pareto points. Fundamenta Informaticae, 78 (1):35–74, 2007.
- Goodfellow, I., Bengio, Y., and Courville, A. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
- Har-Peled, S., Roth, D., and Zimak, D. Constraint classification: A new approach to multiclass classification. In Cesa-Bianchi, N., Numao, M., and Reischuk, R. (eds.), Algorithmic Learning Theory, 13th International Conference, ALT 2002, Lübeck, Germany, November 24-26, 2002, Proceedings, volume 2533 of Lecture Notes in Computer Science, pp. 365–379. Springer, 2002.
- Head, T., MechCoder, Louppe, G., Shcherbatyi, I., fcharras, Vinícius, Z., cmmalone, Schröder, C., nel215, Campos, N., Young, T., Cereda, S., Fan, T., Schwabedal, J., HvassLabs, Pak, M., SoManyUsernamesTaken, Callaway, F., Estève, L., Besson, L., Landwehr, P. M., Komarov, P., Cherti, M., Shi, K. K., Pfannschmidt, K., Linzberger, F., Cauet, C., Gut, A., Mueller, A., and Fabisch, A. scikit-optimize/scikit-optimize: High five - v0.5, February 2018.
Huber, J. Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. The Journal of Consumer Research, 9(1):90–98, 1982.
Huber, J. and Puto, C. Market boundaries and product choice: Illustrating attraction and substitution effects. Journal of Consumer Research, 10(1):31–44, 1983.
- Hyperparameters & Inference: For all neural network models, we make use of the following techniques:
  • We use either rectified linear unit (ReLU) non-linearities with batch normalization (BN) (Ioffe & Szegedy, 2015) or scaled exponential linear unit (SELU) non-linearities (Klambauer et al., 2017) for each hidden layer, as sketched below.
  • Regularization: L2 penalties are applied, and the corresponding regularization strength is tuned.
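A minimal Keras-style sketch of one such hidden layer, assuming TensorFlow/Keras; layer width, initializer, and names are illustrative placeholders, not the authors' exact architecture:

```python
from tensorflow.keras import layers, regularizers

def hidden_block(x, units, l2_strength, use_selu=False):
    """One hidden layer: either SELU, or ReLU followed by batch normalization,
    with a tuned L2 penalty on the weights."""
    if use_selu:
        return layers.Dense(units, activation="selu",
                            kernel_initializer="lecun_normal",
                            kernel_regularizer=regularizers.l2(l2_strength))(x)
    x = layers.Dense(units, kernel_regularizer=regularizers.l2(l2_strength))(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)
```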
- Kamishima, T., Kazawa, H., and Akaho, S. A survey and empirical comparison of object ranking methods. In Fürnkranz, J. and Hüllermeier, E. (eds.), Preference Learning, pp. 181–202. Springer-Verlag, Berlin, Heidelberg, 2010.
- Diamond, J. and Evans, W. The correction for guessing.
- Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Bach, F. R. and Blei, D. M. (eds.), Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pp. 448–456. JMLR.org, 2015.
- Ragain, S. and Ugander, J. Pairwise choice Markov chains. In Lee, D. D., Sugiyama, M., von Luxburg, U., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp. 3198–3206, 2016.
- Klambauer, G., Unterthiner, T., Mayr, A., and Hochreiter, S. Self-normalizing neural networks. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 972–981. Curran Associates, Inc., 2017.
- Koyejo, O., Natarajan, N., Ravikumar, P., and Dhillon, I. S. Consistent multilabel classification. In NIPS, pp. 3321– 3329, 2015.
- • Optimizer: stochastic gradient descent (SGD) with Nesterov momentum (Nesterov, 1983).
Grabisch, M., Marichal, J., Mesiar, R., and Pap, E. Aggregation Functions. Cambridge University Press, 2009.
- LeCun, Y. and Cortes, C. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.
- Lewis, D. D. Evaluating and optimizing autonomous text classification systems. In SIGIR, pp. 246–254. ACM Press, 1995.
- Luce, R. D. Individual Choice Behavior: A Theoretical Analysis. John Wiley and Sons, 1959.
- Dhar, R., Nowlis, S. M., and Sherman, S. J. Trying hard or hardly trying: An analysis of context effects in choice. Journal of Consumer Psychology, 9(4):189–200, 2000.
Maldonado, S., Montoya, R., and Weber, R. Advanced conjoint analysis using feature selection via support vector machines. European Journal of Operational Research, 241(2):564 – 574, 2015.
- Moore, R. and DeNero, J. L1 and L2 regularization for multiclass hinge loss models. In MLSLP, 2011.
- Murphy, K. P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
- Nesterov, Y. A method of solving a convex programming problem with convergence rate O(1/k²). In Soviet Mathematics Doklady, volume 27, pp. 372–376, 1983.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- Pfannschmidt, K., Gupta, P., and Hüllermeier, E. Deep architectures for learning context-dependent ranking functions. ArXiv e-prints, March 2018.
- Tversky, A. and Simonson, I. Context-dependent preferences.
- Ravanbakhsh, S., Schneider, J., and Póczos, B. Equivariance through parameter-sharing. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 2892–2901, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
- Rigutini, L., Papini, T., Maggini, M., and Scarselli, F. SortNet: Learning to rank by a neural preference function.
- Rooderkerk, R. P., Van Heerde, H. J., and Bijmolt, T. H. Incorporating context effects into a choice model. Journal of Marketing Research, 48(4):767–780, 2011.
- Salvatier, J., Wiecki, T. V., and Fonnesbeck, C. Probabilistic programming in python using PyMC3. PeerJ Computer Science, 2:e55, apr 2016.
- Samuelson, P. A. A note on the pure theory of consumer’s behaviour. Economica, 5(17):61–71, 1938.
- Houthakker, H. S. Revealed preference and the utility function.
Sen, A. K. Choice functions and revealed preference. The Review of Economic Studies, 38(3):307–317, 1971.
- Simonson, I. and Tversky, A. Choice in context: Tradeoff contrast and extremeness aversion. Journal of Marketing Research, 29(3):281–295, 1992.
- Tesauro, G. Connectionist learning of expert preferences by comparison training. In Touretzky, D. S. (ed.), Advances in Neural Information Processing Systems 1, pp. 99–106. Morgan Kaufmann, 1989.
Simonson, I. Choice based on reasons: The case of attraction and compromise effects. Journal of consumer research, 16(2):158–174, 1989.
- The important difference between multi-label classification and the choice function setting is that there are no fixed labels. That is why we can only use micro-averaging to compute the F1-measure across different objects and instances (Koyejo et al., 2015).
- The most common GEV models used for conjoint analysis studies in the field of market research are the NESTEDLOGIT and GENNESTEDLOGIT, which account for the similarity context-effect (Ben-Akiva et al., 1985; Tversky, 1972). These models allocate the objects of a given choice task Q into different sets called nests, $\mathcal{B} = \{B_1, \ldots, B_K\}$, and learn correlations between the objects inside each nest (Wen & Koppelman, 2001; Train, 2009). The GENNESTEDLOGIT is the most general model of this class: it allows a fractional allocation of each object in Q to each nest and learns the correlations between them (Wen & Koppelman, 2001). Another model proposed for solving the discrete choice task is the PAIRWISESVM, which uses the underlying pairwise preferences to fit a linear model.
- The similarity effect is another phenomenon according to which the presence of one or more similar objects reduces their overall probability of getting chosen, as it divides the loyalty of potential consumers (Huber & Puto, 1983). In Figure 4c, B and C are two similar objects. Consumers who prefer high quality will be divided between the two objects, which decreases the relative utility share of object B. While in the original set the choice of these customers would always be B, adding another object C similar to B can change the overall choice to A. [Figure 4: Context effects in the quality–price plane for objects A, B, C: (a) compromise, (b) attraction, (c) similarity.]
- The step-decay function drops the learning rate by a constant factor every few epochs (Duchi et al., 2011). The intuition is to first move quickly towards a good region of the parameter space with a larger learning rate and then shrink the learning rate to settle into narrower parts of the loss surface. Formally it is defined as $lr = lr_0 \cdot d_r^{\lfloor e / e_{\text{drop}} \rfloor}$, where $lr_0$ is the initial learning rate, $0 < d_r < 1$ is the factor by which the learning rate is reduced, $e$ is the current epoch, and $e_{\text{drop}}$ is the number of epochs after which the learning rate is decreased.
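A short sketch of this schedule; the default values of lr0, dr, and e_drop below are placeholders (in the paper they are tuned), and the Keras callback shown in the comment is an assumption rather than the authors' code:

```python
def step_decay(epoch, lr0=0.01, dr=0.5, e_drop=10):
    """Step-decay schedule: lr = lr0 * dr ** floor(epoch / e_drop)."""
    return lr0 * dr ** (epoch // e_drop)

# Possible usage with Keras:
# keras.callbacks.LearningRateScheduler(lambda epoch, lr: step_decay(epoch))
```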
- Top-k Categorical Accuracy: The top-k categorical accuracy is the fraction of tasks in which the set of objects at the top k positions, according to the predicted scores, contains the ground-truth chosen object (Chollet et al., 2017; Ben-Akiva et al., 1985). Let $r^{\downarrow} := \operatorname{argsort}_{i \in |Q|}\, s_i$ denote the indexes of the score vector $\mathbf{s}$ sorted in decreasing order. Then the top-k categorical accuracy is defined as $d_{\text{topK}}(c(Q), \mathbf{s}) = \big[\!\big[\, c(Q) \subseteq \bigcup_{i=1}^{k} \{x_{r^{\downarrow}_i}\} \,\big]\!\big]$.
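A sketch of this metric for a single discrete choice task, where chosen_index is the position of the ground-truth chosen object in Q (names are ours):

```python
import numpy as np

def top_k_categorical_accuracy(chosen_index, scores, k):
    """1.0 if the chosen object is among the k highest-scored objects."""
    top_k = np.argsort(scores)[::-1][:k]   # indexes r↓_1, ..., r↓_k
    return float(chosen_index in top_k)
```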
- Precision: Precision denotes the proportion of predicted positive labels that are correct (Powers, 2011). For the choice setting, this is the fraction of objects from the predicted choice set ĉ(Q) that are actually chosen by the decision maker, i.e., that are present in the ground-truth choice set c(Q). Formally it is defined as: $d_{\text{PR}} = \frac{d_{\text{TP}}}{d_{\text{TP}} + d_{\text{FP}}}$.
- F1-measure: The traditional F1-measure is defined as the harmonic mean of precision and recall: $d_{\text{F1}}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{2\, d_{\text{PR}}\, d_{\text{RE}}}{d_{\text{PR}} + d_{\text{RE}}}$. It can also be expressed in terms of the confusion-matrix quantities (Koyejo et al., 2015): $d_{\text{F1}}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{2\, d_{\text{TP}}}{2\, d_{\text{TP}} + d_{\text{FN}} + d_{\text{FP}}}$.
- A.5. Discrete Choice Function Metrics: We evaluate the DCMs based on top-k categorical accuracy, while the models are compared across discrete choice tasks of different sizes based on the normalized accuracy. In the discrete choice setting, the metrics are computed by comparing the ground-truth choice c(Q) for a given discrete choice task Q = {x1, . . . , xn} with the vector s = (s1, . . . , sn) of predicted scores for the objects in Q; each metric has the form d(c(Q), s).
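As a concrete illustration of the binary choice-set metrics, a minimal sketch computing precision, recall, and the F1-measure from the confusion quantities defined earlier; the eps guard against division by zero is our addition:

```python
def precision_recall_f1(d_tp, d_fp, d_fn, eps=1e-12):
    """Precision, recall and F1 from normalized confusion quantities."""
    d_pr = d_tp / (d_tp + d_fp + eps)
    d_re = d_tp / (d_tp + d_fn + eps)
    d_f1 = 2 * d_tp / (2 * d_tp + d_fn + d_fp + eps)
    return d_pr, d_re, d_f1
```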
Train, K. E. Discrete choice methods with simulation. Cambridge University Press, 2009.
Tversky, A. Elimination by aspects: A theory of choice. Psychological Review, 79(4):281, 1972.
- Vembu, S. and Gärtner, T. Label ranking algorithms: A survey. In Fürnkranz & Hüllermeier (2010), pp. 45–64.
- Waegeman, W., Dembczynski, K., Jachnik, A., Cheng, W., and Hüllermeier, E. On the Bayes-optimality of F-measure maximizers. Journal of Machine Learning Research, 15(1):3333–3388, 2014.
- We adapt it here by applying our threshold tuning to solve the general choice functions task (Evgeniou et al., 2005; Maldonado et al., 2015).
Wen, C.-H. and Koppelman, F. S. The generalized nested logit model. Transportation Research Part B: Methodological, 35(7):627–641, 2001.
- Weston, J., Watkins, C., et al. Support vector machines for multi-class pattern recognition. In ESANN, volume 99, pp. 219–224, 1999.
- Ye, N., Chai, K. M. A., Lee, W. S., and Chieu, H. L. Optimizing F-measure: A tale of two approaches. In ICML, 2012.
- Subset 0/1 Accuracy: Subset 0/1 accuracy measures how often the ground-truth choice set c(Q) and the predicted choice set ĉ(Q) are exactly the same, i.e., how often the algorithm's prediction matches the complete choice set. Formally it is defined as: $d_{\text{SUBSET}} = [\![\, \mathbf{y} = \hat{\mathbf{y}} \,]\!]$. Recall: Recall is the proportion of real positive cases that are correctly predicted positive (Powers, 2011); in information retrieval, it is the fraction of relevant documents that are successfully retrieved. For the choice setting, it is the fraction of objects from the ground-truth choice set c(Q) that are also present in the predicted choice set ĉ(Q). Formally it is defined as: $d_{\text{RE}} = \frac{d_{\text{TP}}}{d_{\text{TP}} + d_{\text{FN}}}$.
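A one-line sketch of the subset 0/1 accuracy for a single choice task (the function name is ours):

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """1.0 only if the predicted choice set matches the ground truth exactly."""
    return float(np.array_equal(np.asarray(y_true), np.asarray(y_pred)))
```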
- Categorical Cross-Entropy: $d_{\text{CE}}(\mathbf{y}, \mathbf{s}) = -\sum_{i=1}^{n} y_i \log s_i$. The loss increases as the predicted score $s_i$ for the chosen object ($y_i = 1$, $y_i \in \mathbf{y}$) diverges from 1 (Murphy, 2012). So, predicting a score of 0.012 for the chosen object $i \in I$ results in a high loss value, while a perfect model has a log loss of 0, as shown in Figure 5.
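A sketch of this loss for a single discrete choice task, assuming the scores have already been normalized to a probability distribution over the objects in Q (e.g. by a softmax); the eps guard is ours:

```python
import numpy as np

def categorical_cross_entropy(y_true, scores, eps=1e-12):
    """Cross-entropy between the one-hot choice vector and the predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return -float(np.sum(y_true * np.log(scores + eps)))
```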
- Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. Deep sets. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 3393–3403. Curran Associates, Inc., 2017.