Abstract
In this paper, we propose a method for quickly finding a given number of instances of a target class in a fixed data set. We assume a noisy query consisting of both useful and useless features (e.g., keywords). Our method finds target instances and trains a classifier simultaneously, following a greedy strategy: it selects the instance most likely to belong to the target class, has it labeled manually, and adds it to the training set to retrain the classifier, which is then used to select the next item. To quickly inactivate useless query features, our method compares the discriminative power of the features and assigns weight 0 to any feature that is inferior to some other feature; all other features receive weight 1. This greedy strategy suffers from a bias problem: the classifier is biased toward the target instances found early on and deteriorates once similar target instances are exhausted. To avoid this, when we run out of items that have the superior features, we re-activate the inactivated inferior features. Through this mechanism, our method adaptively shifts to new regions of the data space. Our experiments show that this binary and adaptive feature weighting method outperforms existing methods.
This work was supported by JSPS KAKENHI Grant Numbers 22H00508 and 23H03405, and JST CREST Grant Number JPMJCR22M2, Japan.
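As a rough illustration of the loop described in the abstract, the following minimal Python sketch runs a greedy label-and-retrain loop with binary feature weights on synthetic data. The scoring rule, the pairwise "dominance" test used to inactivate features, the re-activation trigger, and all names and thresholds are assumptions made for illustration only, not the authors' exact formulation.

```python
# Illustrative sketch only: greedy enumerate-and-retrain with binary,
# adaptively re-activated feature weights. All design details below
# (precision-based dominance test, thresholds, scoring) are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: 500 items, 6 binary query features; features 0-2 are informative,
# features 3-5 are noise (the "useless" part of the noisy query).
n, d = 500, 6
X = rng.integers(0, 2, size=(n, d))
y = ((X[:, 0] | X[:, 1] | X[:, 2]) & (rng.random(n) < 0.8)).astype(int)

budget = 60                       # labeling budget
active = np.ones(d, dtype=bool)   # binary feature weights: 1 = active, 0 = inactivated
labeled, labels = [], []

def feature_precision(j):
    """Empirical precision of feature j among labeled items (assumed proxy
    for 'discriminative power'; the paper's criterion may differ)."""
    hits = [y_ for i, y_ in zip(labeled, labels) if X[i, j] == 1]
    return (sum(hits) / len(hits), len(hits)) if hits else (0.0, 0)

found = 0
for t in range(budget):
    pool = [i for i in range(n) if i not in labeled]

    # Re-activation: if no remaining item carries any active feature,
    # switch the inactivated features back on to move to a new region.
    if not any(X[i, active].any() for i in pool):
        active[:] = True

    # Score the pool: classifier probability (once both classes are seen)
    # restricted to the active features; otherwise count of active features.
    if len(set(labels)) == 2:
        clf = LogisticRegression(max_iter=1000).fit(X[labeled][:, active], labels)
        scores = {i: clf.predict_proba(X[[i]][:, active])[0, 1] for i in pool}
    else:
        scores = {i: X[i, active].sum() for i in pool}

    pick = max(scores, key=scores.get)            # greedily take the most promising item
    labeled.append(pick); labels.append(y[pick])  # "manual" label from the oracle y
    found += y[pick]

    # Inactivate a feature if some other feature clearly dominates it
    # (simplified pairwise comparison with a minimum-support threshold).
    precs = [feature_precision(j) for j in range(d)]
    for j in range(d):
        pj, nj = precs[j]
        if nj >= 3 and any(nk >= 3 and pk > pj + 0.2 for k, (pk, nk) in enumerate(precs) if k != j):
            active[j] = False

print(f"targets found: {found} / {budget} labels spent")
```

The re-activation check is the part that counters the bias of the greedy loop: once the items carrying the currently dominant features are exhausted, the previously inactivated features are restored so that selection can move to a different region of the data.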