Abstract
This paper extends the currently accepted model of inductive bias by identifying six categories of bias and by separating inductive bias from the policy for its selection (the inductive policy). We analyze existing "bias selection" systems, examining the similarities and differences in their inductive policies, and identify three techniques useful for building inductive policies. We then present a framework for representing and automatically selecting a wide variety of biases, and we describe experiments with an instantiation of the framework that addresses pragmatic tradeoffs of time, space, accuracy, and the cost of errors. The experiments show that a common framework can be used to implement policies for a variety of types of bias selection, such as parameter selection, term selection, and example selection, using similar techniques. They also show that different implemented policies yield different tradeoffs; for example, from the same data, different rule sets can be learned based on different tradeoffs of accuracy versus the cost of erroneous predictions.
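The idea of an inductive policy making a cost/accuracy tradeoff can be sketched as a meta-level selection loop. The sketch below is purely illustrative (it is not the paper's framework): the candidate "biases" are classification thresholds for a one-feature rule, and the policy picks the bias that minimizes expected error cost rather than maximizing raw accuracy. All names and data are hypothetical.

```python
# Illustrative sketch only: a meta-level "inductive policy" that selects
# among candidate biases (here, thresholds) by minimizing expected cost.

def expected_cost(threshold, examples, fp_cost, fn_cost):
    """Cost of the rule 'predict positive iff x >= threshold' on (x, y) pairs."""
    cost = 0.0
    for x, y in examples:
        pred = x >= threshold
        if pred and not y:
            cost += fp_cost      # false positive
        elif not pred and y:
            cost += fn_cost      # false negative
    return cost

def select_bias(candidate_thresholds, examples, fp_cost, fn_cost):
    """The policy: choose the bias with the lowest expected error cost."""
    return min(candidate_thresholds,
               key=lambda t: expected_cost(t, examples, fp_cost, fn_cost))

# Toy data: feature value x, label y (True = positive class).
examples = [(0.1, False), (0.3, False), (0.4, True),
            (0.6, False), (0.7, True), (0.9, True)]
candidates = [0.2, 0.35, 0.5, 0.65, 0.8]

# From the same data, different cost tradeoffs select different biases:
balanced = select_bias(candidates, examples, fp_cost=1.0, fn_cost=1.0)   # -> 0.35
dear_fp  = select_bias(candidates, examples, fp_cost=10.0, fn_cost=1.0)  # -> 0.65
print(balanced, dear_fp)
```

When false positives are expensive, the policy shifts to a more conservative (higher) threshold, mirroring the abstract's point that different tradeoffs of accuracy versus error cost yield different learned rules from identical data.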
Cite this article
Provost, F.J., Buchanan, B.G. Inductive policy: The pragmatics of bias selection. Mach Learn 20, 35–61 (1995). https://doi.org/10.1007/BF00993474