Machine Learning (English Edition) [Print on Demand]
About the Book
Machine learning is the study of computer algorithms that improve automatically through experience. Its applications, ranging from data-mining programs to information-filtering systems to autonomous vehicles, have become remarkably rich. Machine learning draws on results and concepts from many disciplines, including artificial intelligence, probability and statistics, philosophy, information theory, biology, cognitive science, and control theory, and uses them to understand the problem settings, the algorithms, and the assumptions underlying those algorithms. This book presents the core algorithms and theory of machine learning and explains how the algorithms work. It covers the most practical theory and algorithms in the field today, including concept learning, decision trees, neural networks, Bayesian learning, instance-based learning, genetic algorithms, rule learning, explanation-based learning, and reinforcement learning. For each topic, the author gives a thorough and intuitive explanation together with practical algorithm procedures. The book is used as the textbook for machine learning courses at Carnegie Mellon and many other universities.
About the Author
Tom M. Mitchell is a professor at Carnegie Mellon University and currently director of the university's Center for Automated Learning and Discovery. He is also president of the American Association for Artificial Intelligence (AAAI) and a founder of the journal Machine Learning and of the International Conference on Machine Learning (ICML).
Table of Contents
Preface
Acknowledgments

1 Introduction
1.1 Well-Posed Learning Problems
1.2 Designing a Learning System
1.2.1 Choosing the Training Experience
1.2.2 Choosing the Target Function
1.2.3 Choosing a Representation for the Target Function
1.2.4 Choosing a Function Approximation Algorithm
1.2.5 The Final Design
1.3 Perspectives and Issues in Machine Learning
1.3.1 Issues in Machine Learning
1.4 How to Read This Book
1.5 Summary and Further Reading
Exercises
References

2 Concept Learning and the General-to-Specific Ordering
2.1 Introduction
2.2 A Concept Learning Task
2.2.1 Notation
2.2.2 The Inductive Learning Hypothesis
2.3 Concept Learning as Search
2.3.1 General-to-Specific Ordering of Hypotheses
2.4 Find-S: Finding a Maximally Specific Hypothesis
2.5 Version Spaces and the Candidate-Elimination Algorithm
2.5.1 Representation
2.5.2 The List-Then-Eliminate Algorithm
2.5.3 A More Compact Representation for Version Spaces
2.5.4 Candidate-Elimination Learning Algorithm
2.5.5 An Illustrative Example
2.6 Remarks on Version Spaces and Candidate-Elimination
2.6.1 Will the Candidate-Elimination Algorithm Converge to the Correct Hypothesis?
2.6.2 What Training Example Should the Learner Request Next?
2.6.3 How Can Partially Learned Concepts Be Used?
2.7 Inductive Bias
2.7.1 A Biased Hypothesis Space
2.7.2 An Unbiased Learner
2.7.3 The Futility of Bias-Free Learning
2.8 Summary and Further Reading
Exercises
References

3 Decision Tree Learning
3.1 Introduction
3.2 Decision Tree Representation
3.3 Appropriate Problems for Decision Tree Learning
3.4 The Basic Decision Tree Learning Algorithm
3.4.1 Which Attribute Is the Best Classifier?
3.4.2 An Illustrative Example
3.5 Hypothesis Space Search in Decision Tree Learning
3.6 Inductive Bias in Decision Tree Learning
3.6.1 Restriction Biases and Preference Biases
3.6.2 Why Prefer Short Hypotheses?
3.7 Issues in Decision Tree Learning
3.7.1 Avoiding Overfitting the Data
3.7.2 Incorporating Continuous-Valued Attributes
3.7.3 Alternative Measures for Selecting Attributes
3.7.4 Handling Training Examples with Missing Attribute Values
3.7.5 Handling Attributes with Differing Costs
3.8 Summary and Further Reading
Exercises
References

4 Artificial Neural Networks
4.1 Introduction
4.1.1 Biological Motivation
4.2 Neural Network Representations
4.3 Appropriate Problems for Neural Network Learning
4.4 Perceptrons
4.4.1 Representational Power of Perceptrons
4.4.2 The Perceptron Training Rule
4.4.3 Gradient Descent and the Delta Rule
4.4.4 Remarks
4.5 Multilayer Networks and the Backpropagation Algorithm
4.5.1 A Differentiable Threshold Unit
4.5.2 The Backpropagation Algorithm
4.5.3 Derivation of the Backpropagation Rule
4.6 Remarks on the Backpropagation Algorithm
4.6.1 Convergence and Local Minima
4.6.2 Representational Power of Feedforward Networks
4.6.3 Hypothesis Space Search and Inductive Bias
4.6.4 Hidden Layer Representations
4.6.5 Generalization, Overfitting, and Stopping Criterion
4.7 An Illustrative Example: Face Recognition
4.7.1 The Task
4.7.2 Design Choices
4.7.3 Learned Hidden Representations
4.8 Advanced Topics in Artificial Neural Networks
4.8.1 Alternative Error Functions
4.8.2 Alternative Error Minimization Procedures
4.8.3 Recurrent Networks
4.8.4 Dynamically Modifying Network Structure
4.9 Summary and Further Reading
Exercises
References

5 Evaluating Hypotheses
5.1 Motivation
5.2 Estimating Hypothesis Accuracy
5.2.1 Sample Error and True Error
5.2.2 Confidence Intervals for Discrete-Valued Hypotheses
5.3 Basics of Sampling Theory
5.3.1 Error Estimation and Estimating Binomial Proportions
5.3.2 The Binomial Distribution
5.3.3 Mean and Variance
5.3.4 Estimators, Bias, and Variance
5.3.5 Confidence Intervals
5.3.6 Two-Sided and One-Sided Bounds
5.4 A General Approach for Deriving Confidence Intervals
5.4.1 Central Limit Theorem
5.5 Difference in Error of Two Hypotheses
5.5.1 Hypothesis Testing
5.6 Comparing Learning Algorithms
5.6.1 Paired t Tests
5.6.2 Practical Considerations
5.7 Summary and Further Reading
Exercises
References

6 Bayesian Learning
6.1 Introduction
6.2 Bayes Theorem
6.2.1 An Example
6.3 Bayes Theorem and Concept Learning
6.3.1 Brute-Force Bayes Concept Learning
6.3.2 MAP Hypotheses and Consistent Learners
6.4 Maximum Likelihood and Least-Squared Error Hypotheses
6.5 Maximum Likelihood Hypotheses for Predicting Probabilities
6.5.1 Gradient Search to Maximize Likelihood in a Neural Net
6.6 Minimum Description Length Principle
6.7 Bayes Optimal Classifier
6.8 Gibbs Algorithm
6.9 Naive Bayes Classifier
6.9.1 An Illustrative Example
6.10 An Example: Learning to Classify Text
6.10.1 Experimental Results
6.11 Bayesian Belief Networks
6.11.1 Conditional Independence
6.11.2 Representation
6.11.3 Inference
6.11.4 Learning Bayesian Belief Networks
6.11.5 Gradient Ascent Training of Bayesian Networks
6.11.6 Learning the Structure of Bayesian Networks
6.12 The EM Algorithm
6.12.1 Estimating Means of k Gaussians
6.12.2 General Statement of EM Algorithm
6.12.3 Derivation of the k Means Algorithm
6.13 Summary and Further Reading
Exercises
References

7 Computational Learning Theory
7.1 Introduction
7.2 Probably Learning an Approximately Correct Hypothesis
7.2.1 The Problem Setting
7.2.2 Error of a Hypothesis
7.2.3 PAC Learnability
7.3 Sample Complexity for Finite Hypothesis Spaces
7.3.1 Agnostic Learning and Inconsistent Hypotheses
7.3.2 Conjunctions of Boolean Literals Are PAC-Learnable
7.3.3 PAC-Learnability of Other Concept Classes
7.4 Sample Complexity for Infinite Hypothesis Spaces
7.4.1 Shattering a Set of Instances
7.4.2 The Vapnik-Chervonenkis Dimension
7.4.3 Sample Complexity and the VC Dimension
7.4.4 VC Dimension for Neural Networks
7.5 The Mistake Bound Model of Learning
7.5.1 Mistake Bound for the Find-S Algorithm
7.5.2 Mistake Bound for the Halving Algorithm
7.5.3 Optimal Mistake Bounds
7.5.4 Weighted-Majority Algorithm
7.6 Summary and Further Reading
Exercises
References

8 Instance-Based Learning
8.1 Introduction
8.2 k-Nearest Neighbor Learning
8.2.1 Distance-Weighted Nearest Neighbor Algorithm
8.2.2 Remarks on k-Nearest Neighbor Algorithm
8.2.3 A Note on Terminology
8.3 Locally Weighted Regression
8.3.1 Locally Weighted Linear Regression
8.3.2 Remarks on Locally Weighted Regression
8.4 Radial Basis Functions
8.5 Case-Based Reasoning
8.6 Remarks on Lazy and Eager Learning
8.7 Summary and Further Reading
Exercises
References

9 Genetic Algorithms
9.1 Motivation
9.2 Genetic Algorithms
9.2.1 Representing Hypotheses
9.2.2 Genetic Operators
9.2.3 Fitness Function and Selection
9.3 An Illustrative Example
9.3.1 Extensions
9.4 Hypothesis Space Search
9.4.1 Population Evolution and the Schema Theorem
9.5 Genetic Programming
9.5.1 Representing Programs
9.5.2 Illustrative Example
9.5.3 Remarks on Genetic Programming
9.6 Models of Evolution and Learning
9.6.1 Lamarckian Evolution
9.6.2 Baldwin Effect
9.7 Parallelizing Genetic Algorithms
9.8 Summary and Further Reading
Exercises
References

10 Learning Sets of Rules
10.1 Introduction
10.2 Sequential Covering Algorithms
10.2.1 General to Specific Beam Search
10.2.2 Variations
10.3 Learning Rule Sets: Summary
10.4 Learning First-Order Rules
10.4.1 First-Order Horn Clauses
10.4.2 Terminology
10.5 Learning Sets of First-Order Rules: FOIL
10.5.1 Generating Candidate Specializations in FOIL
10.5.2 Guiding the Search in FOIL
10.5.3 Learning Recursive Rule Sets
10.5.4 Summary of FOIL
10.6 Induction as Inverted Deduction
10.7 Inverting Resolution
10.7.1 First-Order Resolution
10.7.2 Inverting Resolution: First-Order Case
10.7.3 Summary of Inverse Resolution
10.7.4 Generalization, θ-Subsumption, and Entailment
10.7.5 PROGOL
10.8 Summary and Further Reading
Exercises
References

11 Analytical Learning
11.1 Introduction
11.1.1 Inductive and Analytical Learning Problems
11.2 Learning with Perfect Domain Theories: PROLOG-EBG
11.2.1 An Illustrative Trace
11.3 Remarks on Explanation-Based Learning
11.3.1 Discovering New Features
11.3.2 Deductive Learning
11.3.3 Inductive Bias in Explanation-Based Learning
11.3.4 Knowledge Level Learning
11.4 Explanation-Based Learning of Search Control Knowledge
11.5 Summary and Further Reading
Exercises
References

12 Combining Inductive and Analytical Learning
12.1 Motivation
12.2 Inductive-Analytical Approaches to Learning
12.2.1 The Learning Problem
12.2.2 Hypothesis Space Search
12.3 Using Prior Knowledge to Initialize the Hypothesis
12.3.1 The KBANN Algorithm
12.3.2 An Illustrative Example
12.3.3 Remarks
12.4 Using Prior Knowledge to Alter the Search Objective
12.4.1 The TangentProp Algorithm
12.4.2 An Illustrative Example
12.4.3 Remarks
12.4.4 The EBNN Algorithm
12.4.5 Remarks
12.5 Using Prior Knowledge to Augment Search Operators
12.5.1 The FOCL Algorithm
12.5.2 Remarks
12.6 State of the Art
12.7 Summary and Further Reading
Exercises
References

13 Reinforcement Learning
13.1 Introduction
13.2 The Learning Task
13.3 Q Learning
13.3.1 The Q Function
13.3.2 An Algorithm for Learning Q
13.3.3 An Illustrative Example
13.3.4 Convergence
13.3.5 Experimentation Strategies
13.3.6 Updating Sequence
13.4 Nondeterministic Rewards and Actions
13.5 Temporal Difference Learning
13.6 Generalizing from Examples
13.7 Relationship to Dynamic Programming
13.8 Summary and Further Reading
Exercises
References

Appendix: Notation
Indexes
Author Index
Subject Index
Preface
The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. In recent years many successful machine learning applications have been developed, ranging from data-mining programs that learn to detect fraudulent credit card transactions, to information-filtering systems that learn users' reading preferences, to autonomous vehicles that learn to drive on public highways. At the same time, there have been important advances in the theory and algorithms that form the foundations of this field.

The goal of this textbook is to present the key algorithms and theory that form the core of machine learning. Machine learning draws on concepts and results from many fields, including statistics, artificial intelligence, philosophy, information theory, biology, cognitive science, computational complexity, and control theory. My belief is that the best way to learn about machine learning is to view it from all of these perspectives and to understand the problem settings, algorithms, and assumptions that underlie each. In the past, this has been difficult due to the absence of a broad-based single source introduction to the field. The primary goal of this book is to provide such an introduction.

Because of the interdisciplinary nature of the material, this book makes few assumptions about the background of the reader. Instead, it introduces basic concepts from statistics, artificial intelligence, information theory, and other disciplines as the need arises, focusing on just those concepts most relevant to machine learning. The book is intended for both undergraduate and graduate students in fields such as computer science, engineering, statistics, and the social sciences, and as a reference for software professionals and practitioners. Two principles that guided the writing of the book were that it should be accessible to undergraduate students and that it should contain the material I would want my own Ph.D. students to learn before beginning their doctoral research in machine learning.

A third principle that guided the writing of this book was that it should present a balance of theory and practice. Machine learning theory attempts to answer questions such as "How does learning performance vary with the number of training examples presented?" and "Which learning algorithms are most appropriate for various types of learning tasks?" This book includes discussions of these and other theoretical issues, drawing on theoretical constructs from statistics, computational complexity, and Bayesian analysis. The practice of machine learning is covered by presenting the major algorithms in the field, along with illustrative traces of their operation. Online data sets and implementations of several algorithms are available via the World Wide Web at http://www.cs.cmu.edu/~tom/mlbook.html. These include neural network code and data for face recognition, decision tree learning code and data for financial loan analysis, and Bayes classifier code and data for analyzing text documents. I am grateful to a number of colleagues who have helped to create these online resources, including Jason Rennie, Paul Hsiung, Jeff Shufelt, Matt Glickman, Scott Davies, Joseph O'Sullivan, Ken Lang, Andrew McCallum, and Thorsten Joachims.

ACKNOWLEDGMENTS

In writing this book, I have been fortunate to be assisted by technical experts in many of the subdisciplines that make up the field of machine learning. This book could not have been written without their help.
I am deeply indebted to the following scientists who took the time to review chapter drafts and, in many cases, to tutor me and help organize chapters in their individual areas of expertise: Avrim Blum, Jaime Carbonell, William Cohen, Greg Cooper, Mark Craven, Ken DeJong, Jerry DeJong, Tom Dietterich, Susan Epstein, Oren Etzioni, Scott Fahlman, Stephanie Forrest, David Haussler, Haym Hirsh, Rob Holte, Leslie Pack Kaelbling, Dennis Kibler, Moshe Koppel, John Koza, Miroslav Kubat, John Lafferty, Ramon Lopez de Mantaras, Sridhar Mahadevan, Stan Matwin, Andrew McCallum, Raymond Mooney, Andrew Moore, Katharina Morik, Steve Muggleton, Michael Pazzani, David Poole, Armand Prieditis, Jim Reggia, Stuart Russell, Lorenza Saitta, Claude Sammut, Jeff Schneider, Jude Shavlik, Devika Subramanian, Michael Swain, Gheorghe Tecuci, Sebastian Thrun, Peter Turney, Paul Utgoff, Manuela Veloso, Alex Waibel, Stefan Wrobel, and Yiming Yang.

I am also grateful to the many instructors and students at various universities who have field tested various drafts of this book and who have contributed their suggestions. Although there is no space to thank the hundreds of students, instructors, and others who tested earlier drafts of this book, I would like to thank the following for particularly helpful comments and discussions: Shumeet Baluja, Andrew Banas, Andy Barto, Jim Blackson, Justin Boyan, Rich Caruana, Philip Chan, Jonathan Cheyer, Lonnie Chrisman, Dayne Freitag, Geoff Gordon, Warren Greiff, Alexander Harm, Tom Ioerger, Thorsten Joachims, Atsushi Kawamura, Martina Klose, Sven Koenig, Jay Modi, Andrew Ng, Joseph O'Sullivan, Patrawadee Prasangsit, Doina Precup, Bob Price, Choon Quek, Sean Slattery, Belinda Thom, Astro Teller, and Will Tracz.

I would like to thank Joan Mitchell for creating the index for the book. I also would like to thank Jean Harpley for help in editing many of the figures. Jane Loftus from ETP Harrison improved the presentation significantly through her copyediting of the manuscript and generally helped usher the manuscript through the intricacies of final production. Eric Munson, my editor at McGraw Hill, provided encouragement and expertise in all phases of this project.

As always, the greatest debt one owes is to one's colleagues, friends, and family. In my case, this debt is especially large. I can hardly imagine a more intellectually stimulating environment and supportive set of friends than those I have at Carnegie Mellon. Among the many here who helped, I would especially like to thank Sebastian Thrun, who throughout this project was a constant source of encouragement, technical expertise, and support of all kinds. My parents, as always, encouraged and asked "Is it done yet?" at just the right times. Finally, I must thank my family: Meghan, Shannon, and Joan. They are responsible for this book in more ways than even they know. This book is dedicated to them.

Tom M. Mitchell