Data Mining
Practical Machine Learning
Tools and Techniques
Fourth Edition

Ian H. Witten
University of Waikato, Hamilton, New Zealand

Eibe Frank
University of Waikato, Hamilton, New Zealand

Mark A. Hall
University of Waikato, Hamilton, New Zealand

Christopher J. Pal
Polytechnique Montréal, and the Université de Montréal,
Montreal, QC, Canada

AMSTERDAM • BOSTON • HEIDELBERG • LONDON


NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2017, 2011, 2005, 2000 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, recording, or any information storage and retrieval system,
without permission in writing from the publisher. Details on how to seek permission, further
information about the Publisher’s permissions policies and our arrangements with organizations
such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our
website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience
broaden our understanding, changes in research methods, professional practices, or medical treatment
may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating
and using any information, methods, compounds, or experiments described herein. In using such
information or methods they should be mindful of their own safety and the safety of others, including
parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume
any liability for any injury and/or damage to persons or property as a matter of products liability,
negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas
contained in the material herein.

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data


A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-804291-5

For Information on all Morgan Kaufmann publications


visit our website at https://www.elsevier.com

Publisher: Todd Green


Acquisition Editor: Tim Pitts
Editorial Project Manager: Charlotte Kent
Production Project Manager: Nicky Carter
Designer: Matthew Limbert
Typeset by MPS Limited, Chennai, India
Contents

List of Figures..........................................................................................................xv
List of Tables..........................................................................................................xxi
Preface ................................................................................................................. xxiii

PART I INTRODUCTION TO DATA MINING


CHAPTER 1 What’s it all about? .......................................................... 3
1.1 Data Mining and Machine Learning..............................................4
Describing Structural Patterns ....................................................... 6
Machine Learning.......................................................................... 7
Data Mining ................................................................................... 9
1.2 Simple Examples: The Weather Problem and Others...................9
The Weather Problem.................................................................. 10
Contact Lenses: An Idealized Problem ....................................... 12
Irises: A Classic Numeric Dataset .............................................. 14
CPU Performance: Introducing Numeric Prediction .................. 16
Labor Negotiations: A More Realistic Example......................... 16
Soybean Classification: A Classic Machine Learning
Success ................................................................................... 19
1.3 Fielded Applications ....................................................................21
Web Mining ................................................................................. 21
Decisions Involving Judgment .................................................... 22
Screening Images......................................................................... 23
Load Forecasting ......................................................................... 24
Diagnosis...................................................................................... 25
Marketing and Sales .................................................................... 26
Other Applications....................................................................... 27
1.4 The Data Mining Process.............................................................28
1.5 Machine Learning and Statistics..................................................30
1.6 Generalization as Search..............................................................31
Enumerating the Concept Space ................................................. 32
Bias .............................................................................................. 33
1.7 Data Mining and Ethics ...............................................................35
Reidentification............................................................................ 36
Using Personal Information......................................................... 37
Wider Issues................................................................................. 38
1.8 Further Reading and Bibliographic Notes ...................................38


CHAPTER 2 Input: concepts, instances, attributes ......................... 43


2.1 What’s a Concept? .......................................................................44
2.2 What’s in an Example?................................................................46
Relations ...................................................................................... 47
Other Example Types .................................................................. 51
2.3 What’s in an Attribute?................................................................53
2.4 Preparing the Input.......................................................................56
Gathering the Data Together ....................................................... 56
ARFF Format............................................................................... 57
Sparse Data .................................................................................. 60
Attribute Types ............................................................................ 61
Missing Values ............................................................................ 62
Inaccurate Values......................................................................... 63
Unbalanced Data.......................................................................... 64
Getting to Know Your Data ........................................................ 65
2.5 Further Reading and Bibliographic Notes...................................65

CHAPTER 3 Output: knowledge representation ............................... 67


3.1 Tables ...........................................................................................68
3.2 Linear Models ..............................................................................68
3.3 Trees .............................................................................................70
3.4 Rules .............................................................................................75
Classification Rules ..................................................................... 75
Association Rules ........................................................................ 79
Rules With Exceptions ................................................................ 80
More Expressive Rules................................................................ 82
3.5 Instance-Based Representation ....................................................84
3.6 Clusters .........................................................................................87
3.7 Further Reading and Bibliographic Notes...................................88

CHAPTER 4 Algorithms: the basic methods ..................................... 91


4.1 Inferring Rudimentary Rules .......................................................93
Missing Values and Numeric Attributes ..................................... 94
4.2 Simple Probabilistic Modeling ....................................................96
Missing Values and Numeric Attributes ................................... 100
Naïve Bayes for Document Classification ................ 103
Remarks ..................................................................................... 105
4.3 Divide-and-Conquer: Constructing Decision Trees ..................105
Calculating Information............................................................. 108
Highly Branching Attributes ..................................................... 110

4.4 Covering Algorithms: Constructing Rules .............................. 113


Rules Versus Trees .................................................................. 114
A Simple Covering Algorithm ................................................ 115
Rules Versus Decision Lists.................................................... 119
4.5 Mining Association Rules........................................................ 120
Item Sets .................................................................................. 120
Association Rules .................................................................... 122
Generating Rules Efficiently ................................................... 124
4.6 Linear Models .......................................................................... 128
Numeric Prediction: Linear Regression .................................. 128
Linear Classification: Logistic Regression ............................. 129
Linear Classification Using the Perceptron ............................ 131
Linear Classification Using Winnow ...................................... 133
4.7 Instance-Based Learning.......................................................... 135
The Distance Function............................................................. 135
Finding Nearest Neighbors Efficiently ................................... 136
Remarks ................................................................................... 141
4.8 Clustering ................................................................................. 141
Iterative Distance-Based Clustering ........................................ 142
Faster Distance Calculations ................................................... 144
Choosing the Number of Clusters ........................................... 146
Hierarchical Clustering............................................................ 147
Example of Hierarchical Clustering........................................ 148
Incremental Clustering............................................................. 150
Category Utility ....................................................................... 154
Remarks ................................................................................... 156
4.9 Multi-instance Learning........................................................... 156
Aggregating the Input.............................................................. 157
Aggregating the Output ........................................................... 157
4.10 Further Reading and Bibliographic Notes...............................158
4.11 WEKA Implementations.......................................................... 160

CHAPTER 5 Credibility: evaluating what’s been learned ............. 161


5.1 Training and Testing .................................................................. 163
5.2 Predicting Performance.............................................................. 165
5.3 Cross-Validation......................................................................... 167
5.4 Other Estimates .......................................................................... 169
Leave-One-Out .......................................................................... 169
The Bootstrap............................................................................. 169
5.5 Hyperparameter Selection.......................................................... 171

5.6 Comparing Data Mining Schemes...........................................172


5.7 Predicting Probabilities............................................................176
Quadratic Loss Function.......................................................... 177
Informational Loss Function ................................................... 178
Remarks ................................................................................... 179
5.8 Counting the Cost ....................................................................179
Cost-Sensitive Classification ................................................... 182
Cost-Sensitive Learning........................................................... 183
Lift Charts ................................................................................ 183
ROC Curves ............................................................................. 186
Recall-Precision Curves........................................................... 190
Remarks ................................................................................... 190
Cost Curves.............................................................................. 192
5.9 Evaluating Numeric Prediction ...............................................194
5.10 The MDL Principle..................................................................197
5.11 Applying the MDL Principle to Clustering.............................200
5.12 Using a Validation Set for Model Selection ...........................201
5.13 Further Reading and Bibliographic Notes...............................202

PART II MORE ADVANCED MACHINE LEARNING SCHEMES


CHAPTER 6 Trees and rules ............................................................. 209
6.1 Decision Trees............................................................................210
Numeric Attributes .................................................................... 210
Missing Values .......................................................................... 212
Pruning ....................................................................................... 213
Estimating Error Rates .............................................................. 215
Complexity of Decision Tree Induction.................................... 217
From Trees to Rules .................................................................. 219
C4.5: Choices and Options........................................................ 219
Cost-Complexity Pruning .......................................................... 220
Discussion .................................................................................. 221
6.2 Classification Rules....................................................................221
Criteria for Choosing Tests ....................................................... 222
Missing Values, Numeric Attributes ......................................... 223
Generating Good Rules ............................................................. 224
Using Global Optimization........................................................ 226
Obtaining Rules From Partial Decision Trees .......................... 227
Rules With Exceptions .............................................................. 231
Discussion .................................................................................. 233

6.3 Association Rules....................................................................... 234


Building a Frequent Pattern Tree .............................................. 235
Finding Large Item Sets ............................................................ 240
Discussion .................................................................................. 241
6.4 WEKA Implementations............................................................ 242

CHAPTER 7 Extending instance-based and linear models .......... 243


7.1 Instance-Based Learning............................................................ 244
Reducing the Number of Exemplars ......................................... 245
Pruning Noisy Exemplars.......................................................... 245
Weighting Attributes ................................................................. 246
Generalizing Exemplars............................................................. 247
Distance Functions for Generalized Exemplars........................ 248
Generalized Distance Functions ................................................ 250
Discussion .................................................................................. 250
7.2 Extending Linear Models........................................................... 252
The Maximum Margin Hyperplane........................................... 253
Nonlinear Class Boundaries ...................................................... 254
Support Vector Regression........................................................ 256
Kernel Ridge Regression ........................................................... 258
The Kernel Perceptron............................................................... 260
Multilayer Perceptrons............................................................... 261
Radial Basis Function Networks ............................................... 270
Stochastic Gradient Descent...................................................... 270
Discussion .................................................................................. 272
7.3 Numeric Prediction With Local Linear Models........................ 273
Model Trees ............................................................................... 274
Building the Tree ....................................................................... 275
Pruning the Tree ........................................................................ 275
Nominal Attributes .................................................................... 276
Missing Values .......................................................................... 276
Pseudocode for Model Tree Induction...................................... 277
Rules From Model Trees........................................................... 281
Locally Weighted Linear Regression........................................ 281
Discussion .................................................................................. 283
7.4 WEKA Implementations............................................................ 284

CHAPTER 8 Data transformations .................................................... 285


8.1 Attribute Selection ..................................................................... 288
Scheme-Independent Selection.................................................. 289
Searching the Attribute Space ................................................... 292
Scheme-Specific Selection ........................................................ 293

8.2 Discretizing Numeric Attributes ................................................296


Unsupervised Discretization ...................................................... 297
Entropy-Based Discretization.................................................... 298
Other Discretization Methods.................................................... 301
Entropy-Based Versus Error-Based Discretization................... 302
Converting Discrete to Numeric Attributes .............................. 303
8.3 Projections ..................................................................................304
Principal Component Analysis .................................................. 305
Random Projections................................................................... 307
Partial Least Squares Regression .............................................. 307
Independent Component Analysis............................................. 309
Linear Discriminant Analysis.................................................... 310
Quadratic Discriminant Analysis .............................................. 310
Fisher’s Linear Discriminant Analysis...................................... 311
Text to Attribute Vectors........................................................... 313
Time Series ................................................................................ 314
8.4 Sampling.....................................................................................315
Reservoir Sampling ................................................................... 315
8.5 Cleansing ....................................................................................316
Improving Decision Trees ......................................................... 316
Robust Regression ..................................................................... 317
Detecting Anomalies ................................................................. 318
One-Class Learning ................................................................... 319
Outlier Detection ....................................................................... 320
Generating Artificial Data ......................................................... 321
8.6 Transforming Multiple Classes to Binary Ones ........................322
Simple Methods ......................................................................... 323
Error-Correcting Output Codes ................................................. 324
Ensembles of Nested Dichotomies............................................ 326
8.7 Calibrating Class Probabilities...................................................328
8.8 Further Reading and Bibliographic Notes.................................331
8.9 WEKA Implementations............................................................334

CHAPTER 9 Probabilistic methods .................................................. 335


9.1 Foundations ................................................................................336
Maximum Likelihood Estimation ............................................. 338
Maximum a Posteriori Parameter Estimation ........................... 339
9.2 Bayesian Networks.....................................................................339
Making Predictions .................................................................... 340

Learning Bayesian Networks .................................................... 344


Specific Algorithms ................................................................... 347
Data Structures for Fast Learning ............................................. 349
9.3 Clustering and Probability Density Estimation ......................... 352
The Expectation Maximization Algorithm for a Mixture
of Gaussians ......................................................................... 353
Extending the Mixture Model ................................................... 356
Clustering Using Prior Distributions......................................... 358
Clustering With Correlated Attributes ...................................... 359
Kernel Density Estimation ........................................................ 361
Comparing Parametric, Semiparametric and Nonparametric
Density Models for Classification ....................................... 362
9.4 Hidden Variable Models ............................................................ 363
Expected Log-Likelihoods and Expected Gradients................. 364
The Expectation Maximization Algorithm ............................... 365
Applying the Expectation Maximization Algorithm
to Bayesian Networks .......................................................... 366
9.5 Bayesian Estimation and Prediction .......................................... 367
Probabilistic Inference Methods................................................ 368
9.6 Graphical Models and Factor Graphs........................................ 370
Graphical Models and Plate Notation ....................................... 371
Probabilistic Principal Component Analysis............................. 372
Latent Semantic Analysis .......................................................... 376
Using Principal Component Analysis for Dimensionality
Reduction ............................................................................. 377
Probabilistic LSA....................................................................... 378
Latent Dirichlet Allocation........................................................ 379
Factor Graphs............................................................................. 382
Markov Random Fields ............................................................. 385
Computing Using the Sum-Product and Max-Product
Algorithms ........................................................................... 386
9.7 Conditional Probability Models................................................. 392
Linear and Polynomial Regression as Probability
Models.................................................................................. 392
Using Priors on Parameters ....................................................... 393
Multiclass Logistic Regression.................................................. 396
Gradient Descent and Second-Order Methods.......................... 400
Generalized Linear Models ....................................................... 400
Making Predictions for Ordered Classes................................... 402
Conditional Probabilistic Models Using Kernels...................... 402

9.8 Sequential and Temporal Models ............................................403


Markov Models and N-gram Methods .................................... 403
Hidden Markov Models........................................................... 404
Conditional Random Fields ..................................................... 406
9.9 Further Reading and Bibliographic Notes...............................410
Software Packages and Implementations................................ 414
9.10 WEKA Implementations..........................................................416

CHAPTER 10 Deep learning .................................................. 417


10.1 Deep Feedforward Networks ...................................................420
The MNIST Evaluation ........................................................... 421
Losses and Regularization ....................................................... 422
Deep Layered Network Architecture ...................................... 423
Activation Functions................................................................ 424
Backpropagation Revisited...................................................... 426
Computation Graphs and Complex Network Structures ........ 429
Checking Backpropagation Implementations ......................... 430
10.2 Training and Evaluating Deep Networks ................................431
Early Stopping ......................................................................... 431
Validation, Cross-Validation, and Hyperparameter Tuning ... 432
Mini-Batch-Based Stochastic Gradient Descent ..................... 433
Pseudocode for Mini-Batch Based Stochastic Gradient
Descent.................................................................................434
Learning Rates and Schedules................................................. 434
Regularization With Priors on Parameters.............................. 435
Dropout .................................................................................... 436
Batch Normalization................................................................ 436
Parameter Initialization............................................................ 436
Unsupervised Pretraining......................................................... 437
Data Augmentation and Synthetic Transformations............... 437
10.3 Convolutional Neural Networks ..............................................437
The ImageNet Evaluation and Very Deep Convolutional
Networks ..............................................................................438
From Image Filtering to Learnable Convolutional Layers..... 439
Convolutional Layers and Gradients....................................... 443
Pooling and Subsampling Layers and Gradients .................... 444
Implementation ........................................................................ 445
10.4 Autoencoders............................................................................445
Pretraining Deep Autoencoders With RBMs.......................... 448
Denoising Autoencoders and Layerwise Training.................. 448
Combining Reconstructive and Discriminative Learning....... 449

10.5 Stochastic Deep Networks ....................................................... 449


Boltzmann Machines ............................................................... 449
Restricted Boltzmann Machines.............................................. 451
Contrastive Divergence ........................................................... 452
Categorical and Continuous Variables.................................... 452
Deep Boltzmann Machines...................................................... 453
Deep Belief Networks ............................................................. 455
10.6 Recurrent Neural Networks ..................................................... 456
Exploding and Vanishing Gradients ....................................... 457
Other Recurrent Network Architectures ................................. 459
10.7 Further Reading and Bibliographic Notes...............................461
10.8 Deep Learning Software and Network Implementations........464
Theano...................................................................................... 464
TensorFlow ............................................................. 464
Torch ........................................................................................ 465
Computational Network Toolkit.............................................. 465
Caffe......................................................................................... 465
Deeplearning4j......................................................................... 465
Other Packages: Lasagne, Keras, and cuDNN........................ 465
10.9 WEKA Implementations.......................................................... 466

CHAPTER 11 Beyond supervised and unsupervised learning ..... 467


11.1 Semisupervised Learning......................................................... 468
Clustering for Classification.................................................... 468
Cotraining ................................................................................ 470
EM and Cotraining .................................................................. 471
Neural Network Approaches ................................................... 471
11.2 Multi-instance Learning........................................................... 472
Converting to Single-Instance Learning ................................. 472
Upgrading Learning Algorithms ............................................. 475
Dedicated Multi-instance Methods.......................................... 475
11.3 Further Reading and Bibliographic Notes...............................477
11.4 WEKA Implementations.......................................................... 478

CHAPTER 12 Ensemble learning ...................................................... 479


12.1 Combining Multiple Models.................................................... 480
12.2 Bagging .................................................................................... 481
Bias-Variance Decomposition ............................... 482
Bagging With Costs................................................................. 483
12.3 Randomization ......................................................................... 484
Randomization Versus Bagging .............................................. 485
Rotation Forests ....................................................................... 486

12.4 Boosting ...................................................................................486


AdaBoost.................................................................................. 487
The Power of Boosting............................................................ 489
12.5 Additive Regression.................................................................490
Numeric Prediction .................................................................. 491
Additive Logistic Regression .................................................. 492
12.6 Interpretable Ensembles...........................................................493
Option Trees ............................................................................ 494
Logistic Model Trees............................................................... 496
12.7 Stacking....................................................................................497
12.8 Further Reading and Bibliographic Notes...............................499
12.9 WEKA Implementations..........................................................501

CHAPTER 13 Moving on: applications and beyond..................... 503


13.1 Applying Machine Learning..................................................504
13.2 Learning From Massive Datasets ..........................................506
13.3 Data Stream Learning............................................................509
13.4 Incorporating Domain Knowledge ........................................512
13.5 Text Mining ...........................................................................515
Document Classification and Clustering............................... 516
Information Extraction........................................................... 517
Natural Language Processing ................................................ 518
13.6 Web Mining ...........................................................................519
Wrapper Induction ................................................................. 519
Page Rank .............................................................................. 520
13.7 Images and Speech ................................................................522
Images .................................................................................... 523
Speech .................................................................................... 524
13.8 Adversarial Situations............................................................524
13.9 Ubiquitous Data Mining ........................................................527
13.10 Further Reading and Bibliographic Notes ............................529
13.11 WEKA Implementations .......................................................532

Appendix A: Theoretical foundations...................................................................533


Appendix B: The WEKA workbench ...................................................................553
References..............................................................................................................573
Index ......................................................................................................................601
List of Figures

Figure 1.1 Rules for the contact lens data. 13


Figure 1.2 Decision tree for the contact lens data. 14
Figure 1.3 Decision trees for the labor negotiations data. 18
Figure 1.4 Life cycle of a data mining project. 29
Figure 2.1 A family tree and two ways of expressing the sister-of relation. 48
Figure 2.2 ARFF file for the weather data. 58
Figure 2.3 Multi-instance ARFF file for the weather data. 60
Figure 3.1 A linear regression function for the CPU performance data. 69
Figure 3.2 A linear decision boundary separating Iris setosas from Iris versicolors. 70
Figure 3.3 Constructing a decision tree interactively: (A) creating a rectangular test involving petallength and petalwidth; (B) the resulting (unfinished) decision tree. 73
Figure 3.4 Models for the CPU performance data: (A) linear regression; (B) regression tree; (C) model tree. 74
Figure 3.5 Decision tree for a simple disjunction. 76
Figure 3.6 The exclusive-or problem. 77
Figure 3.7 Decision tree with a replicated subtree. 77
Figure 3.8 Rules for the iris data. 81
Figure 3.9 The shapes problem. 82
Figure 3.10 Different ways of partitioning the instance space. 86
Figure 3.11 Different ways of representing clusters. 88
Figure 4.1 Pseudocode for 1R. 93
Figure 4.2 Tree stumps for the weather data. 106
Figure 4.3 Expanded tree stumps for the weather data. 108
Figure 4.4 Decision tree for the weather data. 109
Figure 4.5 Tree stump for the ID code attribute. 111
Figure 4.6 Covering algorithm: (A) covering the instances; (B) decision tree for the same problem. 113
Figure 4.7 The instance space during operation of a covering algorithm. 115
Figure 4.8 Pseudocode for a basic rule learner. 118


Figure 4.9 (A) Finding all item sets with sufficient coverage; (B) finding all sufficiently accurate association rules for a k-item set. 127
Figure 4.10 Logistic regression: (A) the logit transform; (B) example logistic regression function. 130
Figure 4.11 The perceptron: (A) learning rule; (B) representation as a neural network. 132
Figure 4.12 The Winnow algorithm: (A) unbalanced version; (B) balanced version. 134
Figure 4.13 A kD-tree for four training instances: (A) the tree; (B) instances and splits. 137
Figure 4.14 Using a kD-tree to find the nearest neighbor of the star. 137
Figure 4.15 Ball tree for 16 training instances: (A) instances and balls; (B) the tree. 139
Figure 4.16 Ruling out an entire ball (gray) based on a target point (star) and its current nearest neighbor. 140
Figure 4.17 Iterative distance-based clustering. 143
Figure 4.18 A ball tree: (A) two cluster centers and their dividing line; (B) corresponding tree. 145
Figure 4.19 Hierarchical clustering displays. 149
Figure 4.20 Clustering the weather data. 151
Figure 4.21 Hierarchical clusterings of the iris data. 153
Figure 5.1 A hypothetical lift chart. 185
Figure 5.2 Analyzing the expected benefit of a mailing campaign when the cost of mailing is (A) $0.50 and (B) $0.80. 187
Figure 5.3 A sample ROC curve. 188
Figure 5.4 ROC curves for two learning schemes. 189
Figure 5.5 Effect of varying the probability threshold: (A) error curve; (B) cost curve. 193
Figure 6.1 Example of subtree raising, where node C is “raised” to subsume node B. 214
Figure 6.2 Pruning the labor negotiations decision tree. 216
Figure 6.3 Algorithm for forming rules by incremental reduced-error pruning. 226
Figure 6.4 RIPPER: (A) algorithm for rule learning; (B) meaning of symbols. 228
Figure 6.5 Algorithm for expanding examples into a partial tree. 229
Figure 6.6 Example of building a partial tree. 230
Figure 6.7 Rules with exceptions for the iris data. 232
Preface

radial basis function networks, and also included support vector machines for regression. We incorporated a new section on Bayesian networks, again in response to readers’ requests and WEKA’s new capabilities in this regard, with a description of how to learn classifiers based on these networks, and how to implement them efficiently using AD trees.

The previous 5 years (1999-2004) had seen great interest in data mining for text, and this was reflected in the introduction of string attributes in WEKA, multinomial Bayes for document classification, and text transformations. We also described efficient data structures for searching the instance space: kD-trees and ball trees for finding nearest neighbors efficiently, and for accelerating distance-based clustering. We described new attribute selection schemes such as race search and the use of support vector machines; new methods for combining models such as additive regression, additive logistic regression, logistic model trees, and option trees. We also covered recent developments in using unlabeled data to improve classification, including the cotraining and co-EM methods.

THIRD EDITION
For the third edition, we thoroughly edited the second edition and brought it up to date, including a great many new methods and algorithms. WEKA and the book were closely linked together—pretty well everything in WEKA was covered in the book. We also included far more references to the literature, practically tripling the number of references that were in the first edition.

As well as becoming far easier to use, WEKA had grown beyond recognition over the previous decade, and matured enormously in its data mining capabilities. It incorporates an unparalleled range of machine learning algorithms and related techniques. The growth has been partly stimulated by recent developments in the field, and is partly user-led and demand-driven. This puts us in a position where we know a lot about what actual users of data mining want, and we have capitalized on this experience when deciding what to include in this book.

Here are a few of the highlights of the material that was added in the third edition. A section on web mining was included, and, under ethics, a discussion of how individuals can often be “reidentified” from supposedly anonymized data. Other additions included techniques for multi-instance learning, new material on interactive cost-benefit analysis, cost-complexity pruning, advanced association rule algorithms that use extended prefix trees to store a compressed version of the dataset in main memory, kernel ridge regression, stochastic gradient descent, and hierarchical clustering methods. We added new data transformations: partial least squares regression, reservoir sampling, one-class learning, decomposing multiclass classification problems into ensembles of nested dichotomies, and calibrating class probabilities. We added new information on ensemble learning techniques: randomization vs. bagging, and rotation forests. New sections on data stream learning and web mining were added as well.
Another random document with
no related content on Scribd:
joke took immediately. The officers could not help laughing; for,
though we considered them little better than fiends at that moment of
excitement, they were, in fact, except in this instance, the best-
natured and most indulgent men I remember to have sailed with.
They, of course, ordered the crape to be instantly cut off from the
dogs’ legs; and one of the officers remarked to us, seriously, that as
we had now had our piece of fun out, there were to be no more such
tricks.
Off we scampered, to consult old Daddy what was to be done
next, as we had been positively ordered not to meddle any more with
the dogs.
“Put the pigs in mourning,” he said.
All our crape was expended by this time; but this want was soon
supplied by men whose trade it is to discover resources in difficulty.
With a generous devotion to the cause of public spirit, one of these
juvenile mutineers pulled off his black handkerchief, and, tearing it in
pieces, gave a portion to each of the circle, and away we all started
to put into practice this new suggestion of our director-general of
mischief.
The row which ensued in the pig-sty was prodigious—for in those
days, hogs were allowed a place on board a man-of-war,—a custom
most wisely abolished of late years, since nothing can be more out of
character with any ship than such nuisances. As these matters of
taste and cleanliness were nothing to us, we did not intermit our
noisy labour till every one of the grunters had his armlet of such
crape as we had been able to muster. We then watched our
opportunity, and opened the door so as to let out the whole herd of
swine on the main-deck, just at a moment when a group of the
officers were standing on the fore part of the quarter-deck. Of
course, the liberated pigs, delighted with their freedom, passed in
review under the very nose of our superiors, each with his mourning
knot displayed, grunting or squealing along, as if it was their express
object to attract attention to their domestic sorrow for the loss of
Shakings. The officers were excessively provoked, as they could not
help seeing that all this was affording entertainment, at their
expense, to the whole crew; for, although the men took no part in this
touch of insubordination, they were ready enough, in those idle times
of the weary, weary peace, to catch at any species of distraction or
devilry, no matter what, to compensate for the loss of their wonted
occupation of pommeling their enemies.
The matter, therefore, necessarily became rather serious; and the
whole gang of us being sent for on the quarter-deck, we were ranged
in a line, each with his toes at the edge of a plank, according to the
orthodox fashion of these gregarious scoldings, technically called
‘toe-the-line matches.’ We were then given to understand that our
proceedings were impertinent, and, after the orders we had received,
highly offensive. It was with much difficulty that either party could
keep their countenances during this official lecture, for, while it was
going on, the sailors were endeavouring, by the direction of the
officers, to remove the bits of silk from the legs of the pigs. If,
however, it be difficult—as most difficult we found it—to put a hog
into mourning, it is a job ten times more troublesome to take him out
again. Such at least is the fair inference from these two experiments;
the only ones perhaps on record,—for it cost half the morning to
undo what we had effected in less than an hour—to say nothing of
the unceasing and outrageous uproar which took place along the
decks, especially under the guns, and even under the coppers,
forward in the galley, where two or three of the youngest pigs had
wedged themselves, apparently resolved to die rather than submit to
the degradation of being deprived of their mourning.
All this was very creditable to the memory of poor Shakings; but, in
the course of the day, the real secret of this extraordinary difficulty of
taking a pig out of mourning was discovered. Two of the raids were
detected in the very fact of tying on a bit of black buntin to the leg of
a sow, from which the seamen declared they had already cut off
crape and silk enough to have made her a complete suit of black.
As soon as these fresh offences were reported, the whole party of
us were ordered to the mast-head as a punishment. Some were sent
to sit on the topmast cross-trees, some on the top-gallant yard-arms,
and one small gentleman being perched at the jib-boom end, was
very properly balanced abaft by another little culprit at the extremity
of the gaff. In this predicament we were hung out to dry for six or
eight hours, as old Daddy remarked to us with a grin, when we were
called down as the night fell.
Our persevering friend, being rather provoked at the punishment
of his young flock, now set to work to discover the real fate of
Shakings. It soon occurred to him, that if the dog had really been
made away with, as he shrewdly suspected, the butcher, in all
probability, must have had a hand in his murder; accordingly, he sent
for the man in the evening, when the following dialogue took place:—
“Well, butcher, will you have a glass of grog to-night?”
“Thank you, sir, thank you. Here’s your honour’s health!” said the
other, after smoothing down his hair, and pulling an immense quid of
tobacco out of his mouth.
Old Daddy observed the peculiar relish with which the butcher
took his glass; and mixing another, a good deal more potent, placed
it before the fellow, and continued the conversation in these words:
“I tell you what it is, Mr. Butcher—you are as humane a man as
any in the ship, I dare say; but, if required, you know well, that you
must do your duty, whether it is upon sheep or hogs?”
“Surely, sir.”
“Or upon dogs, either?” suddenly asked the inquisitor.
“I don’t know about that,” stammered the butcher, quite taken by
surprise, and thrown all aback.
“Well—well,” said Daddy, “here’s another glass for you—a stiff
north-wester. Come! tell us all about it now. How did you get rid of
the dog?—of Shakings, I mean?”
“Why, sir,” said the peaching rogue, “I put him in a bag—a bread
bag, sir.”
“Well!—what then?”
“I tied up the mouth, and put him overboard—out of the midship
lower-deck port, sir.”
“Yes—but he would not sink?” said Daddy.
“Oh, sir,” cried the butcher, now entering fully into the merciless
spirit of his trade, “I put a four-and-twenty-pound shot into the bag
along with Shakings.”
“Did you?—Then, Master Butcher, all I can say is, you are as
precious a rascal as ever went about unhanged. There—drink your
grog, and be off with you!”
Next morning when the officers were assembled at breakfast in
the ward-room, the door of the captain of marines’ cabin was
suddenly opened, and that officer, half shaved, and laughing through
a collar of soap-suds, stalked out, with a paper in his hand.
“Here,” he exclaimed, “is a copy of verses, which I found just now
in my basin. I can’t tell how they got there, nor what they are about;
—but you shall judge.”
So he read the two following stanzas of doggerel:—

“When the Northern Confed’racy threatened our shores,


And roused Albion’s Lion, reclining to sleep,
Preservation was taken of all the King’s Stores,
Nor so much as a Rope Yarn was launched in the deep.

“But now it is Peace, other hopes are in view,


And all active service as light as a feather,
The Stores may be d—d, and humanity too,
For Shakings and Shot are thrown o’erboard together!”

I need hardly say in what quarter of the ship this biting morsel of
cock-pit satire was concocted, nor indeed who wrote it, for there was
no one but our good Daddy who was equal to such a flight. About
midnight, an urchin—who shall be nameless—was thrust out of one
of the after-ports of the lower deck, from which he clambered up to
the marine officer’s port, and the sash happening to have been
lowered down on the gun, the epigram, copied by another of the
youngsters, was pitched into the soldier’s basin.
The wisest thing would have been for the officers to have said
nothing about the matter, and let it blow by. But angry people are
seldom judicious—so they made a formal complaint to the captain,
who, to do him justice, was not a little puzzled how to settle the affair.
The reputed author, however, was called up, and the captain said to
him—
“Pray, sir, are you the writer of these lines?”
“I am, sir,” he replied, after a little consideration.
“Then—all I can say is,” remarked the captain, “they are clever
enough, in their way—but take my advice, and write no more such
verses.”
So the affair ended. The satirist took the captain’s hint in good
part, and confined his pen to topics below the surface of the water.
As in the course of a few months the war broke out, there was no
longer time for such nonsense, and our generous protector, old
Daddy, some time after this affair of Shakings took place, was sent
off to Halifax, in charge of a prize. His orders were, if possible, to
rejoin his own ship, the Leander, then lying at the entrance of New
York harbour, just within Sandy Hook light-house.
Our good old friend, accordingly, having completed his mission,
and delivered his prize to the authorities at Halifax, took his passage
in the British packet sailing from thence to the port in which we lay.
As this ship sailed past us, on her way to the city of New York, we
ascertained, to our great joy, that our excellent Daddy was actually
on board of her. Some hours afterwards, the pilot-boat was seen
coming to us, and, though it was in the middle of the night, all the
younger mids came hastily on deck to welcome their worthy
messmate back again to his ship.
It was late in October, and the wind blew fresh from the north-
westward, so that the ship, riding to the ebb, had her head directed
towards the Narrows, between Staten Land and Long Island:
consequently, the pilot-boat,—one of those beautiful vessels so well
known to every visitor of the American coast,—came flying down
upon us, with the wind nearly right aft. Our joyous party were all
assembled on the quarter-deck, looking anxiously at the boat as she
swept past us. She then luffed round, in order to sheer alongside, at
which moment the main-sail jibed, as was to be expected. It was
obvious, however, that something more had taken place than the
pilot had looked for, since the boat, instead of ranging up to us, was
brought right round on her heel, and went off again upon a wind on
the other tack. The tide carried her out of sight for a few minutes, but
she was soon alongside, when we learned, to our inexpressible grief
and consternation, that, on the main-boom of the pilot-boat swinging
over, it had accidentally struck our poor friend, and pitched him
headlong overboard. Being encumbered with his great-coat, the
pockets of which, as we afterwards learned, were loaded with his
young companions’ letters, brought from England by this packet, he
in vain struggled to catch hold of the boat, and then sunk to rise no
more!
CHAPTER VI.
DIVERSITIES IN DISCIPLINE.

It was our fortune in the Leander to change captains very frequently;
and, as most of the plans of those officers were dissimilar, the
perplexity which such variations produced is not to be described.
Fortunately, however, there is so much uniformity in the routine of
naval discipline, that, in spite of any variety in the systems
established by a succession of commanding officers, things do
somehow contrive to run on to their final purpose pretty well. It is true
the interests of the service often suffer for a time, and in a small
degree; but public-spirited and vigilant officers know well how to
extract lasting profit even from the unsettled, revolutionary state of
affairs which is apt to occur at these periods. On the other hand, it is
at these times also that the class called skulkers most easily shirk
their duty, while those who really like their business, are even at the
time more certain of being favourably noticed than at any other
moment; because it becomes obvious, that, without them, things
would not go on at all. Although the variety of methods, therefore,
introduced by different captains in succession, is apt to distract and
unhinge the discipline, it likewise teaches much that is useful—at
least to those who are on the alert, and who wish to improve.
I was too young and inexperienced, at that time, to profit by these
repeated changes, as I might have done had I been duly aware that
there were so many advantages to be found in observing their
effects. And it is chiefly on this account that I mention the
circumstance just now, in order to recommend young men to avoid
the very common practice, on board ship, of despising all the plans
introduced by the new officer, and lauding to the skies the practices
of the captain who has gone. It is not such an easy affair, let me tell
them, as they suppose, to regulate the internal affairs of a ship—
and, however clever they may fancy themselves, they will find their
best interest in trying, upon these occasions, not so much to
discover points of censure, as to discover, and impress on their
memory, topics of practical utility, hints for the solution of future
difficulties, and methods of turning their own resources to
professional account.
Even at this distance of time, and although most of the officers I
am now speaking of have long since been dead and gone, I still feel
that it would be a sort of disrespectful liberty in me, and perhaps not
very useful, to point out, with any minuteness of detail, those
particular points in their modes of management which struck me as
being faulty at the time, or which now seem worthy of
commendation. I shall merely mention a trait of character by which
two of them were contradistinguished from each other; and I do so
the more readily, as the example seems to contain a lesson nearly
as applicable, perhaps, to domestic matters, as to those of a stern
profession like the navy.
Whenever one of these commanding officers came on board the
ship, after an absence of a day or two, and likewise when he made
his periodical round of the decks after breakfast, his constant habit
was to cast his eye about him, in order to discover what was wrong
—to detect the smallest thing that was out of its place—in a word, to
find as many grounds for censure as possible. This constituted, in
his opinion, the best preventive to neglect, on the part of those under
his command; and he acted in this crusty way on principle.
The attention of the other officer, on the contrary, appeared to be
directed chiefly to those points which he could approve of. For
instance, he would stop as he went along, from time to time, and say
to the first lieutenant, “Now, these ropes are very nicely arranged;
this mode of stowing the men’s bags and mess kids is just as I wish
to see it.” While the officer first described would not only pass by
these well-arranged things, which had cost hours of labour to put in
order, quite unnoticed, but would not be easy till his eye had caught
hold of some casual omission, which afforded an opening for
disapprobation. One of these captains would remark to the first
lieutenant, as he walked along, “How white and clean you have got
the decks to-day! I think you must have been at them all the
morning, to have got them into such order.” The other, in similar
circumstances, but eager to find fault, would say, even if the decks
were as white and clean as drifted snow—“I wish to Heaven, sir, you
would teach these sweepers to clear away that bundle of shakings!”
pointing to a bit of rope yarn, not half an inch long, left under the
truck of a gun.
It seemed, in short, as if nothing was more vexatious to one of
these officers, than to discover things so correct as to afford him no
good opportunity for finding fault; while to the other, the necessity of
censuring really appeared a punishment to himself. Under the one,
accordingly, we all worked with cheerfulness, from a conviction that
nothing we did in a proper way would miss approbation. But our duty
under the other, being performed in fear, seldom went on with much
spirit. We had no personal satisfaction in doing things correctly, from
the certainty of getting no commendation. The great chance, also, of
being censured, even in those cases where we had laboured most
industriously to merit approbation, broke the spring of all generous
exertion, and, by teaching us to anticipate blame, as a matter of
course, defeated the very purpose of punishment when it fell upon
us. The case being quite hopeless, the chastisement seldom
conduced either to the amendment of an offender, or to the
prevention of offences. But what seemed the oddest thing of all was,
that these men were both as kind-hearted as could be, or, if there
were any difference, the fault-finder was the better natured, and in
matters not professional the more indulgent of the two. The line of
conduct I have described was purely a matter of official system, not
at all of feeling. Yet, as it then appeared, and still appears to me,
nothing could be more completely erroneous than the snarling
method of the one, or more decidedly calculated to do good, than the
approving style of the other. It has, in fact, always appeared to me an
absurdity, to make any real distinction between public and private
matters in these respects. Nor is there the smallest reason why the
same principle of civility, or consideration, or by whatever name that
quality be called by which the feelings of others are consulted,
should not modify professional intercourse quite as much as it does
that of the freest society, without any risk that the requisite strictness
of discipline would be hurt by an attention to good manners.
This desire of discovering that things are right, and the sincere
wish to express that approbation, are habits which, in almost
every situation in life, have the best possible effects in practice. They
are vastly more agreeable certainly to the superior himself, whether
he be the colonel of a regiment, the captain of a ship, or the head of
a house; for the mere act of approving, seldom fails to put a man’s
thoughts into that pleasant train which predisposes him to be
habitually pleased, and this frame of mind alone, essentially helps
the propagation of a similar cheerfulness amongst all those who are
about him. It requires, indeed, but a very little experience of soldiers
or sailors, children, servants, or any other kind of dependents, or
even of companions and superiors, to shew that this good-humour,
on the part of those whom we wish to influence, is the best possible
coadjutor to our schemes of management, whatever these may be.
The approving system is also, beyond all others, the most
stimulating and agreeable for the inferior to work under. Instead of
depressing and humiliating him, it has a constant tendency to make
him think well of himself, so long as he is usefully employed; and as
soon as this point is gained, but seldom before, he will be in a right
frame of mind to think well of others, and to look with hearty zeal to
the execution of his duty. All the burdens of labour are then
lightened, by the conviction that they are well directed; and, instead
of his severest tasks being distasteful, they may often, under the
cheering eye of a superior who shews himself anxious to commend
what is right, become the most substantial pleasures of his life.
I need scarcely dwell longer on this subject, by shewing that
another material advantage of the approving practice consists in the
greater certainty and better quality of the work done by willing hands,
compared to that which is crushed out of people by force. No man
understood this distinction better than Lord Nelson, who acted upon
it uniformly,—with what wonderful success we all know. Some one
was discussing this question with him one day, and pointing out the
eminent success which had attended the opposite plan, followed by
another great officer, Lord St. Vincent:—
“Very true,” said Lord Nelson; “but, in cases where he used a
hatchet, I took a penknife.”
After all, however, it is but too true, that, adopt what course we will
of commendations or other rewards, we must still call in punishments
to our assistance, from time to time. But there can be little doubt that
any well-regulated system of cheerfulness, and just approbation of
what is right, followed not from caprice, but as an express duty, gives
into our hands the means of correcting things which are wrong, with
greater effect, and at a much less cost of suffering, than if our
general habit were that of always finding fault. For it is obvious, that
when affairs are carried on upon the cheerful principle above
described, the mere act of withholding praise becomes a sharp
censure in itself—and this alone is sufficient to recommend its use. It
doubles the work done, by quickening the hands of the labourers—
doubles the happiness of all parties, both high and low—and it may
also be said to double our means of punishing with effect; for it
superadds a class of chastisements, dependent solely upon the
interruption of favours, not upon the infliction of actual pain. The
practical application of these rules to the ordinary course of naval
discipline I shall probably have frequent opportunities of shewing.
In the mean time, I shall merely remark, that in every situation in
life, perhaps without any exception, much of our happiness or
misery, as well as much of our success in the world, depends less
upon the circumstances about us, than upon the manner in which, as
a matter of habit and principle, we choose to view them. In almost
every case there is something to approve of, quite as distinct, if we
wish to see it, as there is of censure, though it may not otherwise be
so conspicuous. It will, of course, very often be quite necessary to
reprobate, without any sort of qualification, what passes before us;
still, without in the smallest degree compromising our sense of what
is wrong, there will always be a way—if there be a will—of
expressing such sentiments that shall not be unsuitable to the
golden precept which recommends us to take a cheerful view of
things.
There is one practical maxim, trite, indeed, though too little acted
upon, but which bears so directly on this subject, that I wish
exceedingly to urge it upon the notice of my young friends, from its
being calculated to prove of much use to them in the business, as
well as the true pleasures of life. In dealing with other men—no
matter what their rank or station may be—we should consider not so
much what they deserve at our hands, as what course is most
suitable for us to follow.
“My lord,” says Polonius to Hamlet, in speaking of the poor
players, “I will use them according to their desert.”
“Odd’s bodikin, man, much better!” is the answer of the judicious
and kind-hearted prince. “Use every man after his desert, and who
shall ’scape whipping? Use them after your own honour and dignity:
the less they deserve, the more merit is in your bounty.”
Most people, however, reverse this beautiful maxim, which
breathes the very soul of practical charity, and study to behave to
others in a manner suitable to the desert of those persons, while
they leave out of the question entirely the propriety and dignity of
their own conduct, as if that were a minor, and not the primary
consideration! Does not this occur every time we lose our temper? At
all events, the maxim applies with peculiar force on board ship,
where the character and conduct of every officer are daily and hourly
exposed to the searching scrutiny of a great number of persons who
have often little else to do but watch the behaviour of one another.
It may safely be asserted, indeed, that in no instance whatsoever
can we exercise any permanent or useful influence over the
opinions, feelings, or conduct of others, unless in our intercourse
with them we demean ourselves in a manner suitable to our own
station; and this, in fact, which, in the long run, is the measure of all
efficient authority, is also the principal circumstance which gives one
man the ascendency over another, his equal in talents and
information, and whose opportunities are alike. It is probably to the
same class of things that one man owes his transcendent popularity
and success in society, while another, equally gifted, and enjoying
similar opportunities, is shunned or neglected. If we hear a person
constantly finding fault—however much reason he may have on his
side—we take no pleasure in his company. We soon discover, that if
there be two things presented to his view, one which may be made
the subject of praise, the other of censure, he will catch at the
disagreeable point, and dwell upon it, to the exclusion of that which
is agreeable, although the circumstances may not be such as to
have required him to express any comparative opinion at all. And as
the taste for finding fault unfortunately extends to every thing, small
as well as great, constant food is sure to be furnished, at every turn,
to supply this disparaging appetite. If the sky be bright and clear, the
growler reminds you that the streets are dirty under foot;—if the
company be well selected, the dinner good, the music choice, and all
things gay and cheerful, he forces upon your attention the closeness
of the rooms, the awkward dress of one of the party, or the want of
tune in one of the strings of the harp. In speaking of the qualities of a
friend, your true snarler is certain to pick out the faults, to dash the
merits; and even when talking of himself, he dwells with a morbid
pleasure on his want of success in society, his losses in fortune, and
his scanty hopes of doing any better in future. The sunshine of day is
pale moonlight to such a man. If he sees a Sir Joshua, it is sure to
be faded;—the composition and execution he takes care not to look
at. If he hears of a great warrior or statesman, whose exploits have
won the applause of the whole world, he qualifies the admiration by
reference to some early failure of the great man. In short, when we
find ourselves in such a person’s company, we feel certain that the
bad side of every thing will inevitably be exposed to us. And what is
the result? Do we not shun him? And if we should have the means of
introducing him to others, or of putting him into a situation to benefit
himself and the public, are we not shy of trusting him with a degree
of power which he appears determined shall not be productive of
good?
The truth is, that by an involuntary process of the mind, we come
to judge of others, not nearly so much by direct examination as by
means of the reflected light which is sent back from the objects
surrounding them. If we observe, therefore, that a man’s general
taste is to find fault rather than to be pleased, we inevitably form the
conclusion that he is really not worth pleasing; and as he is not likely
to gratify others, we keep him, as much as we can, out of the way of
those we esteem.
In very many cases, however, probably in most cases, this temper
is merely a habit, and may, at bottom, often be quite unsuitable to
the real character. So much so, that if the opposite practice, from
whatever motive, be adopted by the same person, even where the
disposition may fundamentally not be good, the result will often be a
thousand times more amiable and useful, not only to the party
himself, but to all those with whom he has any dealings; and his
companionship will then be courted, instead of being shunned, as it
had been before.
In the free and open world of busy life, men are generally made so
fully sensible, sooner or later, of the truth of these maxims, that few
of the growling tribe are ever known to advance far in life. But on
board ship, where the distinctions of rank are strongly marked, and
the measure of each man’s authority exactly determined by
established laws and usages, officers are frequently much too slow
to discover that the principles above adverted to are applicable to
their own case; and thus they sometimes fling away advantages of
the highest price, which lie easily within their reach, and adopt
instead the cold, stern, and often inefficient operations of mere
technical discipline.
This very technical discipline, indeed, like any other machinery, is
admirable if well worked, but useless if its powers are misapplied. It
is not the mere elastic force of the steam that gives impulse to the
engine, but a due regulation of that elasticity. So it is with the use of
that mysterious, I had almost said magical sort of power, by which
the operations of moral discipline are carried on, especially at sea,
where the different component parts of the machine are so closely
fitted to one another, and made to act in such uniform order, that no
one part can go far wrong without deranging the whole.
I would fain, however, avoid narrowing the principle to any walk of
life, though its operation may be more obvious afloat than on shore.
And any young person, just setting out in the world, whatever his
profession be, will do well to recollect, that his own eventual
success, as well as happiness in the mean time, will mainly depend
upon his resolute determination to acquire the habit of being pleased
with what he meets, rather than of being sharp-sighted in the
discovery of what is disagreeable. I may add, that there is little or no
danger of the habit recommended degenerating into duplicity; for, in
order to its being either useful in the long run, or even agreeable at
the moment, its practice, like every thing else that is good, must be
guided throughout by sterling principle.
CHAPTER VII.
GEOLOGY—NAUTICAL SQUABBLES.

About this period I began to dabble a little in geology, for which
science I had acquired a taste by inheritance, and, in some degree,
from companionship with more than one of the Scottish school, who,
at the beginning of this century, were considered more than half-
cracked, merely for supporting the igneous theory of Dr. Hutton,
which, with certain limitations and extensions, and after thirty years
of controversy, experiment, and observation, appears to be now
pretty generally adopted. Sailors, indeed, have excellent
opportunities of making geological observations, for they have the
advantage of seeing Nature, as it were, with her face washed, more
frequently than most other observers; and can seldom visit any
coast, new or old, without having it in their power to bring off
something interesting to inquirers in this branch of knowledge. That
is, supposing they have eyes to see, and capacity to describe, what
meets their observation. Some people cannot go beyond a single
fact or two actually lying under their very noses; and you might as
well expect them to fly as to combine these particulars, or to apply
them to the purposes of science at large. Others, again, from the
same want of accurate comprehension, or from sheer mental
indolence, jump at once from the most trifling local circumstances to
the broadest and most unwarranted generalisations.
It would be difficult, if not quite impossible, by dint of any number
of precepts, to drive geology, or any other kind of instruction, into the
noddles of some folks; so that it will often seem an even chance with
a blockhead, whether, when he is obliged to think, he will generalise
too much or too little. I remember, for example, once lying at anchor,
for some weeks, in the harbour of Vigo, on the west coast of Spain,
during which time, for a piece of fun, the first lieutenant desired one
of the youngsters on board to write a letter to his friends at home.
“What in the world, sir, am I to say?” asked the noodle of a fellow,
after pondering over the subject for a long time.
“Say?—Why, describe the country, and the manners of the people
—tell how they behaved to you.”
To work went the youth, sorely bothered; and though he had been
on shore many times, he could extract nothing from his memory. The
first lieutenant, however, who was inexorable, insisted upon the letter
being written, and locked him up in his cabin till he intimated, by a
certain signal, that the epistle was ready for inspection. The following
was the result of four hours’ painful labour:—
“All Spain is hilly—so is this. The natives all wear wooden shoes,
and they are all a set of brutes, of which I take this to witness, that
one of them called me a Picaroon.
“I remain, &c.”
Although geology be a topic often intensely interesting on the spot,
it is not always easy to give it this character to people at a distance,
who care very little whether the world has been baked or boiled, or
both, or neither. Most persons, indeed, remain all their lives quite
indifferent whether the globe has come into its present shape by
what is called Chance, that is, I suppose, by means which we cannot
investigate, and can only guess at,—or whether its various changes
are susceptible of philosophical examination, and their history of
being recorded with more or less precision. The sound geologist of
the present day, it will be observed, professes to have nothing to do
with the origin of things, but merely investigates the various physical
revolutions which have taken place on the earth’s surface, by the
instrumentality of natural causes. The great charm of this fascinating
science, accordingly, though it may be difficult to say why, consists in
the manner in which the Reason and the Imagination are brought
together, in regions where two such travellers could hardly have
been expected to meet.
Many practical and popular questions also mingle themselves up
with the scientific inquiries of geology. I remember, for instance, even
when a boy, taking a great interest, on this account, in the plaster of
Paris quarries of Nova Scotia. This formation shews itself generally
above ground, and is of a dingy white colour, the parts exposed to
the air being crumbly or decomposed. The workmen having removed
the superincumbent earth, and the rotten rock, as they call it, blast
the solid gypsum with gunpowder, and, having broken it into blocks
sufficiently small to be handled, sell it to the American dealers. A
number of vessels are daily employed in carrying it to New York,
Philadelphia, and Baltimore. In my early notes, I find it gravely
stated, as a thing generally understood, that most of this gypsum
was sold to the millers in the United States, for the purpose of
adulterating their flour! Prejudice apart, however, the fact is quite the
reverse of what I believed in my youth; for, by an ingenious system
of regulations, the Americans contrive that the best quality should
have an advantage over bad in leaving the country. Such, indeed, is
the success of these measures, that I can safely say I never saw a
barrel of it that was not excellent.
Besides this capital flour, the Americans export biscuit of a
delicious quality to all parts of the globe; and those only who have
known the amount of discomfort produced by living on the
‘remainder biscuit after a voyage,’ perhaps not good of its kind
originally, can justly appreciate the luxury of opening a barrel of
crackers from New York! By the way, it is a curious and not
unimportant fact in nautical affairs, though only discovered of late
years, that the best way to keep this description of bread is to
exclude the air from it as much as possible. In former times, and
even for some years after I entered the navy, the practice was, to
open the bread-room frequently, and, by means of funnels made of
canvass, called windsails, to force the external air amongst the
biscuit, in order, as was supposed, to keep it sweet and fresh.
Nothing, it now appears, could have been devised more destructive
to it; and the reason is easily explained. It is only in fine weather that
this operation can be performed, at which seasons the external air is
generally many degrees hotter than the atmosphere of the bread-
room, which, from being low down in the ship, acquires, like a cellar,
a pretty uniform temperature. The outer air, from its warmth, and
from sweeping along the surface of the sea, is at all times charged
with a considerable degree of vapour, the moisture of which is sure
to be deposited upon any body it comes in contact with, colder than
the air which bears it along. Consequently, the biscuit, when
exposed to these humid currents, is rendered damp, and the process
of decay, instead of being retarded, is rapidly assisted by the
ventilation. This ancient system of airing is now so entirely exploded,
that in some ships the biscuit is placed in separate closed cases, in
which it is packed like slates, with great care, and the covers are
then caulked or sealed down. By this contrivance, no more biscuit
need be exposed than is absolutely necessary for the immediate
consumption of the crew. If I am not mistaken, this is the general
practice in American men-of-war, and it certainly ought to be adopted
by us.
I remember once, when sailing in the Pacific Ocean, about a
couple of hundred leagues to the south of the coast of Peru, falling in
with a ship, and buying some American biscuit which had been more
than a year from home. It was enclosed in a new wine puncheon,
which was, of course, perfectly air-tight. When we opened it, the
biscuit smelled as fresh and new as if it had been taken from the
oven only the day before. Even its flavour and crispness were
preserved so entire, that I thought we should never have done
cranching it.
We were not particularly fortunate in making many captures on the
Halifax station, in our early cruises, after the war broke out. But the
change which the renewal of hostilities made in our habits was great.
Instead of idly rotting in harbour, our ship was now always at sea, on
the look-out,—a degree of vigilance, which, as will be seen, had its
reward in due season. In the meantime, we discovered that a
midshipman’s life was full of interest and curiosity, especially to those
who thirsted to see new countries and new climates. Of this matter of
climate, I find a characteristic enough touch in one of my early
letters.
“We have been on a cruise for many months; but we did not take a
single prize, although all the rest of the ships of the station have
been making captures. I hope we shall be more fortunate next time,
as we intend to go to a better place. Our last cruise carried us a long
way to the southward, where the weather was so very hot, that it
became impossible to do any thing in comfort, night or day. In the
night time we could hardly sleep, and in the day we were scorched
by the sun. When our candles were lighted, they melted away by
degrees, and often tumbled on the table by their own weight, or,
perhaps, fell plump into the victuals!”
Even at this distance of time, I have a most painfully distinct
recollection of these dirty tallow candles in the midshipmen’s birth—
dips, I think, they were called—smelling of mutton fat, and throwing
up a column of smoke like that from a steamboat’s chimney. These
‘glims’ yielded but little light, by reason, possibly, of a huge wick
occupying more than half the area of the flame, and demanding the
incessant application of our big-bellied snuffers to make the
darkness visible.
This, in its turn, reminds me of a piece of cock-pit manners, which
truth obliges me to divulge, although, certainly, not very much to our
credit. It was the duty of the unfortunate wight who sat nearest the
candle—grievously misnamed the ‘light’—to snuff off its monstrous
cauliflower of a head from time to time; and certainly his office was
no sinecure. Sometimes, however, either from being too much
absorbed in his book, or from his hand being tired, he might forget to
‘top the glim,’ as it was called—glim being, I suppose, a contraction
of the too obvious word glimmer. On these occasions of neglect,
when things were returning fast to their primeval darkness, any one
of the company was entitled to call out “Top!” upon which all the rest
were bound to vociferate the same word, and he who was the last to
call out “Top!” was exposed to one of the following disagreeable
alternatives—either to get up and snuff the candle, at whatever
distance he might be seated, or to have the burning snuff thrown in
his face by any one who was within reach, and chose to pinch it off
with his finger and thumb. It is true there was always some trouble in
this operation, and some little risk of burning the fingers, to say
nothing of the danger to His Majesty’s ship; but the delightful task of
teaching a messmate good breeding, by tossing a handful of burning
tallow-candle snuff in his eyes, was, of course, a happiness too great
to be resisted.