THINKING ABOUT STATISTICS

Simply stated, this book bridges the gap between statistics and philosophy. It
does this by delineating the conceptual cores of various statistical methodologies
(Bayesian/frequentist statistics, model selection, machine learning, causal
inference, etc.) and drawing out their philosophical implications. Portraying
statistical inference as an epistemic endeavor to justify hypotheses about a
probabilistic model of a given empirical problem, the book explains the role of
ontological, semantic, and epistemological assumptions that make such inductive
inference possible. From this perspective, various statistical methodologies are
characterized by their epistemological nature: Bayesian statistics by internalist
epistemology, classical statistics by externalist epistemology, model selection
by pragmatist epistemology, and deep learning by virtue epistemology.
Another highlight of the book is its analysis of the ontological assumptions
that underpin statistical reasoning, such as the uniformity of nature, natural
kinds, real patterns, possible worlds, causal structures, etc. Moreover, recent
developments in deep learning indicate that machines are carving out their own
“ontology” (representations) from data, and better understanding this—a key
objective of the book—is crucial for improving these machines’ performance
and intelligibility.

Key Features
• Without assuming any prior knowledge of statistics, discusses philosophical
aspects of traditional as well as cutting-edge statistical methodologies.
• Draws parallels between various methods of statistics and philosophical
epistemology, revealing previously ignored connections between the two
disciplines.
• Written for students, researchers, and professionals in a wide range of fields,
including philosophy, biology, medicine, statistics and other social sciences,
and business.
• Originally published in Japanese with widespread success, the book has been
translated into English by the author.

Jun Otsuka is Associate Professor of Philosophy at Kyoto University and a visiting


researcher at the RIKEN Center for Advanced Intelligence Project in Saitama,
Japan. He is the author of The Role of Mathematics in Evolutionary Theory (Cambridge
UP, 2019).
THINKING ABOUT
STATISTICS
The Philosophical Foundations

Jun Otsuka
Designed cover image: © Jorg Greuel/Getty Images
First published in English 2023
by Routledge
605 Third Avenue, New York, NY 10158
and by Routledge
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2023 Jun Otsuka
The right of Jun Otsuka to be identified as author of this work has been
asserted in accordance with sections 77 and 78 of the Copyright, Designs
and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced
or utilised in any form or by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying and recording,
or in any information storage or retrieval system, without permission in
writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or
registered trademarks, and are used only for identification and explanation
without intent to infringe.
Originally published in Japanese by the University of Nagoya Press, Japan.
ISBN: 978-1-032-33310-6 (hbk)
ISBN: 978-1-032-32610-8 (pbk)
ISBN: 978-1-003-31906-1 (ebk)
DOI: 10.4324/9781003319061
Typeset in Bembo
by Apex CoVantage, LLC
For my parents, Yuzuru and Kuniko Otsuka
CONTENTS

Preface to the English Edition x

Introduction 1
What Is This Book About? 1
The Structure of the Book 4

1 The Paradigm of Modern Statistics 10


1.1 Descriptive Statistics 10
1.1.1 Sample Statistics 11
1.1.2 Descriptive Statistics as “Economy of Thought” 13
1.1.3 Empiricism, Positivism, and the Problem of Induction 16
1.2 Inferential Statistics 17
1.2.1 Probability Models 18
1.2.2 Random Variables and Probability Distributions 21
1.2.3 Statistical Models 27
1.2.4 The Worldview of Inferential Statistics and
“Probabilistic Kinds” 34
Further Reading 38

2 Bayesian Statistics 41
2.1 The Semantics of Bayesian Statistics 41
2.2 Bayesian Inference 46
2.2.1 Confirmation and Disconfirmation of Hypotheses 47
2.2.2 Infinite Hypotheses 48
2.2.3 Predictions 50

2.3 Philosophy of Bayesian Statistics 51


2.3.1 Bayesian Statistics as Inductive Logic 51
2.3.2 Bayesian Statistics as Internalist Epistemology 53
2.3.3 Problems with Internalist Epistemology 57
2.3.4 Summary: Epistemological Implications
of Bayesian Statistics 70
Further Reading 72

3 Classical Statistics 74
3.1 Frequentist Semantics 75
3.2 Theories of Testing 79
3.2.1 Falsification of Stochastic Hypotheses 79
3.2.2 The Logic of Statistical Testing 80
3.2.3 Constructing a Test 81
3.2.4 Sample Size 84
3.3 Philosophy of Classical Statistics 85
3.3.1 Testing as Inductive Behavior 85
3.3.2 Classical Statistics as Externalist Epistemology 87
3.3.3 Epistemic Problems of Frequentism 94
3.3.4 Summary: Beyond the Bayesian vs. Frequentist War 104
Further Reading 106

4 Model Selection and Machine Learning 109


4.1 The Maximum Likelihood Method and Model Fitting 109
4.2 Model Selection 112
4.2.1 Regression Models and the Motivation for Model
Selection 112
4.2.2 A Model’s Likelihood and Overfitting 114
4.2.3 Akaike Information Criterion 115
4.2.4 Philosophical Implications of AIC 118
4.3 Deep Learning 124
4.3.1 The Structure of Deep Neural Networks 124
4.3.2 Training Neural Networks 126
4.4 Philosophical Implications of Deep Learning 129
4.4.1 Statistics as Pragmatist Epistemology 129
4.4.2 The Epistemic Virtue of Machines 131
4.4.3 Philosophical Implications of Deep Learning 136
Further Reading 142

5 Causal Inference 144


5.1 The Regularity Theory and Regression Analysis 145
5.2 The Counterfactual Approach 149
5.2.1 The Semantics of the Counterfactual Theory 149
5.2.2 The Epistemology of Counterfactual Causation 151

5.3 Structural Causal Models 158


5.3.1 Causal Graphs 159
5.3.2 Interventions and Back-Door Criteria 162
5.3.3 Causal Discovery 165
5.4 Philosophical Implications of Statistical Causal Inference 167
Further Reading 171

6 The Ontology, Semantics, and Epistemology of Statistics 172


6.1 The Ontology of Statistics 172
6.2 The Semantics of Statistics 176
6.3 The Epistemology of Statistics 179
6.4 In Lieu of a Conclusion 181

Bibliography 183
Index 189
PREFACE TO THE ENGLISH EDITION

This book is the English edition of a book originally published in Japanese under
the title Tokeigaku wo Tetsugaku Suru (Philosophizing About Statistics), by the University
of Nagoya Press. Instead of composing a word-for-word translation, I took this
occasion to revise the whole book by incorporating feedback to the original edi-
tion and adding some new paragraphs, so it might be more appropriate to call it
a rewrite rather than a translation. I also replaced some references and book guides
at the end of the chapters with those more accessible to English readers.
Translating your own writing into a language of which you don’t have a perfect
command is a painful experience. It was made possible only by close collaboration
with Jimmy Aames, who went through every sentence and checked not only my
English but also the content. Needless to say, however, I am responsible for any errors
that may remain. I also owe a great debt to Yukito Iba, Yoichi Matsuzaka, Yusaku
Ohkubo, Yusuke Ono, Kentaro Shimatani, Shohei Shimizu, and Takeshi Tejima for
their comments on the original Japanese manuscript and book, and Donald Gillies,
Clark Glymour, Samuel Mortimer, and an anonymous reviewer for Routledge for
their feedback on the English manuscript, all of which led to many improvements.
This book, like all of my other works, was made possible by the support of
my mentors, teachers, and friends, including but not limited to: Steen Anders-
son, Yasuo Deguchi, Naoya Fujikawa, Takehiko Hayashi, Chunfeng Huang,
Kunitake Ito, Manabu Kuroki, Lisa Lloyd, Guilherme Rocha, Robert Rose,
and Tomohiro Shinozaki. I am also grateful to Kenji Kodate and his colleagues
at the University of Nagoya Press, and Andrew Beck and his colleagues at
Routledge, for turning the manuscript into books.
Finally and most of all, I am grateful to my family, Akiko and Midori Otsuka,
for their moral support during my writing effectively two books in a row in
the midst of the global upheaval caused by the COVID-19 pandemic.
INTRODUCTION

What Is This Book About?


This book explores the intersection between statistics and philosophy, with the
aim of introducing philosophy to data scientists and data science to philosophers. By
“data science” I am not referring to specific disciplines such as statistics or
machine learning research; rather, I am using the term to encompass all scientific
as well as practical activities that rely on quantitative data to make inferences
and judgments. But why would such a practical science have anything to do
with philosophy, often caricatured as empty armchair speculation? Statistics is
usually regarded as a rigid system of inferences based on rigorous mathematics,
with no room for vague and imprecise philosophical ideologies. A philosophi-
cally minded person, on the other hand, might dismiss statistics as merely a
practical tool that is utterly useless in tackling deep and ineffable philosophical
mysteries.
The primary aim of this book is to dispel these kinds of misconceptions.
Statistics today enjoys a privileged role as the method of deriving scientific
conclusions from observed data. For better or worse, in most popular and sci-
entific articles, “scientifically proven” is taken to be synonymous with “approved
by a proper statistical procedure.” But on what theoretical ground is statistics
able to play, or at least expected to play, such a privileged role? The justification
of course draws its force from sophisticated mathematical machinery, but how
is such a mathematical framework able to justify scientific—that is, empirical—
knowledge in the first place? This is a philosophical question par excellence, and
various statistical methods, implicitly or explicitly, have some philosophical
intuitions at their root. These philosophical intuitions are seldom featured in
common statistics textbooks, partly because they do not provide any extra tools


that readers could use to analyze data they collect for their theses or research
projects. However, understanding the philosophical intuitions that lie behind
the various statistical methods, such as Bayesian statistics and hypothesis testing,
will help one get a grip on their inferential characteristics and make sense of
the conclusions obtained from these methods, and thereby become more con-
scious and responsible about what one is really doing with statistics. Moreover,
statistics is by no means a monolith: it comprises a variety of methods and
theories, from classical frequentist and Bayesian statistics to the rapidly develop-
ing fields of machine learning research, information theory, and causal inference.
It goes without saying that the proper application of these techniques demands
a firm grasp of their mathematical foundations. At the same time, however, they
also involve philosophical intuitions that cannot be reduced to mathematical
proofs. These intuitions prescribe, often implicitly, how the world under inves-
tigation is structured and how one can make inferences about this world. Or,
to use the language of the present book, each statistical method embodies a
distinct approach to inductive inference, based on its own characteristic ontology
and epistemology. Understanding these ideological backgrounds proves essential
in the choice of an appropriate method vis-à-vis a given problem and for the
correct interpretation of its results, i.e., in making sound inferences rather than
falling back on the routine application of ready-made statistical packages. This
is why I believe philosophical thinking, despite its apparent irrelevance, can be
useful for data analysis.
But, then, what is the point for a philosopher to learn statistics? The standard
philosophy curriculum in Japanese and American universities is mostly
logic-oriented and does not include much training in statistics, with the
possible exception of some basic probability calculus under the name of
“inductive logic.” Partly because of this, statistics is not in most philosophers’
basic toolbox. I find this very unfortunate, because statistics is like an ore
vein that is rich in fascinating conceptual problems of all kinds. One of the
central problems of philosophy from the era of Socrates is: how can we
acquire episteme, or true knowledge? This question has shaped the long
tradition of epistemology that runs through the modern philosophers Descartes,
Hume, and Kant, leading up to today’s analytic philosophy. In the course of
its history, this question has become entwined with various ontological and/
or metaphysical issues such as the assumption of the uniformity of nature,
the problem of causality, natural kinds, and possible worlds, to name just a
few. As the present book aims to show, statistics is the modern scientific
variant of philosophical epistemology that comprises all these themes. That
is, statistics is a scientific epistemology that rests upon certain ontological
assumptions. Therefore, no one working on epistemological problems today
can afford to ignore the impressive development and success of statistics in
the past century. Indeed, as we will see, statistics and contemporary
epistemology share not only common objectives and interests; there is also

a remarkable parallelism in their methodologies. Attending to this parallelism


will provide a fruitful perspective for tackling various issues in epistemology
and philosophy of science.
Given what has been said thus far, a reader might expect that this book is
intended as an introduction to the philosophy of statistics in general. It is not,
for two reasons. First, this book does not pretend to introduce the reader to
the field of the philosophy of statistics, a well-established branch of contemporary
philosophy with a wealth of discussions concerning the theoretical ground of
inductive inference, interpretations of probability, the everlasting battle between
Bayesian and frequentist statistics, and so forth (Bandyopadhyay and Forster
2010). While these are all important and interesting topics, going through them
would make a huge volume, and in any case far exceeds the author’s capability.
Moreover, as these discussions often tend to be highly technical and assume
familiarity with both philosophy and statistics, non-specialists may find it difficult
to follow or keep motivated. Some of these topics are of course covered in this
book, and in the case of others I will point to the relevant literature. But instead
of trying to cover all these traditional topics, this book cuts into philosophical
issues in statistics with my own approach, which I will explain in a moment.
Thus readers should keep in mind that this book is not intended as a textbook-
style exposition of the standard views in the philosophy of statistics.
The second reason why this book is not entitled An Introduction to the Phi-
losophy of Statistics is that it does not aim to be an “introduction” in the usual
sense of the term. The Japanese word for introduction literally means “to enter
the gate,” with the implication that a reader visits a particular topic and stays
there as a guest for a while (imagine visiting a temple) in order to appreciate,
experience, and learn its internal atmosphere and architectural art. This book,
however, is not a well-mannered tour guide who quietly stays at one topic,
either statistics or philosophy. It is indeed a restless traveler, entering the gate
of statistics, quickly leaving and entering philosophy from a different gate, only
to be found in the living room of statistics at the next moment. At any rate,
the goal of this book is not to make the reader proficient in particular statistical
tools or philosophical ideas. This does not mean that it presupposes prior famil-
iarity with statistics or philosophy: on the contrary, this book is designed to be
as self-contained as possible, providing plain explanations for every statistical
technique and philosophical concept at their first appearance (so experts may
well want to skip these introductory parts). The aim of these explanations,
however, is not to make the reader a master of the techniques and ideas them-
selves; rather, they are meant to elucidate the conceptual relationships among
these techniques and ideas. Throughout this book we will ask questions like:
how is a particular statistical issue discussed in the context of philosophy? How
does a particular philosophical concept contribute to our understanding of
statistical thinking? Through such questions, this book aims to bridge statistics
and philosophy and reveal the conceptual parallelism between them. Because

of this interdisciplinary character, this book is not entitled “Introduction” and


is not intended to be read as such. That is, this book does not pretend to train
the reader to become a data scientist or philosopher. Rather, this is a book for
border-crossers: it tempts the data analyst to become a little bit of a philosopher,
and the philosophy lover to become a little bit of a data scientist.

The Structure of the Book


What kind of topics, then, are covered in this book? This book may be likened
to a fabric, woven with philosophy as its warp and statistics as its weft. The
philosophy warp consists of three threads: ontology, semantics, and epistemology.
Ontology is the branch of philosophy that studies the real nature of things exist-
ing in the world. Notable historical examples include the Aristotelian theory
of the four elements, according to which all subcelestial substances are composed
from the basic elements of fire, air, water, and earth; and the mechanical phi-
losophy of the 17th century, which aimed to reduce all physical matter to
microscopic particles. But ontology is not monopolized by philosophers. Indeed,
every scientific theory makes its own ontological assumptions as to what kinds
of things constitute the world that it aims to investigate. The world of classical
mechanics, for example, is populated by massive bodies, while a chemist or
biologist would claim that atoms and molecules, or genes and cells, also exist
according to their worldview. We will not be concerned here with issues such
as the adequacy of these ontological claims, or which entities are more “fun-
damental” and which are “derivative.” What I am pointing out is simply the
truism that every scientific investigation, insofar as it is an empirical undertaking,
must make clear what the study is about.
Unlike physics or biology, which have a concrete domain of study, statistics
per se is not an empirical science and thus may not seem to rely on any explicit
assumption about what exists in the world. Nevertheless, it still makes ontologi-
cal assumptions about the structure of the world in a more abstract way. What
are the entities posited by statistics? The first and foremost thing that must exist
in statistics is obvious: data. But this is not enough—the true value of statistics,
especially its primary component known as inferential statistics, lies in its art
of inferring the unobserved from the observed. Such an inference that goes
beyond the data at hand is called induction. As the 18th-century Scottish phi-
losopher David Hume pointed out, inductive inference relies on what he called
the uniformity of nature behind the data. Inferential statistics performs predictions
and inferences by mathematically modeling this latent uniformity behind the
data (Chapter 1). These mathematical models come in various forms, with
differing shades of ontological assumptions. Some models assume more “exis-
tence” in the world than others, in order to make broader kinds of inferences
possible. Although such philosophical assumptions often go unnoticed in sta-
tistical practice, they also sometimes rear their head. For instance, questions

such as “In what sense are models selected by AIC considered good?” or “Why
do we need to think about ‘possible outcomes’ in causal inference?” are onto-
logical questions par excellence. In each section of this book, we will try to reveal
the ontological assumptions that underpin a given statistical method, and consider
the implications that the method has on our ontological perspective of the
world.
Statistics thus mathematically models the world’s structure and expresses it
in probabilistic statements. But mathematics and the world are two different
things. In order to take such mathematical models as models of empirical phe-
nomena, we must interpret these probabilistic statements in a concrete way. For
example, what does it mean to say that the probability of a coin’s landing heads
is 0.5? How should we interpret the notorious p-value? And what kind of state
of afair is represented by the statement that a variable X causes another variable
Y? Semantics, which is the second warp thread of this book, elucidates the
meaning of statements and conceptions that we encounter in statistics.
Statistics is distinguished from pure mathematics in that its primary goal is
not the investigation of mathematical structure per se, but rather the application
of its conclusions to the actual world and concrete problems. For this purpose,
it is essential to have a firm grasp of what statistical concepts and conclusions
stand for, i.e., their semantics. However, just as statistics itself is not a monolith,
so the meaning and interpretation of its concepts are not determined uniquely
either. In this book we will see the ways various statistical concepts are under-
stood in different schools of statistics, along with the implications that these
various interpretations have for actual inferential practices and applications.
The third and last warp thread of this book is epistemology, which concerns
the art of correctly inferring the entities that are presupposed and interpreted
from actual data. As we noted earlier, statistics is regarded as the primary
method by which an empirical claim is given scientific approval in today’s
society. There is a tacit social understanding that what is “proven” statistically
is likely true and can be accepted as a piece of scientific knowledge. What
underlies this understanding is our idea that the conclusion of an appropriate
statistical method is not a lucky guess or wishful thinking; it is justified in a
certain way. But what does it mean for a conclusion to be justified? There
has been a long debate over the concept of justification in philosophical epis-
temology. Similarly, in statistics, justification is understood in different ways
depending on the context—what is to be regarded as “(statistically) certain”
or counts as statistically confirmed “knowledge” is not the same among, say,
Bayesian statistics, classical statistics, and the machine learning literature, and
the criteria are not always explicit even within each tradition. This discrepancy
stems from their respective philosophical attitudes as to how and why a priori
mathematical proofs and calculations are able to help us in solving empirical
problems like prediction and estimation. This philosophical discordance has
led to longstanding conflicts among statistical paradigms, as exemplified by the

notorious battle between Bayesians and frequentists in the 20th century. It is


not my intention to fuel this smoldering debate in this book; rather, what I
want to emphasize is that this kind of discrepancy between paradigms is rooted
in the different ways that they understand the concept of justification. Keeping
this in mind is important, not in order to decide on a winner, but in order
to fully appreciate their respective frameworks and to reflect on why we are
able to acquire empirical knowledge through statistical reasoning in the first
place. As will be argued in this book, the underlying epistemology of Bayesian
statistics and that of classical testing theory are akin to internalism and exter-
nalism in contemporary epistemology, respectively. This parallelism, if it holds,
is quite intriguing, given the historical circumstance that statistics and philo-
sophical epistemology developed independently without much interaction,
despite having similar aims.
With ontology, semantics, and epistemology as our philosophical warp threads,
each chapter of this book will focus on a specific statistical method and analyze
its philosophical implications; this will constitute the weft of this book.
Chapter 1 is a preliminary introduction to statistics without tears for those
who have no background knowledge of the subject. It reviews the basic distinc-
tion between descriptive and inferential statistics and explains the minimal math-
ematical framework necessary for understanding the remaining chapters, including
the notions of sample statistics, probability models, and families of distributions.
Furthermore, the chapter introduces the central philosophical ideas that run
through this book, namely that this mathematical framework represents an ontol-
ogy for inductive reasoning, and that each of the major statistical methods provides
an epistemological apparatus for inferring the entities thus postulated.
With this basic framework in place, Chapter 2 takes up Bayesian statistics.
After a brief review of the standard semantics of Bayesian statistics, namely the
subjective interpretation of probability, the chapter introduces Bayes’ theorem
and some examples of inductive inference based on it. The received view takes
Bayesian inference as a process of updating—through probabilistic calculations
and in accordance with evidence—an epistemic agent’s degree of belief in
hypotheses. This idea accords well with internalist epistemology, according to
which one’s beliefs are to be justified by and only by other beliefs, via appro-
priate inferential procedures. Based on this observation, it will be pointed out
that well-known issues of Bayesian statistics, like the justification of prior prob-
abilities and likelihood, have exact analogues in foundationalist epistemology,
and that if such problems are to be avoided, inductive inference cannot be
confined to internal calculations of posterior probabilities but must be opened
up to holistic, extra-model considerations, through model-checking and the
evaluation of predictions.
Chapter 3 turns to so-called classical statistics, and in particular the theory of
statistical hypothesis testing. We briefly review the frequentist interpretation of
probability, which is the standard semantics of classical statistics, and then we

introduce the basics of testing theory, including its key concepts like significance
levels and p-values, using a simple example. Statistical tests tell us whether or not
we should reject a given hypothesis, together with a certain error probability.
Contrary to a common misconception, however, they by no means tell us about
the truth value or even probability of a hypothesis. How, then, can such test results
justify scientific hypotheses? We will seek a clue in externalist epistemology: by
appealing to a view known as reliabilism and Nozick’s tracking theory, I argue that
good tests are reliable epistemic processes, and their conclusions are therefore justi-
fied in the externalist sense. The point of this analogy is not simply to draw a
connection between statistics and philosophy, but rather to shed light on the well-
known issues of testing theory. In particular, through this lens we will see that the
misuse of p-values and the replication crisis, which have been a topic of contention
in recent years, can be understood as a problem concerning the reliability of the
testing process, and that the related criticism of classical statistics in general stems
from a suspicion about its externalist epistemological character.
While the aforementioned chapters deal with classical themes in statistics,
the fourth and fifth chapters will focus on more recent topics. The main theme
of Chapter 4 is prediction, with an emphasis on the recently developed tech-
niques of model selection and deep learning. Model selection theory provides
criteria for choosing the best among multiple models for the purpose of pre-
diction. One of its representative criteria, the Akaike Information Criterion
(AIC), shows us that a model that is too complex, even if it allows for a more
detailed and accurate description of the world, may fare worse in terms of its
predictive ability than a simpler or more coarse-grained model. This result
prompts us to reconsider the role of models in scientific inferences, suggesting
the pragmatist idea that modeling practices should reflect and depend on the
modeler’s practical purposes (such as the desired accuracy of predictions) as
well as limitations (the size of available data). On the other hand, deep learning
techniques allow us to build highly complex models, which are able to solve
predictive tasks with big data and massive computational power. The astonish-
ing success of this approach in the past decade has revolutionized scientific
practice and our everyday life in many aspects. Despite its success, however,
deep learning models differ from traditional statistical models in that much of
their theoretical foundations and limitations remain unknown—in this respect
they are more like accumulations of engineering recipes developed through
trial and error. But in the absence of theoretical proofs, how can we trust the
outcomes or justify the conclusions of deep learning models? We will seek a
clue to this question in virtue epistemology, and argue that the reliability of a
deep learning model can be evaluated in terms of its model-specific epistemo-
logical capability, or epistemic virtue. This perspective opens up the possibility
of employing philosophical discussions about understanding the epistemic abili-
ties of other people and species for thinking about what “understanding a deep
learning model” amounts to.

Chapter 5 changes gears and deals with causal inference. Every student of
statistics knows that causality is not probability—but how are they different? In
the language of the present book, they correspond to distinct kinds of entities;
in other words, probabilistic inference and causal inference are rooted in differ-
ent ontologies. While predictions are inferences about this actual world, causal
inferences are inferences about possible worlds that would or could have been.
With this contrast in mind, the chapter introduces two approaches to causal
inference: counterfactual models and structural causal models. The former
encodes situations in possible worlds using special variables called potential
outcomes, and estimates a causal effect as the difference between the actual and
possible worlds. The latter represents a causal relationship as a directed graph
over variables and studies how the topological relationships among the graph’s
nodes determine probability distributions and vice versa. Crucial in both
approaches is some assumption or other concerning the relationship between,
on the one hand, the data observed in the actual world and, on the other, the
possible worlds or causal structures which, by their very nature, can never be
observed. The well-known “strongly ignorable treatment assignment” assumption
and the “causal Markov condition” are examples of bridges between these distinct
ontological levels, without which causal relationships cannot be identified from
data. In causal inference, therefore, it is essential to keep in mind the ontological
level to which the estimand (the quantity to be estimated) belongs, and what
assumptions are at work in the estimation process.
On the basis of these considerations, the sixth and final chapter takes stock
of the ontological, semantic, and epistemological aspects of statistics, with a
view toward the fruitful and mutually inspiring relationship between statistics
and philosophy.
Figure 0.1 depicts the logical dependencies among the chapters. Since philo-
sophical issues tend to relate to one another, the parts of this book are written

FIGURE 0.1 Flowchart of the book



in such a way that they reflect as many of these organic connections as possible.
Readers who are interested in only certain portions of the book will find the
diagram useful for identifying relevant contexts and subsequent material. At the
end of each chapter I have included a short book guide for the interested reader.
I stress, however, that the selection is by no means exhaustive or even standard:
rather, it is a biased sample taken from a severely limited pool. There are many
good textbooks on both statistics and philosophy, so the reader is encouraged
to consult works that suit their own needs and tastes.
1
THE PARADIGM OF MODERN
STATISTICS

OK, so let’s get down to business. Statistics is, very roughly speaking, the art
of summarizing data and using this information to make inferences. This chapter
briefly reviews the ABCs of the mathematical framework that underpins these
activities. Modern statistics is divided into two parts, descriptive statistics and
inferential statistics, which we will review in turn, laying out their respective
philosophical backgrounds. Although the mathematics is kept to the bare mini-
mum, this chapter contains the highest dose of mathematical symbols in the
entire book. But there’s no need to be afraid: they’re not that complicated at
all, and understanding the mathematical details, though useful, is not an absolute
requisite for following the subsequent philosophical discussions. The most
important thing for our purpose is to grasp the ideas behind the mathematical
apparatus, so an impatient reader may just skim or skip the formulae on their
first reading and return to the details later if necessary.

1.1 Descriptive Statistics


As its name suggests, the historical origin of statistics is closely related to the
formation of modern states. The development of modern centralized nations in
western Europe during the 18th and 19th centuries was accompanied by an
“avalanche of printed numbers” (Hacking 1990). In the name of taxation, mili-
tary service, city planning, and welfare programs, information of all kinds from
all over a country was collected by the rapidly emerging bureaucratic system
and reported to the central government in the form of printed figures. The
flood of numbers confronted policymakers with an urgent need to summarize
and extract necessary information from “big data” for decision making. It is still
a common practice today to summarize observed data in terms of its mean or


variance, or to visualize the data using a plot or histogram. The whole set of
such techniques we use to summarize data and make them intelligible is called
descriptive statistics. The various indices used to summarize data are called sample
statistics or simply statistics, representative examples of which include sample
means, sample variances, and standard deviations.

1.1.1 Sample Statistics

Univariate Statistics
Imagine there are n students in a classroom, and we represent their height with
a variable X. Specific values obtained by measuring students’ height are denoted
by x_1, x_2, . . ., x_n, where x_i is the height of the ith student, so that x_i = 155 if
she is 155 cm tall. If, on the other hand, we use another variable Y to denote
the age, y_i = 23, say, means that the ith student is 23 years old. In general,
variables (denoted by capital letters) represent characteristics to be observed,
while their values (small letters) represent the results of the observation. A set
of observed data is called a sample.
The sample mean of variable X is the total sum of the observed values of X
divided by the sample size n:

$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

The sample mean summarizes data by giving their “center of mass.” Another
representative index is the sample variance, defined as follows.1

$$\mathrm{var}(X) = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{X}\right)^2.$$

In order to calculate the sample variance, we subtract the mean from each data
point, square the result, and then take their mean (we take the square so that
we count positive and negative deviations from the mean equally). Each sum-
mand measures the distance of the corresponding data point from the mean;
hence, their sum is small if the data are concentrated around the mean, and
large if they are scattered widely. The sample variance thus represents the extent
of the overall dispersion of the data.
The sample variance in a sense “exaggerates” the deviations because they get
squared in the calculation. If one wants to know the dispersion in the original
units, one can take the square root of the variance to get the standard
deviation:

$$\mathrm{sd}(X) = \sqrt{\mathrm{var}(X)} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{X}\right)^2}.$$
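
To see these definitions at work, here is a minimal sketch in plain Python (an editorial illustration with hypothetical numbers, not an example from the book). Note that, following the definition above, the variance divides by n rather than by the n − 1 used in many software packages.

```python
import math

def sample_mean(xs):
    # total sum of the observed values divided by the sample size n
    return sum(xs) / len(xs)

def sample_variance(xs):
    # mean of the squared deviations from the sample mean (divides by n)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def standard_deviation(xs):
    # square root of the variance: dispersion in the original units
    return math.sqrt(sample_variance(xs))

heights = [155, 162, 158, 171, 166]     # hypothetical heights in cm

print(sample_mean(heights))             # 162.4
print(sample_variance(heights))         # 32.24
print(standard_deviation(heights))      # approx. 5.678
```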

Multivariate Statistics
The sample statistics mentioned in the previous subsection focus on just one
aspect or variable of the data. When there is more than one variable, we are
sometimes interested in the relationship between them. We may be interested,
for example, in whether students’ height and age covary, so that older students
tend to be taller. The degree to which one variable X varies along with another
Y is measured by their sample covariance:
$$\mathrm{cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right).$$

The idea is similar to variance, but instead of squaring the deviations of X from
its mean, for each data point we multiply the deviation of X with that of Y
and then take their mean. Since each summand in this case is a product of
deviations, one in X and the other in Y, it becomes positive when the variables
deviate in the same direction—so that x and y are both above or below their
means—and negative when the deviations are in the opposite direction—i.e.,
when one is above while the other is below their corresponding means. Sum-
ming these up, the sample covariance becomes positive if X and Y tend to
covary, and negative if they tend to vary in opposite ways.
The covariance divided by the standard deviation of each variable is called
the correlation coefficient:

$$\mathrm{corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}.$$

The correlation coefficient is always within the range −1 ≤ corr(X,Y) ≤ 1 and


is therefore useful when we want to compare the relative strength of the rela-
tionship between a pair of variables with that of another pair. When the cor-
relation coefficient of two variables is larger (or smaller) than zero, they are said
to be positively (or negatively) correlated.
Covariance and correlation are symmetric measures of the association between
two variables. But sometimes our interest is directional, and in that case it would
be useful to relate one variable to another along this direction. We may be
interested, for example, in how students’ height changes on average when they
become a year older. This is given by

$$b_{x,y} = \frac{\mathrm{cov}(X,Y)}{\mathrm{var}(Y)},$$

which is called the regression coefficient of X on Y and represents the change in


X per unit change in Y. The answer to the aforementioned question therefore
becomes: according to the data, the height X increases on average by b_{x,y} for

every increase in age Y by a year. The regression coefficient gives the slope of
the regression line. That is, it is the slope of the line that best fits the data, or,
more precisely, the line that minimizes the sum of the squared deviations from
each data point.
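
The bivariate statistics admit the same treatment. The following sketch (again an editorial illustration with made-up numbers) computes the sample covariance, the correlation coefficient, and the regression coefficient b_{x,y} = cov(X,Y)/var(Y) of height X on age Y:

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    # mean of the products of paired deviations from the respective means
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    return covariance(xs, xs)

def correlation(xs, ys):
    # covariance divided by both standard deviations; always in [-1, 1]
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

def regression_coefficient(xs, ys):
    # slope b_{x,y}: average change in X per unit change in Y
    return covariance(xs, ys) / variance(ys)

heights = [155, 162, 158, 171, 166]     # X: hypothetical heights (cm)
ages    = [20, 21, 23, 24, 27]          # Y: hypothetical ages (years)

print(correlation(heights, ages))               # approx. 0.66
print(regression_coefficient(heights, ages))    # approx. 1.53 cm per year of age
```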

Discrete Variables
The features we observe need not be expressed in terms of continuous variables,
as in the case of height. We may, for example, represent the outcome of a coin
flip with the variable X, and let 1 denote heads and 0 tails. Variables like these
that do not take continuous values are called discrete variables. In this example,
the outcome of n coin flips can be represented as a sequence (x_1, x_2, . . ., x_n),
where each x_i is either 0 or 1. The mean X̄ is the proportion of heads among
the n trials. The sample variance can be calculated in the same way, and it is
greatest when one gets an equal number of heads and tails, and zero if only
one side of the coin comes up. Thus, the sample variance measures the disper-
sion of an outcome, just as in the case of continuous variables.
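
A minimal sketch of the coin case (hypothetical flips, coded 1 for heads): the sample mean is the proportion p of heads, and the sample variance works out to p(1 − p), greatest at p = 0.5 and zero when only one side comes up.

```python
flips = [1, 0, 0, 1, 1, 0, 1, 1]    # hypothetical flips: 1 = heads, 0 = tails
n = len(flips)

p = sum(flips) / n                  # sample mean = proportion of heads: 0.625
var = sum((x - p) ** 2 for x in flips) / n

print(p, var)                       # 0.625 0.234375; note p * (1 - p) = 0.234375
```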

1.1.2 Descriptive Statistics as “Economy of Thought”


Let us pause here and think about what all the statistical quantities we have just
defined tell us. As noted at the outset, the primary role of sample statistics is
to present large data in an easy-to-understand way, and thereby to uncover
structures or relationships that are often invisible when the data are presented
as a mere sequence of numbers. This is illustrated by Figure 1.1, the first regres-
sion plot made by Francis Galton, the progenitor of regression analysis, in order
to study the relationship between the mean parental height (the average of the
mother’s and father’s heights) and the average height of their children among
205 families in 19th-century England. The positive slope of the regression line
tells us that the children of taller-than-average parents tend to be tall on average.
At the same time, the fact that the slope is less than one indicates that parental
height is not perfectly inherited to ofspring; in other words the children’s height
is on average not as “extreme” as their parents’ height. Galton named this phe-
nomenon regression toward the mean and worried that, just like height, extraor-
dinary talents and qualities of humankind may sink into mediocrity if left to
the course of nature. Setting aside the question of whether his worry is justifed,
Galton’s regression analysis vividly illustrates the relationship between parents’
and children’s heights, or to put it a bit dramatically, it succeeds in uncovering
a hidden regularity or law buried in the raw data.
Galton’s work well embodies one of the Zeitgeister of his day, namely, the
positivist vision of science according to which the objective of scientific inves-
tigation is nothing but to summarize and organize the observed data into a
coherent order. The core thesis of positivism is that scientific discourse must

FIGURE 1.1 Galton’s (1886) regression analysis. The vertical axis is the mean parental
height and the horizontal axis is the mean children’s height, both in
inches, while the numbers in the figure represent the counts of the
corresponding families. The regression lines are those that are labeled
“Locus of vertical/horizontal points.”

be based on actual experience and observation. This slogan may sound quite
reasonable, or even a truism: surely, we expect the sciences to be based on facts
alone and refrain from employing concepts of unobservable, supernatural
entities such as “God” or “souls”? The primary concern of positivism, how-
ever, was rather those concepts within science that appear “scientific” at first
sight but are nevertheless unobservable. A notable example is the concept of
the atom, which was incorporated by Boltzmann in his theory of statistical
mechanics in order to explain properties of gases such as temperature and
pressure in terms of the motion of microscopic particles and their interac-
tions. Ernst Mach, a physicist and the fiery leader of the positivist movement,
attacked Boltzmann’s atomism, claiming that such concepts as “atoms” or
“forces” are no better than “God” or “souls,” in that they are utterly impos-
sible to confirm observationally, at least according to the technological stan-
dards of the time. Alleged “explanations” that invoke such unobservable
postulates do not contribute to our attaining a solid understanding of nature,
and thus should be rejected. Instead of fancying unobservable entities, genuine
scientists should devote their efforts to observing data and organizing

them into a small number of neat and concise laws so that we can attain a
clear grasp of the phenomena—Mach promoted such an “economy of thought”
as the sole objective of science.
Mach’s vision was taken over by Galton’s successor, Karl Pearson, who laid
down the mathematical foundation of descriptive statistics. Pearson redirected
Mach’s positivist attack on unobservables to the concept of causality. Although
the idea of a causal relationship, where one event A brings about another event
B, is ubiquitous in our everyday conversations as well as scientific discourse, a
closer inspection reveals that its empirical purport is hardly obvious. Imagine
that a moving billiard ball hits another one at rest, making the latter move. All
that one observes here, however, is just the movement of the first ball and a
collision, followed by the movement of the second ball. When one says upon
making this observation that “the former brings about the latter,” one doesn’t
actually witness the very phenomenon of “bringing about.” As Hume had
already pointed out in the 18th century, all we observe in what we call a causal
relationship is just a constant conjunction in which the supposed cause is followed
by the supposed effect; we do not observe any “force” or the like between the
two. Nor do we find anything like this in numerical data: all that descriptive
statistics tells is that one variable X is correlated with another Y, or that the
slope of the regression line between them is steep; we never observe in the
data the very causal relationship where the former brings about the latter. All
this suggests, Pearson argues, that the concept of causality, along with “God”
and “atoms,” is unobservable and has no place in positivist science, where only
those concepts that have a solid empirical basis are allowed. Note that this
claim differs from the oft-made remark that causation cannot be known or
inferred from correlation. Pearson’s criticism is much stronger, in that he claims
we should not worry about causality to begin with because science has no
business with it. What we used to call “causality” should be replaced by con-
stant conjunction and redefined in terms of the more sophisticated concept of
the correlation coefficient, which can be calculated from data in an objective
and precise manner. In Pearson’s eyes, descriptive statistics provides just enough
means to achieve the positivist end of economy of thought, and the concept
of causality, being a relic of obsolete metaphysics, must be expelled from this
rigid framework.
Ontologically speaking, positivism is an extreme data monism: it claims that
the only things that genuinely “exist” in science are data that are measured in
an objective manner and concepts defined in terms of these data, while every-
thing else is mere human artifact. Any concept, to be admissible in science,
must be reducible to data in an explicit fashion or else banned from scientific
contexts, however well it may appear to serve in an explanation. This itself is
an old idea that can be traced back to 17th- and 18th-century British empiri-
cism, which limited the source of secure knowledge to perceptual experience
alone. But erecting a scientific methodology upon this metaphysical thesis calls

for a more rigorous and objective reformulation of its ontological framework.


Suppose, for instance, that a follower of Hume claimed that all that exist are
constant conjunctions among events, where “constant conjunction” means that
those events tend to co-occur. This, however, is at best ambiguous and does
not tell us what it means exactly for two events to co-occur, nor does it give
us any clue as to how many co-occurrences are sufficient for us to conclude
that there is a constant conjunction. Correlation coefficients answer these ques-
tions by providing an objective criterion: two variables are related or constantly
conjunct when their correlation coefficient is close to one. Of course, there is
still some ambiguity as to how close to one it must be, but at least it offers a
finer-grained expression that enables one to, say, compare the strength of several
conjunctive relationships by looking at their respective correlation coefficients.
Descriptive statistics thus furnishes the positivist agenda with a substantive
methodology for representing, organizing, and exploring observed raw data,
allowing us to extract meaningful relationships and laws. Pearson published his
scientific methodology in The Grammar of Science (Pearson 1892), a title that
boldly declares his manifesto that descriptive statistics, which articulates Mach’s
data-monistic ontology in a precise language, is the canonical approach to posi-
tivist science.

1.1.3 Empiricism, Positivism, and the Problem of Induction


So far we have briefly reviewed the basics of descriptive statistics and its meth-
odological implications for the positivist vision of science as a pursuit of economy
of thought. One may ask, however, whether this positivist vision accurately
represents actual scientific practice and its goals. Driven by the epistemic tenet
that knowledge must be built on a secure ground, the positivist philosophy
trims down the foundation of science to just those phenomena that are observ-
able and measurable, rejecting all other concepts that are irreducible to experi-
ence as metaphysical nonsense. The certainty we derive from this ascetic attitude,
however, comes with a high price. The highest is the impossibility of inductive
reasoning, already pointed out by Hume. Induction is a type of reasoning that
infers an unobserved or unknown state of affairs from given experience, obser-
vation, or data. Characterized as such, it encompasses the majority of our
inferential practices, from the mundane guesswork about whether one can find
an open table in the lunchtime cafeteria to the scientific assessment of the
efficacy of a new drug based on clinical trials. In carrying out these inferences,
we implicitly assume that the unobserved phenomena to be inferred should be
similar to the observed ones that serve as the premise of our inference. Hume
called this assumption—that nature operates in the same way across past and
future—the uniformity of nature. It should be noted that this assumption of uni-
formity can never be justified by past experiences, because our experiences are
just a historical record and by themselves contain no information about the

yet-to-be-experienced future. This means that inductive inferences inevitably


involve an assumption that cannot be observed within or confirmed by our
experience. Hence, if we were to strictly follow the positivist standard and kick
out from scientific investigation all such postulates that lack an empirical justi-
fication, we at the same time lose all grounds for inductive reasoning.
The same limitation applies to descriptive statistics. In fact, the whole frame-
work of descriptive statistics does not authorize or allow us to make any predic-
tion beyond the data, because prediction lies outside the duties of descriptive
statistics, which are to summarize existing data. Galton’s regression line in Figure 1.1
by itself says nothing about the height relationships of other families not included
in his 205 samples. It is true that we can hardly resist the temptation to guess
that those unobserved families, if observed, will also be distributed near the
line. But this irresistible feeling, to use Hume’s phrase, is just our “mental habit”
and has no theoretical or empirical justification. According to the “grammar”
of descriptive statistics, such a prediction is no different from, say, a superstitious
faith in the “mystical power” of a charm that has survived numerous disasters,
and has no place in positivist scientific discourse.
This may be a bold attitude, but as scientific methodology, it is utterly
unsatisfactory. Granted, organizing various phenomena into a well-ordered system
and uncovering past tendencies are certainly important tasks in science. But we
expect a lot more: in particular, we expect science to provide predictions or
explanations of unobserved or unobservable phenomena. The pure positivist
framework of descriptive statistics falls short in this respect. To capture this
predictive and explanatory aspect of scientific practice calls for a more powerful
statistical machinery, to which we turn now.

1.2 Inferential Statistics


While descriptive statistics specializes in summarizing observed data, inferential
statistics is the art of inferring and estimating unobserved phenomena. As noted
earlier, such inductive inferences cannot be justified from the data alone; they
must presuppose what Hume called the uniformity of nature behind the data.
Inferential statistics formulates this assumed uniformity in terms of a probability
model,2 which enables a rigorous and quantitative treatment of inductive reason-
ing. Figure 1.2 illustrates the overall strategy of inferential statistics. In this
framework, data are reconstrued as samples taken from the underlying probability
model. Being a random process, each sampling should give a different dataset,
while the probability model itself is assumed to stay invariant over the inferential
procedure. But since the probability model is by definition unobservable, it
must be estimated from the given data, and this estimated probability model
serves as the basis for predicting future or unobserved data (illustrated by the
dashed arrow in the figure). Inferential statistics thus deals with the problem
of induction by introducing the probability model as a new “entity”


FIGURE 1.2 Dualism of data and probability models. In inferential statistics, data are
interpreted as partial samples from a probability model, which is not
directly observed but only inferred inductively from the data. The
assumption that this probability model remains uniform underlies pre-
dictions from the observed to the unobserved. Whereas the concepts
of descriptive statistics, i.e., sample statistics, describe the data-world
below, those of probability theory (see Section 1.2.1) describe the world
above.

behind the data.3 In other words, it attacks Hume’s problem with the dualist
ontology of data and models, a richer ontology than that of positivism, which
restricts the realm of scientific entities only to data.
But how is this conceptual scheme put into actual practice? Two things
are necessary in order for this ontological framework to function as a method
of scientific inference. First, we need a mathematical machinery to precisely
describe the newly introduced entity, the probability model. Second, we
need a definite epistemological method to estimate the model thus assumed
from the data. We will focus on the mathematical properties of probability
models in the rest of this chapter, and save the latter question for the
following chapters.

1.2.1 Probability Models


Probability models are described in the language of probability theory. Note that
so far the word “probability” has not appeared in our discussion of descriptive
statistics in the previous section. This is because the concept of probability
familiar in our everyday lives belongs not to data, but to the world behind it
from which we (supposedly) take the data. This world-as-source is called a
population or sample space. Roughly speaking, it is the collection of all possible
outcomes that can happen in a given trial, observation, or experiment. For
instance, the sample space for a trial of rolling a die once consists of Ω = {1,
2, 3, 4, 5, 6}, and if we roll it twice, it will be the product Ω × Ω. An election
forecast, on the other hand, would take all possible voting behaviors of all the
voters as its sample space. What we call events are subsets of a sample space.

The event of getting an even number by rolling a die is {2, 4, 6}, which is a
subset of Ω = {1, 2, 3, 4, 5, 6}; likewise, the event of getting the same number
in two rolls is {(1, 1), (2, 2), . . ., (6, 6)} ⊂ Ω × Ω, which again is a subset of
the corresponding sample space. In what follows, we denote the sample space
by Ω and its subsets (i.e., events) by roman capital letters A, B, and so on. As
we will see shortly, a probability measures the size of the “area” these events
occupy within the sample space. But due to some mathematically complicated
reasons (the details do not concern us here), we cannot count arbitrary subsets
as “events,” because some are not measurable. There are conditions or rules that
a subset must satisfy in order for it to count as a bona fide event to which we
can assign a specific probability value. The rules are given by the following
three axioms:

R1 The empty set ∅ is an event.


R2 If a subset A ⊆ Ω is an event, so is its complement Ac = Ω \ A.
R3 If subsets A1, A2, . . . are events, so is their union ∪i Ai.4

Nothing complicated—these rules require merely that if something is an event,
then its negation must also be an event, and if there are multiple events, then
their combination (union) must also be regarded as an event. A set of events
that satisfies these conditions is called a σ-algebra, but we do not have to worry
about it in this book. In the previous example of rolling a die, the power set
(i.e., the set of all subsets) of the sample space Ω gives a σ-algebra satisfying the
aforementioned three conditions, and in this case any subset in the sample space
counts as an event.5
The probability of an event, as mentioned earlier, is its “size” in the sample
space. The "size" is measured by a probability function P that satisfies the follow-
ing three axioms.6

Probability Axioms
A1 0 ≤ P(A) ≤ 1 for any event A.
A2 P(Ω) = 1.
A3 If events A1, A2, . . . are mutually exclusive (i.e., they do not overlap), then
P(A1 ∪ A2 ∪ . . .) = P(A1) + P(A2) + . . . .

Axiom 1 states that the probability of any event (a subset of the sample space)
falls within the range between zero and one. By Axiom 2, the probability of
the entire sample space is one. Axiom 3 stipulates that the probability of the
union of non-overlapping events/subsets is equal to the sum of their probabilities
(this axiom justifies our analogy of probabilities and sizes).
A probability model consists of the aforementioned three elements: a sample
space, a σ-algebra defined on it, and a probability function. This triplet is all

there is to probability. From this we can derive all the theorems of probability
theory, including the following elementary facts, which the reader is encouraged
to verify:

T1 P(Ac) = 1 − P(A), where Ac is the complement of A.


T2 For any events A, B (which need not be mutually exclusive),
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
That is, the probability of “A or B” is equal to the sum of the probability of A
and that of B minus that of their intersection (i.e., the probability of “A and B”).
For simplicity, we hereafter write P(A, B) instead of P(A ∩ B).
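
For readers who like to see these rules in action, the following minimal Python sketch (our illustration; the die model and events are chosen for simplicity) encodes a finite probability model and checks axioms A1–A3 and theorems T1 and T2, using exact fractions to avoid rounding error:

    from fractions import Fraction

    # One roll of a fair die; the power set of the sample space serves as the
    # sigma-algebra, so every subset counts as an event.
    omega = {1, 2, 3, 4, 5, 6}

    def P(event):
        # Probability as the relative "size" of an event within the sample space.
        assert event <= omega, "events must be subsets of the sample space"
        return Fraction(len(event), len(omega))

    A = {2, 4, 6}   # "even number"
    B = {4, 5, 6}   # "four or more"

    assert 0 <= P(A) <= 1                               # Axiom A1
    assert P(omega) == 1                                # Axiom A2
    assert P({1, 2} | {5, 6}) == P({1, 2}) + P({5, 6})  # Axiom A3 (disjoint events)
    assert P(omega - A) == 1 - P(A)                     # Theorem T1
    assert P(A | B) == P(A) + P(B) - P(A & B)           # Theorem T2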

Conditional Probability and Independence


Although we said that the aforementioned axioms define all there is to probability,
a few additional definitions will prove useful in later discussions. The conditional
probability of A given B is defined by:

P(A|B) = P(A, B) / P(B).

We may think of a conditional probability as the probability of some event (A)
given that another event (B) has occurred. In general, the probability of an event
changes upon conditioning. But when it doesn't, so that P(A|B) = P(A), we say
that A and B are independent. Independence means irrelevance: information about
B gives us no new information about A when they are independent. When A
and B are not independent, they are said to be dependent. The independence
relation satisfies the following properties, which the reader should verify using
the definition just provided:

• Symmetry, i.e., P(A|B) = P(A) if and only if P(B|A) = P(B).
• If A and B are independent, then P(A, B) = P(A)P(B); that is, the probability
that they both hold is the product of each probability.
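
A similarly minimal sketch (again our own toy example on the die model) verifies the definition of conditional probability, the symmetry of independence, and the product rule:

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}   # one roll of a fair die, as before

    def P(event):
        return Fraction(len(event), len(omega))

    def P_given(a, b):
        # Conditional probability P(A|B) = P(A, B) / P(B), defined when P(B) > 0.
        return P(a & b) / P(b)

    even, high, low = {2, 4, 6}, {4, 5, 6}, {1, 2, 3, 4}

    assert P_given(even, high) == Fraction(2, 3)   # conditioning changed P(even) = 1/2

    # "Even" and "at most four" are independent; symmetry and the product rule hold:
    assert P_given(even, low) == P(even)
    assert P_given(low, even) == P(low)
    assert P(even & low) == P(even) * P(low)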

Marginalization and the Law of Total Probability


Suppose events B1, B2, . . ., Bn partition the sample space; that is, they are
mutually exclusive (i.e., Bi ∩ Bj = ∅ for i ≠ j) and cover the entire sample
space when combined (∪i Bi = Ω). Then, for any event A, we have

P(A) = Σi P(A, Bi).

That is, A's "area" can be reconstructed by patching together all the places
where A and Bi overlap. This is called marginalization. The terms on the right-
hand side can be rewritten using conditional probabilities to yield

P(A) = Σi P(A|Bi) P(Bi),

which is called the law of total probability.
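
To make the formula concrete, here is a small sketch with made-up numbers: three hypothetical machines partition production, and the overall defect probability is patched together exactly as the law prescribes:

    from fractions import Fraction

    # Hypothetical numbers: machines B1, B2, B3 produce every item (a partition
    # of the sample space), and A is the event "the item is defective."
    P_B = {"B1": Fraction(1, 2), "B2": Fraction(3, 10), "B3": Fraction(1, 5)}
    P_A_given_B = {"B1": Fraction(1, 100), "B2": Fraction(2, 100), "B3": Fraction(5, 100)}

    # Law of total probability: P(A) = sum over i of P(A|Bi) P(Bi).
    P_A = sum(P_A_given_B[b] * P_B[b] for b in P_B)
    print(P_A)   # 21/1000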

Bayes’ theorem
By rearranging the right-hand side of the definition of conditional probability,
we obtain

P(A|B) = P(B|A) P(A) / P(B)

for any events A, B. This equation, called Bayes’ theorem, plays an essential role
in Bayesian statistics, which we will discuss in the next chapter.
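
Continuing with the same made-up numbers, Bayes' theorem inverts the conditioning, taking us from P(A|B3) to P(B3|A); a minimal sketch:

    from fractions import Fraction

    # Same hypothetical numbers as above: prior P(B3) = 1/5, likelihood
    # P(A|B3) = 5/100, and total probability P(A) = 21/1000.
    P_B3, P_A_given_B3, P_A = Fraction(1, 5), Fraction(5, 100), Fraction(21, 1000)

    # Bayes' theorem: P(B3|A) = P(A|B3) P(B3) / P(A).
    print(P_A_given_B3 * P_B3 / P_A)   # 10/21, up from the prior 1/5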

1.2.2 Random Variables and Probability Distributions


As stated earlier, events are subsets of a sample space. These events, as they
stand, so far have no “name” and have been simply denoted by the usual subset
notation, so that the probability of the event of getting an even number by
rolling a die, say, is denoted as P({2, 4, 6}). Enumerating all the elements in
this way may not cause any inconvenience in this particular example, as it has
only three possibilities; but it may be very cumbersome when dealing with a
much larger event, say, when we want to refer to all eligible voters in the
sample space of all citizens. To resolve this issue, statisticians use random variables
to identify an event of interest within the whole sample space. Random vari-
ables are (real-valued) functions defined on a sample space to indicate properties
of objects. For example, let the random variable Y be a function that gives the
age of a person. Then Y (Homer Jay Simpson) = 40 means that Mr. Simpson
in the sample space is 40 years old. Now let’s assume that in a certain country
voting rights are given to citizens who are 18 years of age or older. Then,
using the aforementioned function, the subset “eligible voters” can be expressed
as the inverse image {ω ∈ Ω : Y(ω) ≥ 18}, that is, the set that consists of all
elements ω in the sample space Ω for which Y gives a value equal to or greater
than 18. But since this notation is still lengthy, we simply write Y ≥ 18 to
denote the subset. Likewise, if X represents height, then X = 165 stands for
{ω ∈ Ω : X(ω) = 165} and refers to the set of people who are 165 cm tall.7
Now, imagine a trial where we randomly pick a subject from the

whole population of citizens. With this setup and our definition of events as
subsets of the sample space, Y ≥ 18 corresponds to the event that the selected
person is no less than 18 years old, while X = 165 corresponds to the event
that he or she is 165 cm tall.
Now, since the event identifed by a value of a random variable is a subset of
the sample space, we can assign to it a probability. The probability that the selected
person is no less than 18 years old is P(Y ≥ 18), while the probability that she is
165 cm tall is P(X = 165). In general, the probability that a random variable X
has value x is given by P(X = x). With this notation and setup, P(X = 165) =
0.03 means that the probability of selecting a person who is 165 cm tall is 3%.
Do not get confused by the double equal signs: the first one, X = 165, is just a
label telling us to pick out the event that the selected person is 165 cm tall, while
the second equal sign is the one that actually functions as an equality connecting
both sides of the equation. But since this expression is rather repetitive, we some-
times omit the first equal sign and simply write P(x) to denote P(X = x) when
the relevant random variable is clear from the context. In this shorthand, P(x) =
0.01 means that the probability of X having value x is 1%. We can also combine
two or more variables to narrow down an event. For example, P(x, y) stands for
P(X = x ∩ Y = y), and according to the prior interpretation, it denotes the
probability of selecting a person who is x cm tall and y years old.
What is the point of introducing random variables? In most statistical analyses,
we are interested in attributes or properties of objects, like height or age. Rep-
resenting these attributes in terms of random variables allows us to express how
the probability depends on the value of these properties; in other words, it allows
us to reconstrue the probability function as a function that assigns probability
values to specific properties rather than events. The function that assigns probability
P(x) to value x of random variable X is called a probability distribution of X and
is denoted by P(X). Note the difference between the uppercase X and lowercase
x. P(X) is a function that can be represented by a graph or histogram that takes
X as its horizontal axis and the probability values as its vertical axis. On the other
hand, P(x) or P(X = x) (recall that the former is just a shorthand of the latter)
is a particular probability value that the function returns given the input x, and
it is represented by the height of the function/graph P(X) at x.
When there are two or more random variables X and Y, a function that assigns
the probability P(x, y) to each combination of their values x, y is called a joint
probability distribution of X and Y. This can be illustrated by a three-dimensional
plot where P(x, y) is the height at the coordinate (x, y) on the horizontal plane
X × Y. The joint distribution P(X, Y) contains all the probabilistic information
about the random variables X and Y. To extract the information of just one vari-
able, say Y, we marginalize it by taking the sum8 over all values of X:

P(y) = Σx P(y, x).

Calculating this probability for each Y = y yields the distribution of Y, called
a marginal probability distribution.
Probability distributions are essentially the same as the probability functions
defined in the previous section; in fact, P(x) is simply the probability value of
the event identified by X = x. Hence, all the axioms, definitions, and theorems
of probability theory carry over to probability distributions. For instance, the
conditional probability that someone has height x given that she is y years old
is defined as P(x|y) := P(x, y)/P(y). Likewise, when P(x|y) = P(x), or equiva-
lently, P(x, y) = P(x)P(y), the two events X = x and Y = y are independent.
This is a relationship that holds between particular values x and y. More gener-
ally, when the independence condition P(x, y) = P(x)P(y) holds for all values
x, y of the random variables X, Y, these variables are said to be independent.
Note the difference between the two: the independence of values is a rela-
tionship between concrete events, whereas the independence of random
variables is a general relationship between attributes. In terms of the previous
example, the former only claims that knowing that a person is y years old
does not give any extra clue as to whether she measures x cm or not, whereas
the latter is the much stronger claim that knowing a person’s age gives no
information whatsoever about her height—which sounds unlikely in this
particular example, but may hold for other attributes, such as height and
commuting time.
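
As a small illustration (two rolls of a fair die, chosen by us for simplicity), a joint distribution can be marginalized and checked for independence as follows:

    from fractions import Fraction

    # Two rolls of a fair die: X is the first roll, Y the second, and every
    # pair (x, y) has joint probability 1/36.
    values = range(1, 7)
    joint = {(x, y): Fraction(1, 36) for x in values for y in values}

    # Marginalization: P(y) = sum over x of P(y, x).
    marginal_Y = {y: sum(joint[(x, y)] for x in values) for y in values}
    assert all(p == Fraction(1, 6) for p in marginal_Y.values())

    # X and Y are independent: P(x, y) = P(x) P(y) for every pair of values.
    assert all(joint[(x, y)] == Fraction(1, 6) * Fraction(1, 6)
               for x in values for y in values)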

Note on Continuous Variables and Probability Densities


As we have seen in Section 1.1.1, some attributes take discrete values, while
others take continuous values. Properties like height that vary gradually may be
better represented by continuous variables that take real values. Continuous
random variables, however, require a bit of caution when we consider their
probability. Suppose X is a continuous random variable: what is the probability
that it takes a specific real value x, i.e., P(X = x)? Whatever x is, the answer is
always zero. To see why, recall that a probability function is a measure of the
size of a subset in a sample space. An element among uncountably many ele-
ments, like a point on a real line, does not have any extension or breadth, and
so its "size" or probability must also be zero. This may make intuitive sense if
one notes that no one has a height exactly equal to 170.000 . . . cm, no matter
how large a population we take. Thus, any particular value of a continuous variable
has zero probability. But even in the continuous case, a certain interval, say
between 169 and 170 cm, may have a nonzero size/probability. We can then
consider the result of successively narrowing down this interval. The probability
of an infinitely small interval around a point is called a probability density, and a
function that gives the probability density at each point x is called a probability
density function. The probability of an interval can be obtained by integrating
this function over the interval. Letting f be the probability density function of

height, the probability that a person’s height falls within the range from 169 to
170 cm is given by:
P(169 ≤ X ≤ 170) = ∫₁₆₉¹⁷⁰ f(x) dx.

Hence, strictly speaking, one should use probabilities for discrete random vari-
ables and probability densities for continuous ones. Nevertheless, in this book
we will abuse terminology and notation, using the word “probability” to denote
both cases, and the expression P(X = x) to denote the probability of x when
X is discrete and the probability density at x when X is continuous. Readers
who are inclined toward rigor should reinterpret terminology and notation
according to the context.
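
For a computational illustration, the following sketch integrates a hypothetical height density (a normal curve whose parameters we invent purely for the example) over the interval from 169 to 170 cm:

    import math

    # A hypothetical height density: a normal curve with mean 170 cm and
    # standard deviation 7 cm (parameters invented purely for illustration).
    mu, sigma = 170.0, 7.0

    def f(x):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    # P(169 <= X <= 170), approximated by a midpoint Riemann sum; a single
    # point, by contrast, contributes probability zero.
    n = 10_000
    width = 1.0 / n
    prob = sum(f(169.0 + (i + 0.5) * width) * width for i in range(n))
    print(round(prob, 3))   # roughly 0.057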

Expected Values
A probability distribution, as we saw, is a function of the values of a random
variable. Since this pertains to the “uniformity of nature” that cannot be observed
in the data, the entirety of this function is never fully revealed to us (recall that
a sample space contains not just what happened or is observed but all the pos-
sible situations that might occur). Nevertheless, one can consider values that
summarize this “true” distribution. Such values that are characteristics of the
probability distribution of a given random variable are called its expectations or
expected values.
A representative expected value is the population mean, often denoted by the
Greek letter μ and defined by:

μ = Σx x · P(X = x)

The population mean is the sum of the values x of X weighted by their prob-
ability P(X = x), and gives the “center of mass” of the distribution. In contrast,
its dispersion is given by the population variance σ2:

σ2 = Σx (x − μ)2 · P(X = x)

Expressed in English, this is the sum of the squared deviation (x − μ)2 of each
value x of X from its population mean, weighted by the probability P(X = x).
More generally, an expected value of a random variable is defined as a
weighted sum or integral of its values (as in the case of the population mean)
or a function thereof (as in the case of the population variance, where the
function in question is the squared deviation). This operation is called "taking
the expectation" and is denoted by 𝔼. For instance, the population mean is
the result of taking the expectation of X itself, i.e., 𝔼(X), while the population

variance is the result of taking the expectation of (X − μ)2, i.e., 𝔼[(X − μ)2].
Other expected values can be defined in the same manner.
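
As a minimal sketch, the population mean and variance of a fair die roll can be computed directly from the distribution itself, with no appeal to observed samples:

    from fractions import Fraction

    # Population mean and variance of a fair die roll, computed directly from
    # the distribution P(X = x) = 1/6 rather than from any observed sample.
    dist = {x: Fraction(1, 6) for x in range(1, 7)}

    mu = sum(x * p for x, p in dist.items())                # E(X)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())   # E[(X - mu)^2]
    print(mu, var)   # 7/2 35/12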
At first sight, the population mean and variance may look very much like
sample statistics, such as the sample mean and variance reviewed in Section 1.1.1.
They are indeed related, yet they are different in important ways. The primary
difference lies in the kind of objects they purport to describe. Recall that sample
statistics such as the sample mean are summaries of observed data or samples. In
contrast, what we are dealing with here is not the finite data at hand but rather
their source, defined as a probability distribution on a certain sample space. The
population mean and variance describe this probability model—and in this sense
they can be thought of as extensions of the concepts of the sample mean and
variance, redefined on the entire sample space which includes not only observed
but also unobserved, and even unobservable, samples. Since we cannot measure
these samples, expected values, by their very nature, cannot be known directly.
They are, so to speak, only in the eyes of God who is able to see all there is to
know about a probability model.

The IID Condition as the Uniformity of Nature


That sums up our brief review of the probability model as a “source of data.”
In order to go beyond the given data and make inferences about unobserved
phenomena, inferential statistics needs to posit a uniform structure behind the
data. The concepts introduced in this section, such as the sample space, probability
functions, random variables, probability distributions, and expected values, are
mathematical tools we use to describe or characterize this posited structure,
namely the probability model.
This posited structure, however, is still an assumption. In order to make use
of it in inductive reasoning, inferential statistics must identify this structure. How
do we do this? As previously stated, in inferential statistics the data we observe
are interpreted as samplings from the probability model. A crucial assumption
in this process is that all the samples come from the same probability model.
That is, each piece of data must follow the same probability distribution. Another
common requirement besides this is that the sampling must be random: one
should not, for example, disproportionately pick taller individuals, or alternate
between picking tall and short individuals. This amounts to the requirement
that random variables of the same type must be independent,9 so that one can-
not predict what will come next based on any particular outcome (of course,
distinct random variables like height and age need not be independent). When
these conditions are satisfied, a set of random variables is said to be independent
and identically distributed, or IID for short.
The IID condition is a mathematical specification of what Hume called the
uniformity of nature. To say that nature is uniform means that whatever cir-
cumstance holds for the observed, the same circumstance will continue to hold

for the unobserved. This is what Hume required for the possibility of inductive
reasoning, but he left the exact meaning of “sameness of circumstance” unspeci-
fied. Inferential statistics fills in this gap and elaborates Hume's uniformity
condition into the more rigorous IID condition.10 In this reformulation, uni-
formity means that the probability model remains unchanged across observations,
and that the sampling is random so that the observation of one sample does
not affect the observation of another. Note that this kind of mathematical for-
mulation is possible only after the nature of the probability model and its
relationship with the observed data are clearly laid out. In this way, probability
theory provides us with a formal language that enables us to specify the onto-
logical prerequisites of inductive reasoning in mathematical terms.
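
The picture of Figure 1.2 can be rendered in a toy simulation (the seed and sample sizes are arbitrary choices of ours): the model stays fixed while each act of sampling yields a different dataset:

    import random

    random.seed(42)   # arbitrary seed, for reproducibility

    # The model (a fair die) stays fixed, while each act of IID sampling
    # yields a different dataset.
    data1 = [random.randint(1, 6) for _ in range(5)]
    data2 = [random.randint(1, 6) for _ in range(5)]
    print(data1, data2)   # two different samples from one invariant model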

The Law of Large Numbers and the Central Limit Theorem


The assumption of IID as a uniformity condition allows us to make inferences
about the probability model behind the data. This is most vividly illustrated
by the famous law of large numbers and the central limit theorem, both of
which are part of large sample theory, the backbone of traditional inferential
statistics.
Let us look at the law of large numbers first. We are interested in estimating,
on the basis of observed data, the underlying probability distribution or its
expected values like the population mean or variance. Suppose, for example,
that we are interested in the mean national height. What we are able to know,
however, are only the sample statistics obtained from a finite set of data, such
as the sample mean. As we have noted, such sample statistics are distinct from
the expected values of the probability distribution, both conceptually and in
the way they are defined. Nevertheless, it seems to be a very natural idea to
take the latter as an estimator of the former. Of course, the mean height of just
a handful of people will hardly serve as a reliable estimator; but if we have more
data and measure the height of, say, millions of people, we would feel very safe
in regarding the sample mean of this data to be a good approximation to the
true national mean. The law of large numbers backs up this common-sense
intuition with a rigorous mathematical proof that the sample mean will approach
the real mean of the population as we collect more and more data. Let the
random variables X1, X2, . . ., Xn be IID, i.e., they are mutually independent
and have the same distribution. A typical example is the height of n people
from the same population. Since these variables are IID, they have the same
population mean 𝔼(X1) = 𝔼(X2) = ⋯ = μ. The law of large numbers then
states that as the sample size n approaches infinity, the sample mean
X̄n = Σi Xi / n converges in probability to the population mean μ: that is,

lim n→∞ P(|X̄n − μ| ≥ ϵ) = 0

holds for an arbitrarily small positive margin ϵ. The probability function on the
left-hand side expresses the probability that the deviation of the sample mean
X̄n from the population mean μ is no less than ϵ. The equation as a whole
thus states that this probability becomes zero as n approaches infinity, i.e., that
it will be certain that the sample and population means coincide with arbitrary
precision. This mathematical result provides us with the ground for roughly
identifying a sample mean obtained from numerous observations with the
population mean.
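
The law of large numbers can be illustrated, though of course not proven, by simulation; in the following sketch (seed and sample sizes arbitrary), the sample mean of IID die rolls drifts toward the population mean μ = 3.5 as n grows:

    import random

    random.seed(0)   # arbitrary seed

    # Sample means of IID die rolls approach the population mean 3.5 as the
    # sample size n grows.
    for n in (10, 1_000, 100_000):
        rolls = [random.randint(1, 6) for _ in range(n)]
        print(n, sum(rolls) / n)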
Note that our only assumption was that X1, X2, . . ., Xn are IID; there was
no restriction on the form of their distributions. This means that what is crucial
in the law of large numbers is merely the presence of uniformity, and not the
specific form of this uniformity: even if the underlying distribution is completely
unknown, the mere fact that the samples are IID or uniform ensures that, as
we have more and more data, the probability distribution of the sample mean
X̄n will concentrate within an arbitrarily narrow range around the population
mean μ. But that's not all: the distribution
of the sample mean tends toward a unique form, the “bell-shaped” normal dis-
tribution. This is the result of the famous central limit theorem. Let X1, X2, . . .,
Xn be IID variables again, this time with the population variance σ2. Then the
theorem states that as n approaches infinity, the distribution P(X̄n) of the
sample mean tends toward the normal distribution with mean μ and variance
σ2/n. We will return to the normal distribution after explaining the concept of
distribution families; all we need to know now is that it is a particular form of
distribution. Hence, the result here means that even though we do not know
anything about the nature of the underlying distribution, we can know that its
sample mean tends toward a particular distribution as we keep sampling. What
is important here is that this result is derived from the IID condition alone.11
As in the law of large numbers, all that is required for the central limit theorem
to hold is that the data are obtained from a uniform IID process; we need not
know about the form or nature of the underlying distribution. From this
assumption alone, we can conclude that the sample mean will always converge
to the same form, namely the normal distribution. This means that we can
make inferences about the unknown probability model just by repeated sampling.
In this way, the results of large sample theory such as the central limit theorem
and law of large numbers provide us with a theoretical justification of the agenda
of inferential statistics, which is to make inductive inferences about the true but
unobservable distribution on the basis of finite and partial observations.
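
A rough simulation likewise illustrates the central limit theorem: the means of repeated IID samples pile up in a bell shape around μ, whatever the parent distribution. The parameters below are arbitrary choices of ours:

    import random
    from collections import Counter

    random.seed(0)   # arbitrary seed

    # Means of n = 50 IID die rolls, repeated 10,000 times: the sample means
    # pile up in a bell shape around 3.5, whatever the parent distribution.
    means = [sum(random.randint(1, 6) for _ in range(50)) / 50 for _ in range(10_000)]

    # A crude text histogram, binning the means to one decimal place.
    hist = Counter(round(m, 1) for m in means)
    for value in sorted(hist):
        print(f"{value:4.1f} {'#' * (hist[value] // 50)}")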

1.2.3 Statistical Models

What Statistical Models Are


Let us take stock of our discussion so far. In order to build a framework for
inductive inference, a kind of inference that goes beyond the given data, we

began by defining a probability model, which is the uniformity behind the data.
Then we introduced random variables as a means of picking out the events
corresponding to properties that we are interested in, and saw that they have
certain definite distributions. Although these distributions are unknown and
unobservable, they can be characterized in terms of expected values, and we
saw that these expected values can be approached, with the aid of large sample
theory, through repeated samplings that satisfy the IID condition.
But this doesn’t settle all the inductive problems. Large sample theory
provides us only with an eschatological promise, so to speak: it guarantees
only that if we keep collecting data indefinitely, then the true distribution
will eventually be revealed. Hence, if we could keep tossing the same coin
infinitely many times, the observed ratio of heads would converge to the true
probability. It is impossible, however, to actually conduct an infinite number
of trials, since the coin will wear out and vanish before we complete the
experiment. "In the long run we are all dead," as Keynes said, and given that
we can collect only finite and in many cases modest-sized data which fall far
short of “large samples” in our lifetime (or the time we can spend on a par-
ticular inductive problem), a statistical method, if it is to be serviceable, must
be able to draw inferences from meager inputs. Such inferences may well be
fallible and may not give probability-one certainty as the law of large numbers
does. What becomes important, therefore, is a framework that enables us to
carry out inductive inferences as accurately as possible and to evaluate the
reliability of these inferences, even within the bounds of limited data. The
true value of inferential statistics consists in the development and elaboration
of such earthy statistical methods.
To achieve this goal, inferential statistics introduces additional assumptions
beyond the basic setup of probability models. Our approach to inductive rea-
soning up until now relied on a probability model and IID random variables,
but made no assumptions about the nature or type of the distribution that
these variables have. Or, in Humean parlance, so far we only required the
presence of the uniformity of nature, without caring about what nature is like.
In contrast, the more realistic inference procedure we are now after makes
some further assumptions about the form of the distribution, thereby narrowing
down the range of possible distributions to be considered. So-called parametric
statistics, for instance, deals only with those distributions that can be explicitly
expressed in terms of a particular function determined by a finite number of
parameters.12
It therefore requires not only the existence of uniformity in the form of a
probability model, but also that the uniformity is of a certain pre-specified kind
(that is, it has a specific functional form). A set of candidate distributions cir-
cumscribed in this way is called a statistical model. It is important not to confuse
statistical models with probability models, as the distinction will be essential in
understanding inferential statistics. Probability models, we may recall, were

introduced in order to describe the reality that supposedly exists behind data,
using the language of probability theory such as sample spaces, σ-algebras,
probability functions, random variables, and probability distributions. A statistical
model, on the other hand, is just a hypothesis about the nature of the probability
distribution that we posit, and may well be regarded as fctional. The true
probability distribution may be highly complex, hardly specifable by a simple
function with only a few or even a fnite number of parameters. Nonetheless,
we pretend for our immediate purposes that it has a particular functional form,
and we proceed with our inferential business under this hypothesis.
At this point an attentive reader may raise their eyebrows and ask: “But
wasn’t a probability model or uniformity of nature also a hypothesis, not ame-
nable to empirical confirmation? If that's the case, then probability models and
statistical models are both hypotheses after all, so there doesn’t seem to be any
substantive distinction between them.” It is true that, if we had to say whether
they are realities or assumptions, then they are both assumptions. They differ,
however, as to what kind of thing they are assumed to be. A probability model
or uniformity of nature is assumed to be true: as long as we are to make any
inductive inferences, we cannot but believe in the existence of some unifor-
mity—that was Hume’s point. This is not so, however, with statistical models.
In fact, most statisticians do not believe that any statistical model faithfully
describes the real world; rather, they expect it to be only an approximation—
that is, a model—good enough for the purpose of solving their inductive tasks.
In other words, a statistical model is not assumed to be a true representation
of the world, but an instrument for approximating it. This instrumental nature
of statistical models is aptly captured by the famous aphorism of the statistician
George Box: “all models are wrong, some are useful.” In contrast, the very
existence of a probability model cannot be a fction; otherwise, we would be
trapped in Humean skepticism and lose all grounds for inductive reasoning.13

Parametric Statistics and Families of Distributions


There are two ways to build statistical models. So-called nonparametric statistics
makes fairly general and weak assumptions such as the continuity or differentia-
bility of the target distribution, without imposing any specific functional form
on it. Parametric statistics, on the other hand, goes a step further and specifies
the general form of the distribution in terms of a particular function. Such a
distributional form is called a family of distributions. Once a family is fixed, indi-
vidual distributions are uniquely determined by specifying its parameters. This
means that parametric statistics models the target only within the range of a fixed
family of distributions. It therefore has a higher risk of distorting reality, but
allows for more precise and powerful inferences if an appropriate distributional
family is chosen. The question of which family to use depends on the nature
of the inductive problem and random variables we are interested in. Since this
Another random document with
no related content on Scribd:
“I prove it?” John Hale’s heavy brows met in a scowl. “That’s the
detective’s job, not mine.”
“I used the pronoun to imply the prosecution, and not in its
personal application,” Latimer explained. “Where was Richards on
Tuesday night?”
“Playing billiards at the club.”
“Have you proof of the exact time he left there?”
“No, but I’ll get it,” and John Hale’s tone implied grim
determination.
“Then suppose you make inquiries at the club,” suggested
Latimer; “but be guarded, John. Every one’s attention is focused on
Austin’s murder and you might start an ugly scandal.”
John Hale reddened. “Well, what if I do?” he grumbled. “The
situation couldn’t be much worse than it is to-day,”—shooting a
defiant look at his friend. “Austin murdered under mysterious
circumstances, and the police haunting our house, not to mention
the morbid sight-seers who gather about it. I cannot stir out of the
place without encountering curious glances. Even at the club there’s
excitement whenever I appear—and the newspaper men!” He struck
the desk a resounding blow with his clenched fist. “Damn it! If
Richards murdered Austin he’ll swing for it—I don’t care if he’s
married Judith a dozen times over.”
“Easy, easy,” cautioned Latimer. “Cool down, John, and let us
discuss this matter rationally. What have we discovered against
Richards?”
“That he was playing the market, that he was in need of funds,
and that he had in his possession bonds belonging to Judith which
had been stolen on Tuesday night from my brother’s safe, near
which we found Austin’s body in the small hours of Wednesday
morning.” John Hale moderated his excited manner. “Pretty damning
evidence.”
“As far as it goes,” agreed Latimer. “Now, to make it conclusive
you must prove: first, that Richards was at your house between
Tuesday midnight and one a. m. Wednesday; and secondly, that he
knew the combination of your brother’s safe. Recollect, it was not
forced open.”
“I’ll make it my business to find out.” John Hale reached for his
hat and his gloves which he had tossed on the desk. “I am also
going to have inquiries made regarding Richards’ career.”
“An excellent idea,” exclaimed Latimer. “But you had better
employ a private detective agency, John, rather than the local police.
Try the Burroughs Company, they handled some work for our firm
when Johnston, the bank cashier, hypothecated stock belonging to
us.”
“Where’s their office?” asked John Hale, jotting down the name
on the back of an envelope.
“In the Fendall Building, corner of John Marshall Place.”
John Hale completed the address and replaced the envelope in
his breast pocket.
“Listen, Frank,” he began. “Austin’s murder was unpremeditated
—the weapon used proves that. No man would deliberately kill
another with a pair of shears.”
Latimer shook his head in doubt. “You are taking a great deal for
granted,” he protested.
“Not a bit of it,” vigorously. “Austin caught Richards going through
the safe and Richards grabbed the first thing handy—Judith’s
shears.” Latimer said nothing, and after a brief pause John Hale
continued. “The crime was committed by some one familiar with the
habits of our household—the police claim that. No better time could
have been selected for rifling Robert’s safe. He was ill in bed, and
Agatha and I were attending the French Embassy reception and, by
the way, we decided to go only at the last moment—that’s an
important point.”
“You mean——”
“Richards was present when I told Agatha that I would take her to
the reception, and he left the house immediately afterward.” John
Hale was becoming excited again. “Thus, Richards knew that the
coast would be clear.”
“Hold on, he was aware that Judith was at home, and the
servants, also,” objected Latimer.
“Sure, and he knew that our servants retire early. Anna sees to
the closing of the house, and she is very strict with the other
servants.” John Hale rose abruptly and emphasized his words by
striking his cane against the floor. “And Richards knew that Judith
would not be likely to hear him, and if she did—”
“Well, what then?” as John Hale paused.
“He probably had a plausible excuse handy. Oh, he could have
manufactured some story which Judith would have swallowed,”
retorted John Hale. “Remember, they haven’t been married long.”
Latimer frowned. “Who is going to tell Judith about the theft of her
bonds?” he asked, rising also.
“It’s up to you.” John Hale moved uneasily and glanced away
from his companion. “Judith came to you about her bonds.”
“Dash it all, John!” Latimer spoke with temper. “I’m damned if I
will. Don’t you realize that Judith worships her husband?”
“Well, it’s not the first time a woman has been deceived in a
man,” replied Hale cynically. “What did she marry for in such an all-
fired hurry? I am sorry for Judith, but she must ‘dree her weird.’”
Whatever reply Latimer intended making was interrupted by the
entrance of a clerk.
“This special delivery letter has just come for you, sir,” he
explained handing it to Latimer. Then, with a polite bow to John Hale,
of which the latter took not the slightest notice, the clerk departed.
Latimer tore open the envelope and ran his eyes down the written
page to the signature. An exclamation escaped him.
“It is from Judith,” he said. “Listen:”
Dear Frank:
I gave my Valve bonds to Joe to use as he saw fit,
and he tells me that he took the shares to you and you
were kind enough to arrange the business for him, so I
shall not need the $1,000 after all.
Please don’t tell the family that I’ve become a bit of
a gambler; Joe doesn’t quite approve of a woman
speculating, but—he’s dear about it.
Thanks for all your kindness.
Faithfully,
Judith Richards.
Latimer and John Hale stared at each other.
“Let me see that letter,” the latter demanded, and he read it twice
before handing it back to Latimer. “What do you make of it?”
Latimer laughed heartily. “Thank God I shan’t have to break any
unpleasant news to her,” he exclaimed. “But the inconsistency of
women! To come to me for advice and then get her husband to do
exactly what I advised her not to.”
“What was your advice?”
“To use the bonds as collateral at a bank and not sell them.”
John Hale studied him in thoughtful silence for a minute.
“When did Richards bring the bonds here, Frank?” he asked.
“Was it some time after Judith left?”
“No; come to think of it, he must have been in the outer office
when Judith was talking to me,” responded Latimer, and his face
grew grave once again.
“And Judith states”—John Hale picked up his niece’s letter—“‘I
gave my Valve bonds to Joe to use as he saw fit and he tells me that
he took the bonds to you—’ Did Judith mention to you where she
had the bonds?”
“Now that you speak of it, she did say that they were in her
father’s safe.” Latimer eyed John Hale sharply. “What are you driving
at?”
“Simply this, that if Richards was in your front office with the
bonds in his possession, they could not have been where Judith
thought them—in her father’s safe. Secondly,”—and John Hale’s
voice deepened—“there was no time for Judith to return home, get
the bonds and give them to Richards before he sold them to your
clerk here in your outer office. Isn’t that right?”
“Yes.” Latimer’s worried look returned. “By Jove, you think—?”
“That Judith has discovered that her bonds are missing.”
“Do you suppose your brother told her?”
“I hardly think so, for he swore me to secrecy,” replied John Hale.
“No, Judith must have gone to get the bonds and found them
missing from the safe.”
“But, good Lord! How did she know that her husband had brought
the bonds to me?” demanded Latimer.
“Ask me something easy.” Hale swung his cane around and
stepped briskly to the door. “But depend on it, Frank, I’ll find an
answer to that question before I’m many hours older.” And he
banged out of the door.
Latimer strode thoughtfully up and down his office, then reseated
himself at his desk.
“What’s come over John?” he muttered. “He seemed anxious,”—
he paused—“no, more than anxious,—determined,—to fix the guilt
on Joe Richards.”
He leaned forward and eyed Judith’s letter, reading it slowly,
conning over the words, and when he straightened up there was a
gleam of frank admiration in his eyes.
“You are a loyal woman, Judith,” he exclaimed, unconscious that
he spoke aloud. “As well as ‘a bit of a gambler.’”
CHAPTER IX
HALF A SHEET

Polly Davis closed the vestibule door of her home in C Street with
a veritable slam and proceeded up the street oblivious of
greetings from several of her neighbors. The street, celebrated in its
day for having among the occupants of its stately old-fashioned brick
houses such personages as John C. Fremont, John C. Calhoun, and
General Winfield Scott, was chiefly given over to modern business
enterprises, and only a few “Cave-dwellers” (the name bestowed
upon Washingtonians by an earnest “climber” to its exclusive
resident circles) still occupied the homes of their ancestors.
Polly slackened her swift walk into a saunter as she turned the
corner from C Street into John Marshall Place. On reaching D Street
she accelerated her speed somewhat on catching sight of an
approaching street car, but it did not stop to take on passengers, and
Polly walked back to the curb with an uncomplimentary opinion of
the service of one of Washington’s public utilities. She waited in
indecision on the corner, then opening her hand bag, took from it a
scrap of paper and consulted the name written thereon. After
studying the paper for a minute, she turned and eyed the large, red
brick and stone trimmed office building standing on the southeast
corner facing the District Court House. She had seen the Fendall
Building innumerable times since her childhood days, but never
before had it held her interest.
There was a certain set air to Polly’s shoulders, which, to one
acquainted with her characteristics, indicated obstinacy, as she
crossed the street and entered the Fendall Building. She paused in
the lobby in front of the floor directory and then continued to the
second story. At the far end of the corridor she stopped before a
closed door bearing on its ground glass the title, in gold lettering:
Burroughs Detective Agency
Alfred Burroughs, Prop.
Polly returned to her hand bag the scrap of paper which she still
held tightly between the fingers of her left hand, took out a visiting
card, and stepped inside the office. There was no one in the room,
and, with a surprised glance about her, Polly crossed to a door
evidently leading to an inner office. The door was only partly closed,
and through the opening a familiar voice floated out to her:
“I depend upon your discretion, Mr. Burroughs. Remember, my
name must not be mentioned in connection with your employment in
the case—” The grating sound of chairs being pushed back followed,
and any answer was drowned thereby.
The hand which Polly had extended to knock against the panel of
the door fell nerveless to her side. With eyes distended to twice their
normal size, she retraced her footsteps out of the office and the
building.
When Polly reached the Hale residence she was admitted by the
parlor maid instead of the ever smiling Anna.
“Mr. Hale left word, Miss Polly, that you were to go to Mrs. Hale,”
Maud announced, helping Polly off with her coat and hat.
“Oh,” Polly paused. “Where is Mrs. Hale?”
“I don’t rightly know, miss.” Maud emerged from the depths of the
hall closet where she had hung Polly’s wraps. “Mrs. Hale came in not
three minutes ago. I think she has gone to her bedroom. Will you
have some lunch now, miss, or a little later?”
“A little later, thanks”—Polly regarded the hall clock. “I had no
idea it was nearly noon. You will find me with Mrs. Hale, Maud.”
“Very good, miss,” and they separated, the maid going to her
pantry, and Polly in search of Mrs. Hale. She found that energetic
matron just crossing the hall toward Judith’s boudoir. At the sound of
Polly’s hail she faced around.
“Is it you, Polly!” Mrs. Hale frequently asked the obvious. “My
dear, aren’t you very late to-day?”
Polly blushed at the emphasis on the adjective. “A little later than
ordinary,” she answered good-naturedly. “I will make up the time,
Mrs. Hale, and your husband’s manuscript will be completed without
delay. Maud said that your husband left word that I was to report to
you.”
“Did he?” Mrs. Hale regarded her in some perplexity. “Why, last
night he decided that you were not strong enough to aid me in
answering my letters; he must have changed his mind, for he
wouldn’t have sent you to me for anything else.”
Polly’s attention had been caught by one phrase and the rest of
Mrs. Hale’s speech went unheeded.
“Your husband said I was not strong?” she questioned. “I am
quite well. What made him think otherwise?”
“Judith put the idea in his head.” Mrs. Hale led the way into the
boudoir as she spoke and selected a chair near her daughter’s desk,
on which were piled the notes of condolence, in anticipation of
Richards’ answering them under Judith’s supervision. “Judith is very
much worried about your health, my dear.”
“That is very kind of Judith.” Polly slipped into the seat before
Judith’s desk at a sign from Mrs. Hale. “But your daughter is
mistaken. I am not in the least ill.”
“I am delighted to hear it.” Mrs. Hale looked at her husband’s
pretty secretary with approval. “Judith is always so positive in her
statements. I could not see that you looked run down, but she
insisted that you needed a change, and arranged with Mr. Hale to
give you a vacation.”
“Indeed!” The frigid exclamation escaped Polly unwittingly, but
Mrs. Hale apparently was oblivious of the girl’s chilly reception of
Judith’s plans.
“I am glad you don’t require a vacation,” she went on. “Mr. Hale is
particularly in need of your services, and it would be most unkind to
leave him in the lurch.”
“I have no intention of doing so, Mrs. Hale,” declared Polly with
some warmth. “Aside from the question of my not being able to
afford a vacation, gratitude to Mr. Hale, alone, would prevent me
from going away just now.” She passed one restless hand over the
other. “What possessed Judith to wish to get rid of me?”
“Now, my dear,”—Mrs. Hale held up a protesting hand—“don’t get
such a notion in your head. Judith is devoted to you; we all are, but
she imagined—you know Judith greatly depends upon her
imagination—she is so, so,”—hunting about for a word—“so shut in
with her deafness, and she is forever imagining things about people.”
“And what does she imagine about me?” asked Polly, as Mrs.
Hale came to a somewhat incoherent pause.
“That you were on the point of nervous prostration—”
Polly laughed a bit unsteadily. “Only the wealthy can afford
nervous ‘prosperity,’ and I am not in that class,” she said. “I must
work—work!” She spoke with nervous vehemence; Mrs. Hale’s
surprised expression checked her; and with an effort she regained
her self-control. “What can I do for you?”
“Answer these notes,” and Mrs. Hale laid her hand on them.
“Take this black-edged note paper,” holding out a box she had
brought with her.
Mrs. Hale’s powers of observation were wool-gathering as she
dictated her answers, first reading each letter in a monotone—in
itself enough to try the steadiest nerves—before composing its
answer; then losing her place and having to be prompted, which
added to her already confused state of mind. Every expression of
sympathy in the notes brought tears in its train, and if the steady
application of Mrs. Hale’s handkerchief proved an additional barrier
to the speedy completion of her task, it also prevented her perceiving
the wavering writing of Polly’s swiftly moving pen.
“Austin was very much beloved,” she remarked. “I cannot
understand, as I told my husband over and over, I cannot understand
who would have a motive for killing him. It is beyond me.”
“Yes,” murmured Polly. She laid down her pen and rubbed her
stiff fingers. There still remained numerous notes to answer. “Dear
Mrs. Hale, let me finish answering these later on. You must be
exhausted.”
“No, they must be completed now,” Mrs. Hale spoke with
firmness, and Polly, hiding her unsteady fingers under pretense of
searching for another pen among Judith’s papers, resigned herself to
the situation. “Judith suggested that I order an engraved card of
acknowledgment, but I desire an individual letter sent to each of our
friends. It will not take much more of your time,” observing Polly’s
eyes stray to her wrist-watch.
“Will you let me complete the letters this afternoon?” Polly asked.
“I have not touched my regular work for your husband, and it is
nearly your luncheon hour.”
“Luncheon will be half an hour later to-day,” responded Mrs. Hale.
“Anna is laid up and Maud asked for more time. She is not very quick
at her work, you know.”
“Anna ill! That is too bad,” exclaimed Polly. “I hope it is nothing
serious.”
“A sprained ankle.” Mrs. Hale leaned back in her chair and
relaxed; she felt the need of a little gossip, for in spite of her
insistence on completing her letters, the steady application was
commencing to wear upon her. “When anything goes wrong with
Anna the whole house is upset.”
“She is certainly a domestic treasure,” agreed Polly. “How many
years has she been with you?”
Mrs. Hale considered before answering. “She came to us at the
time Austin had typhoid fever; the trained nurse wanted a helper—
what did she call Anna?”
“Nurse’s aide?” suggested Polly.
“That was it,” and Mrs. Hale smiled. “We persuaded her to stay
on as waitress.”
“How did you manage it, Mrs. Hale?” asked Polly. Another glance
at her watch showed her that the announcement of luncheon must
shortly occur, and she wished above all not to resume answering
letters of condolence. “It has always struck me that Anna was very
much above the regular servant class.”
“So she is, my dear,” Mrs. Hale was launched on her favorite
topic. “But Mr. Hale offered her such high wages, really ridiculous
wages at the time, that it wouldn’t have been in human nature to
resist his offer. I must say for Anna that she has earned every cent
we pay her. Lately”—Mrs. Hale hesitated and surveyed the boudoir
to make sure that the hall door was closed—“lately, Anna has
appeared so—so absent-minded. Do you suppose it can be a love
affair?”
“The most natural supposition in the world,” smiled Polly. “Anna is
a remarkably pretty girl.”
“So she is,” Mrs. Hale nodded her head in agreement. “I suspect
it is that new clerk in the drug store. I meet them quite often walking
together, and I called Austin’s attention to them when he was last in
Washington, just six weeks ago to-day.” Mrs. Hale looked at the
calendar hanging near Judith’s desk to be sure of her facts. “Polly, if I
tell you something will you promise to hold your tongue about it?”
Polly stared at Mrs. Hale—the latter’s tone had completely
changed and her customary irresponsible manner had become one
of suppressed anxiety.
“Certainly, Mrs. Hale,” she replied, and her manner reflected the
other’s seriousness. “I will consider whatever you say as
confidential.”
“First, answer this, on your word of honor,”—and Polly’s
wonderment grew as Mrs. Hale hitched her chair nearer, and her
voice gained in seriousness. “Have you come across a small piece
of yellow paper; it is folded and has the word ‘Copy’ as a
watermark?” Seeing Polly’s uncomprehending stare, she added
impatiently, “The kind reporters use in newspaper offices. Have you
seen such a paper among my husband’s correspondence?”
“No, Mrs. Hale; not as you describe it,” Polly shook a puzzled
head. “I may not have noticed the word ‘Copy,’ though. Was there
anything else to identify it?”
Mrs. Hale thought a minute, then came to a decision. “It is no
matter,” she said brusquely. “Forget I mentioned it; there is a more
pressing matter”—from her silver mesh purse she drew out a much
creased letter. “Read that,” she directed, and held it almost under
Polly’s nose, “but not aloud, read it to yourself.”
Obediently Polly took the paper and, holding it at the proper
focus, read:
Dear Aunt Agatha:
I started for San Francisco on the midnight train, so
forgive this hasty scrawl in answer to your long letter. I
will see the happy bride and groom on my return. Sorry
Uncle Robert doesn’t like Richards. I found on inquiry
that Richards——
Polly turned the letter over—the second sheet was missing. The
young girl looked in bewilderment at Mrs. Hale.
“Have you the end of the letter?” she asked.
“No, that is all there is to it.”
“This”—Polly turned it over again. “Why, it is not even signed.”
“But it is in Austin Hale’s handwriting,” asserted Mrs. Hale. “You
know it is, Polly.”
Polly again inspected the clear, distinctive writing. She had seen
it too often to be mistaken in identifying the chirography.
“It looks like Austin’s writing,” she qualified. “When did you
receive the letter and what does it mean?”
“Mean? We’ll come to that later,” Mrs. Hale lowered her voice to a
confidential pitch. “You see the date there,” indicating it, and Polly
nodded. “The letter was begun on Tuesday in New York, and Austin
was murdered between Tuesday midnight and one a. m. Wednesday
here in Washington.”
“He was——”
“Of course he was.” Patience was never Mrs. Hale’s strong point.
“Now, Polly, let us dissect this letter. On Tuesday in New York Austin
states that he is to take the midnight train for San Francisco; instead
of that he comes to Washington. Why?” And having propounded the
conundrum, Mrs. Hale sat back and contemplated Polly. There was a
distinct pause before the girl replied.
“I cannot answer your question, Mrs. Hale.” Polly avoided raising
her eyes as she turned the letter over once again and looked at the
blank side. It was a small-sized sheet of note paper of good quality,
and Austin’s large writing completely filled the first page. Polly held
the letter nearer Mrs. Hale.
“The back sheet has been torn off,” she pointed out. “See, the
edges are rough and uneven.”
“So I observed.” Mrs. Hale was a trifle nonplussed. She had
anticipated more excitement on Polly’s part, and the girl’s composure
was a surprise. That Polly was maintaining her composure through
sheer will power, Mrs. Hale was too obtuse to detect. She was
convinced, however, that Polly had been more than ordinarily
attracted by Austin Hale’s good looks and his marked attention to her
charming self. It was not in human nature, Mrs. Hale argued, that a
young and penniless girl would refuse a wealthy young man,
especially not in favor of a man of John Hale’s age. It was absurd of
Joe Richards to insinuate that her brother-in-law might have
supplanted Austin in Polly’s affections. Having once gotten an idea in
her head no power on earth could dislodge it, and Mrs. Hale, to
prove her viewpoint, had decided to investigate the mystery of
Austin’s death to her own satisfaction. Mrs. Hale thought over Polly’s
conduct for several minutes, then changed her tactics.
“Had you heard recently from Austin?” she asked, and at the
direct question Polly changed color.
“Not since this letter to you,” she replied calmly and Mrs. Hale,
intent on framing her next question, failed to analyze her answer.
“Did he make any reference to coming to Washington?”
“Only in a general way,” and before Mrs. Hale could question her
further, she added, “His letter of ten days ago said that he might be
here in April.”
“Ah!” Mrs. Hale felt that she had scored a point. “That goes to
prove that Austin’s trip here Tuesday was unexpected.”
“So unexpected that he never even wired you,” supplemented
Polly, and Mrs. Hale eyed her sharply.
“True,” she replied. “It must have been something frightfully
urgent that brought him here—to his death.”
Polly shivered slightly and laid down the letter.
“When did Austin mail this letter to you?”
“I don’t know.”
Polly glanced at her in surprise. “Was there no postmark on the
envelope?”
“There was no envelope.”
“What!” Polly half rose then dropped back in her seat. “No
envelope? Then how did you get the letter?”
Mrs. Hale looked carefully around to make sure that no one had
entered the boudoir or was within earshot. Her next remark ignored
Polly’s question.
“I have not shown Austin’s letter to my husband,” she began. “Mr.
Hale does not always view matters from my standpoint, and he might
be displeased at my having mentioned to Austin that he was
disappointed in Judith’s choice of a husband. Therefore, Polly, you
will say nothing to him.”
“Certainly not,” agreed Polly. “But about the letter—”
“Nor mention the letter to Judith,” pursued Mrs. Hale, paying no
attention to Polly’s attempt to question her. “I shall not discuss it with
Judith, for she might readily resent my writing Austin to find out
something about her husband’s career before he entered the army in
1917. This letter”—Mrs. Hale picked it up, refolded it, and replaced it
in her purse—“must remain a secret between you and me.”
“But, Mrs. Hale,”—Polly stopped her as she was about to rise
—“where did you get the letter and who tore off the last sheet?”
“It is for us to find out who tore it off and what became of it,”
declared Mrs. Hale. At last Polly was roused out of herself, and the
older woman observed with interest the two hectic spots of color in
her cheeks. “And why the sheet was torn off.”
The opening of the boudoir door caused Polly to start nervously,
a start which, in Mrs. Hale’s case, became a jump, as Richards
addressed them from the doorway.
“Maud is looking for you, Mrs. Hale,” he announced. “Luncheon is
waiting for you.”
“Thanks, yes; we will come at once.” Mrs. Hale was conscious of
her flurried manner and her ingratiating smile was a trifle strained as
she faced her handsome son-in-law. “Where is Judith?”
“She telephoned that she was lunching at the Army and Navy
Club.” Richards gave no sign that he was aware of Mrs. Hale’s
agitation. “Your husband is waiting for you.”
“Run down, Joe, and tell him not to wait for me.” Mrs. Hale laid
her hand on Polly’s shoulder and gave her a slight push. “Go also,
my dear.”
But Polly hung back. “Wait, Mrs. Hale,” she whispered feverishly.
“There, Major Richards is downstairs by now. Tell me quickly who
gave you Austin’s letter?”
“No one.”
“Then where did you get it?”
Mrs. Hale paused and looked carefully around—they had the
boudoir to themselves, but before she spoke Mrs. Hale took the
precaution to close the boudoir door.
“I found the letter this morning,” she stated, “in the leather pocket
of Judith’s electric car.”
CHAPTER X
BELOW STAIRS

Anna, the waitress, found the time lagging in spite of the game of
solitaire she was playing to wile away the tedium of her enforced
idleness. She cast a resentful glance at her swollen ankle before
shuffling the cards for the thirtieth time since she had eaten her
midday meal. She had discarded the morning newspaper, and
refused to find entertainment in the cheap paper novel which the
cook had brought to her early in the morning, so her last and only
solace was the pack of playing cards.
Mrs. Hale, a New Yorker by birth, until her marriage had spent
her life in the North, and while she had quickly succumbed to the
spell which the Capital City casts over those who come to its
hospitable doors, she had never taken kindly to employing negro
servants. She did not understand the African character, and her one
attempt to adjust herself to the conditions then prevailing in domestic
service in the District of Columbia had proved a dismal failure. With
her husband’s hasty approval she had sent to New York and
engaged French and English servants.
Aside from her eccentricities, Mrs. Hale was a kind and thoughtful
mistress, and the servants remained long in her employ. Even during
the chaotic war-time conditions in Washington, with the influx of war-
workers and deserters from the domestic field, her servants had
loyally remained with her in preference to seeking Government
“positions” as elevator women and messengers.
It required a person in Anna’s state of mind to find fault with the
large, cozily furnished bedroom in which she sat. A coal fire on the
hearth added its cheerful glow, and at her elbow was an electric
reading lamp ready for instant service when the winter afternoon
drew to a close.
Anna scowled at her reflection in the mirrored paneling of the
door leading to the bathroom which she and “cook,” a Swede,
shared with Maud, the parlor maid. For nearly twenty-four hours she
had been kept captive inside the four walls of her bedroom, and her
restless spirit rebelled. Fate, in the guise of a treacherous high-
heeled slipper, had given her an ugly tumble down the kitchen stairs
on her way to bed the night before, and Dr. McLane’s assurance that
she had had a lucky escape did not assuage Anna’s sense of
personal grievance nor deaden the pain of her physical injury.
Footsteps and the clatter of dishes, as a tray was brought in
slight contact with the stair turning, came distinctly through the open
door leading to the hall. Anna’s downcast look vanished. Seizing the
cards, she was intent on laying out her favorite solitaire when Maud
entered, bearing a tray loaded with appetizing dishes.
“I’m a bit late,” she explained apologetically, as Anna swept the
playing cards into her lap to make a place on the table for the tray.
“But there’s been a pile of coming and going in and out of the house,
and it keeps a body moving.”
“Sit down and have a cup of tea with me,” suggested Anna, on
whom the extra cup and saucer on the tray had not been lost. Maud
had evidently anticipated the invitation, judging also from the amount
of cinnamon toast and thin slices of bread and butter. “I am sorry,
Maud, to have more work thrown on you just now; perhaps I can
hobble downstairs to-morrow. Dr. McLane seemed to think I might.”
“Now, you rest easy,” advised Maud earnestly. “I can handle the
work all right, and Mr. Hale said he would come down handsome for
it.”
“He did!” Anna’s eyes had narrowed to thin slits, but Maud, intent
on consuming as much tea and toast as was humanly possible in a
given time, was oblivious of her facial contortions. “Mr. Hale is a
generous gentleman; you stick by him, Maud.”
“You bet. What he says goes,” Maud nodded enthusiastically.
“Funny household, ain’t it? A dead easy one if you are in the ‘know,’”
and she chuckled. “Let me pour you out another cup, Miss Anna,”
and, not waiting for permission, she replenished Anna’s tea, at the
same time refilling her own cup. “My, don’t cook make good toast!
No wonder Major Richards is so partial to it.”
“Is he?” Anna’s tone was dry.
“Yes, ma’am, and he’s partial to a good deal more besides.”
Maud relished an opportunity of airing her views to so superior a
person as Anna, for it was not often that she had her undivided
attention. “Major Richards knows a good-looking woman when he
sees one.”
“Is that so?” indifferently, helping herself to more sugar.
“Yes, ma’am,” with emphasis. “Didn’t I see the look and smile he
gave you yesterday?”
“Tut, tut! None of that.” Anna spoke with severity. “Major Richards
is Miss Judith’s husband, a nicely spoken gentleman.”
“Sure he is.” Maud smiled broadly, nothing daunted by Anna’s
frown. “And say, ain’t Miss Judith mashed on him? That cold kind
always flops the worst when they fall in love.”
“Miss Judith isn’t the cold kind,” retorted Anna warmly. “She has
plenty of temper about her, but I will say it’s tempered with proper
pride.”
“I wonder if it was proper pride which made her quarrel so with
Mr. Austin?” Maud’s snicker always grated on Anna, and again the
waitress frowned. “Say, wasn’t his death awful?”
“Yes.” Anna sat back with a shiver. “Terrible!”
“And they dunno who done it,” pursued Maud with relish, her
somewhat nasal voice slightly raised. “Leastways that is what
Detective Ferguson told me this afternoon.”
“Was he at the house again?”
“Yes, three times.” Maud looked regretfully at the empty toast
dish. “I asked him if he wanted a bed made up for his convenience,
and he was real peevish. My, but he asks a lot of questions!”
“What about?” inquired Anna.
“Oh, where we were on Tuesday night, and if we heard anything
unusual,” answered Maud with careless candor. “Didn’t seem to
believe that we had all gone to bed the same as usual. I told him if
we’d a known Mr. Austin was to have been murdered, o’ course we’d
have waited up for it, so as to supply the police with details. That
settled him for a time and then he wanted to know when I last saw
Miss Judith Tuesday night.”
“So?” Anna leaned out of her chair and took up a box of candy
from the bureau. “Help yourself, Maud. What did you say to
Ferguson?”
Maud received the candy with eyes which sparkled as Anna put
the box conveniently in front of her. Her craving for sweets had
frequently earned her a reprimand from Mrs. Hale when that dame
caught her in the act of purloining candy from the stock kept in the
dining room.
“I told Ferguson that Miss Judith was undressing in her bedroom
when I went upstairs.” Maud’s speech was somewhat impeded by a
large caramel. “Then he wanted to know when we first heard o’ the
murder—silly question, wasn’t it?”
“Very,” agreed Anna. “Considering he came upstairs and joined
us just after Mrs. Hale had broken the news of Mr. Austin’s death.
Men are silly creatures.”
“Some of ’em are,” amended Maud. “I never would call Mr.
Robert Hale silly. Say, Miss Anna,”—and Maud hitched her chair
close to the waitress—“do you s’pose he knows anything about the
courting that went on between Miss Polly and his brother?”
“There isn’t anything that escapes Mr. Hale’s notice,” Anna
responded dryly.
“But Miss Polly was mighty sly about it,” argued Maud. “Mr.
Austin caught her once, though, and my, didn’t he flare up!” Her eyes
grew bigger at the recollection. “I wonder if he was smart enough to
know Miss Polly, for all her appearing frankness, was playing father
and son off against each other.”
“Men never know anything where a pretty woman’s concerned,”
replied Anna scornfully. “Miss Judith knew what was going on
though, and”—she lowered her voice to confidential tones—“it’s my
belief that her Uncle John used his influence with the family to get
her sent on that visit to Japan.”
“And there she met Major Richards.” Maud selected another
piece of candy. “My, ain’t Fate funny sometimes!” Her companion
agreed, and Maud munched the milk chocolates with silent
enjoyment. Then her active mind went off on a tangent as she
caught sight of the playing cards still reposing in a disorderly heap in
Anna’s lap. “Mr. Hale got in one of his tantrums this morning.”
“He did?” Anna put down her cup from which she had been
slowly sipping her strong black tea. “What about?”
“He said one of his playing cards was missing from the pack he
keeps in the library, and he just as much as asked me if I had stolen
it.” Maud sniffed. “If he hadn’t been so nice about my wages and my
room wasn’t so comfortable, and you and cook being so agreeable,
I’d a given notice.”
“Oh, pshaw! Mr. Hale doesn’t mean half he says,” Anna hastened
to smooth down Maud’s ruffled feelings. “He forgets the cause of his
tantrums ten minutes afterward. What’s the use of paying attention to
them? His wife never does.”
“I ain’t his wife,” objected Maud. “And he didn’t forget this
tantrum, though it was about such a measly little thing, but came
right back after lunch and asked me had I found the card in any
one’s room. He was put out when I told him no.”
“It is too bad, Maud,” exclaimed Anna, who had followed her story
with gratifying attention. “Mr. Hale shouldn’t worry you when you
have extra work with me laid up here. Why not speak to Mrs. Hale?”
“Not me!” broke in Maud hastily. “I ain’t hankering to start a family
ruction. Don’t you worry, Miss Anna, I fixed it.” Maud smiled slyly. “I
went up to Miss Judith’s boudoir with the C. & P. man to mend her
branch telephone this afternoon, and I just happened to see a pack
o’ playing cards lying on Major Richards’ dresser; their backs were