Tshepo Chris Nokeri

Econometrics and Data Science


Apply Data Science Techniques to Model Complex
Problems and Implement Solutions for Economic
Problems
1st ed.
Tshepo Chris Nokeri
Pretoria, South Africa

ISBN 978-1-4842-7433-0 e-ISBN 978-1-4842-7434-7


https://doi.org/10.1007/978-1-4842-7434-7

© Tshepo Chris Nokeri 2022

Apress Standard

The use of general descriptive names, registered names, trademarks,


service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress


Media, LLC part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
I dedicate this book to my family and everyone who merrily played
influential roles in my life, i.e., Professor Chris William Callaghan and
Mrs. Renette Krommenhoek from the University of the Witwatersrand,
among others I did not mention.
Introduction
This book bridges the gap between econometrics and data science
techniques. It introduces a holistic approach to satisfactorily solving
economic problems from a machine learning perspective. It begins by
discussing the practical benefits of harnessing data science techniques
in econometrics. It then clarifies the key concepts of variance,
covariance, and correlation, and covers the most common linear
regression model, called ordinary least-squares. It explains the
techniques for testing assumptions through residual analysis, including
other evaluation metrics (i.e., mean absolute error, mean squared error,
root mean squared error, and R2). It also exhibits ways to correctly
interpret your findings. Following that, it presents an approach to
tackling time series data by implementing an alternative to the
dominant time series analysis models (i.e., ARIMA and SARIMA), called
the additive model. That model accommodates non-linearity and smoothing
parameters.
The book also introduces ways to capture non-linearity in economic
data by implementing the most prevalent binary classifier, called
logistic regression, alongside metrics for evaluating the model (i.e.,
confusion matrix, classification report, ROC curve, and precision-recall
curve). In addition, you’ll learn about a technique for identifying hidden
states in economic data by implementing the Hidden Markov modeling
technique, together with an approach for realizing mean and variance
in each state. You’ll also learn how to categorize countries grounded on
similarities by implementing the most common cluster analysis model,
called the K-Means model, which implements the Euclidean distance
metric.
The book also covers the practical application of deep learning in
econometrics by showing key artificial neural networks (i.e., restricted
Boltzmann machine, multilayer perceptron, and deep belief networks),
including ways of adding more complexity to networks by expanding
hidden layers. Then, it familiarizes you with a method of replicating
economic activities across multiple trials by implementing the Monte
Carlo simulation technique. The book concludes by presenting a
standard procedure for testing causal relationships among variables,
including the mediating effects of other variables in those relationships,
by implementing structural equation modeling (SEM).
This book uses Anaconda (an open-source distribution of Python
programming) to prepare examples. Before exploring the contents of
this book, you should understand the basics of economics, statistics,
Python programming, probability theories, and predictive analytics.
The following list highlights some Python libraries that this book
covers.
Wbdata for extracting data from the World Bank database
Scikit-Learn for building and validating key machine learning
algorithms
Keras for high-level frameworks for deep learning
Semopy for performing structural equation modeling
Pandas for data structures and tools
NumPy for arrays and matrices
Matplotlib and Seaborn for plots and graphs
This book targets beginner to intermediate economists, data
scientists, and machine learning engineers who want to learn how to
approach econometrics problems from a machine learning perspective
using an array of Python libraries.
Any source code or other supplementary material referenced by the
author in this book is available to readers on GitHub via the book’s
product page, located at www.apress.com/978-1-4842-7433-0. For
more detailed information, please visit
http://www.apress.com/source-code.
Acknowledgments
Writing a single-authored book is demanding, but I received firm
support and active encouragement from my family and dear friends.
Many heartfelt thanks to the Apress Publishing team for all their
support throughout the writing and editing processes. Last, my humble
thanks to all of you for reading this; I earnestly hope you find it helpful.
Table of Contents
Chapter 1: Introduction to Econometrics
Econometrics
Economic Design
Understanding Statistics
Machine Learning Modeling
Deep Learning Modeling
Structural Equation Modeling
Macroeconomic Data Sources
Context of the Book
Practical Implications
Chapter 2: Univariate Consumption Study Applying Regression
Context of This Chapter
Theoretical Framework
Lending Interest Rate
Final Consumption Expenditure (in Current U.S. Dollars)
The Normality Assumption
Normality Detection
Descriptive Statistics
Covariance Analysis
Correlation Analysis
Ordinary Least-Squares Regression Model Development Using
Statsmodels
Ordinary Least-Squares Regression Model Development Using
Scikit-Learn
Cross-Validation
Predictions
Estimating Intercept and Coefficients
Residual Analysis
Other Ordinary Least-Squares Regression Model
Performance Metrics
Ordinary Least-Squares Regression Model Learning Curve
Conclusion
Chapter 3: Multivariate Consumption Study Applying Regression
Context of This Chapter
Social Contributions (Current LCU)
Lending Interest Rate
GDP Growth (Annual Percentage)
Final Consumption Expenditure
Theoretical Framework
Descriptive Statistics
Covariance Analysis
Correlation Analysis
Correlation Severity Detection
Dimension Reduction
Ordinary Least-Squares Regression Model Development Using
Statsmodels
Residual Analysis
Ordinary Least-Squares Regression Model Development Using
Scikit-Learn
Cross-Validation
Hyperparameter Optimization
Residual Analysis
Ordinary Least-Squares Regression Model Learning Curve
Conclusion
Chapter 4: Forecasting Growth
Descriptive Statistics
Stationarity Detection
Random White Noise Detection
Autocorrelation Detection
Different Univariate Time Series Models
The Autoregressive Integrated Moving Average
The Seasonal Autoregressive Integrated Moving Average
Model
The Additive Model
Additive Model Development
Additive Model Forecast
Seasonal Decomposition
Conclusion
Chapter 5: Classifying Economic Data Applying Logistic Regression
Context of This Chapter
Theoretical Framework
Urban Population
GNI per Capita, Atlas Method
GDP Growth
Life Expectancy at Birth, Total (in Years)
Descriptive Statistics
Covariance Analysis
Correlation Analysis
Correlation Severity Detection
Dimension Reduction
Making a Continuous Variable a Binary
Logistic Regression Model Development Using Scikit-Learn
Logistic Regression Confusion Matrix
Logistic Regression Confusion Matrix Interpretation
Logistic Regression Classification Report
Logistic Regression ROC Curve
Logistic Regression Precision-Recall Curve
Logistic Regression Learning Curve
Conclusion
Chapter 6: Finding Hidden Patterns in World Economy and Growth
Applying the Hidden Markov Model
Descriptive Statistics
Gaussian Mixture Model Development
Representing Hidden States Graphically
Order Hidden States
Conclusion
Chapter 7: Clustering GNI Per Capita on a Continental Level
Context of This Chapter
Descriptive Statistics
Dimension Reduction
Cluster Number Detection
K-Means Model Development
Predictions
Cluster Centers Detection
Cluster Results Analysis
K-Means Model Evaluation
The Silhouette Methods
Conclusion
Chapter 8: Solving Economic Problems Applying Artificial Neural
Networks
Context of This Chapter
Theoretical Framework
Restricted Boltzmann Machine Classifier
Restricted Boltzmann Machine Classifier Development
Restricted Boltzmann Machine Confusion Matrix
Restricted Boltzmann Machine Classification Report
Restricted Boltzmann Machine Classifier ROC Curve
Restricted Boltzmann Machine Classifier Precision-Recall
Curve
Restricted Boltzmann Machine Classifier Learning Curve
Multilayer Perceptron (MLP) Classifier
Multilayer Perceptron (MLP) Classifier Model Development
Multilayer Perceptron Classification Report
Multilayer Perceptron ROC Curve
Multilayer Perceptron Classifier Precision-Recall Curve
Multilayer Perceptron Classifier Learning Curve
Artificial Neural Network Prototyping Using Keras
Artificial Neural Network Structuring
Network Wrapping
Keras Classifier Confusion Matrix
Keras Classification Report
Keras Classifier ROC Curve
Keras Classifier Precision-Recall Curve
Training Loss and Cross-Validation Loss Across Epochs
Training Loss and Cross-Validation Loss Accuracy Across
Epochs
Conclusion
Chapter 9: Inflation Simulation
Understanding Simulation
Context of This Chapter
Descriptive Statistics
Monte Carlo Simulation Model Development
Simulation Results
Simulation Distribution
Chapter 10: Economic Causal Analysis Applying Structural
Equation Modeling
Framing Structural Relationships
Context of This Chapter
Theoretical Framework
Final Consumption Expenditure
Inflation and Consumer Prices
Life Expectancy in Sweden
GDP Per Capita Growth
Covariance Analysis
Correlation Analysis
Correlation Severity Analysis
Structural Equation Model Estimation
Structural Equation Model Development
Structural Equation Model Information
Structural Equation Model Inspection
Report Indices
Visualize Structural Relationships
Conclusion
Index
About the Author
Tshepo Chris Nokeri
harnesses advanced analytics and
artificial intelligence to foster innovation
and optimize business performance. In
his functional work, he delivered
complex solutions to companies in the
mining, petroleum, and manufacturing
industries. He earned a Bachelor’s
degree in Information Management and
then graduated with an Honour’s degree
in Business Science at the University of
the Witwatersrand on a TATA Prestigious
Scholarship and a Wits Postgraduate
Merit Award. He was also unanimously
awarded the Oxford University Press Prize. He is the author of Data
Science Revealed and Implementing Machine Learning in Finance, both
published by Apress.
About the Technical Reviewer
Pratibha Saha
is an economics graduate currently working as an Economist Analyst-
Consultant at Arthashastra Intelligence. She is trained in econometrics,
statistics, and finance with interests in machine learning, deep learning,
and AI, among other areas.
She is motivated by the idea of problem solving with a purpose and
strongly believes in diversity facilitating tech to supplement socially
aware decision making. She finds technology to be a great enabler and
understands the poignancy of data-driven solutions. By investigating
the linkages of tech, AI, and social impact, she hopes to use her skills to
propel these solutions.
Additionally, she is a feline enthusiast and loves dabbling in origami.
Find her on LinkedIn at
https://www.linkedin.com/in/pratibha-saha-
8089a3192/ and on GitHub at https://github.com/Pratsa09
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
T. C. Nokeri, Econometrics and Data Science
https://doi.org/10.1007/978-1-4842-7434-7_1

1. Introduction to Econometrics
Tshepo Chris Nokeri1
(1) Pretoria, South Africa

This chapter explains data science techniques applied to the field of econometrics.
To begin, it covers the relationship between economics and quantitative methods,
which paves the way for the econometrics field. It also covers the relevance of
econometrics in devising and revising the economic policies of a nation. It then
summarizes machine learning, deep learning, and structural equation modeling.
To conclude, it reveals ways to extract macroeconomic data using a standard
Python library.

Econometrics
Econometrics is a social science subclass that investigates broad business
activities at the macro level, i.e., at the country, region, or continent level. It is an
established social science field that employs statistical models to investigate
theoretical claims about macroeconomic phenomena. Figure 1-1 is a
simplification of econometrics. Organizations like the statistical bureau capture
economic activities across time, which they make available to the public.
Practitioners, such as economists, research analysts, and statisticians alike,
extract the data and model it using algorithms grounded on theoretical
frameworks in order to make future predictions.
Figure 1-1 Econometrics
Before you proceed with the contents of this book, be sure that you
understand the basic concepts that relate to economics and statistics.

Economic Design
Economic design is grounded on the notion that if we can accurately estimate
macroeconomic phenomena, we can devise mechanisms that help manage them. As
mentioned, there are several well-established organizations from which one can
extract factual macroeconomic data. Note that we cannot measure the whole
population; instead, we work from a sample (a representative subset of the
population), which means statistical estimations always carry some error.
Because there is a pool of reliable
macroeconomic data sources, we can apply the data and investigate consistent
patterns by applying quantitative models to make sense of an economy. When we
are confident that a model estimates what we intend it to estimate and does so
exceptionally, we can apply such a model to predict economic events. Remember
that the primary purpose of a scientific enterprise is to predict events and control
underlying mechanisms by applying quantitative models.
Econometrics uses statistical principles to estimate the parameters of a
population, but the ultimate litmus test is always economic ideology. Only
economic theory can validate/invalidate the results, which can be further used to
determine causation/correlation, etc. It should be apparent that politics occupies
a paramount role in modern life. More often than not, political sentiments are
accompanied by a firm belief about the economy and how it ought to be. Such beliefs
might not reflect economic reality. When the considered belief about the economy
is absurd, there is no way of combating pressing societal problems with devised
solutions. To satisfactorily solve an economic problem, you must have a logical
view; otherwise, feelings, standard assumptions, and authoritarian knowledge
dilute your analysis of an economy.
In summary, policymakers apply econometrics to devise and revise economic
policies so that they can correctly solve economic problems. This entails that they
investigate historical economic events, develop complex quantitative models, and
apply findings of those models (provided they are reliable) to drive economic
policies. Econometrics is an approach for finding answers to questions that relate
to the economy. Policymakers who are evidence-oriented drive policymaking
initiatives by applying factual data rather than depending on political and
economic ideologies.

Understanding Statistics
Statistics is the field concerned with discovering consistent patterns in raw data
to derive a logical conclusion regarding a recognized phenomenon. It involves
investigating the central tendency (the mean value) and dispersion of data (the
standard deviation) and then studying theoretical claims about the phenomenon
by applying quantitative models. In addition, business institutions apply it in ad
hoc reporting, research, and business process controls. Researchers, in addition,
apply statistics in fields like natural sciences, physical sciences, chemistry,
engineering, and social sciences, among other fields. It is the backbone of
quantitative research.
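As a concrete illustration of the central tendency and dispersion mentioned above, both can be computed in a few lines of NumPy. The growth figures below are made up purely for demonstration:

```python
import numpy as np

# Hypothetical annual GDP growth rates (%), for illustration only
growth = np.array([2.9, 1.6, 2.4, 2.9, 2.3, -3.4, 5.7])

mean = growth.mean()       # central tendency (the mean value)
std = growth.std(ddof=1)   # dispersion (the sample standard deviation)

print(f"mean: {mean:.2f}%, standard deviation: {std:.2f}%")
```

Quantitative models build on exactly these two summaries: the mean anchors where the data sits, and the standard deviation expresses how far typical observations stray from it.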

Machine Learning Modeling


There is a link between statistics and machine learning. In this book, we consider
machine learning an extension of statistics that incorporates techniques from
fields like computer science. Machine learning methods derive from statistical
principles and methods. We approach machine learning problems with
“applications” and “automation” in mind. With machine learning, the end goal is
not to derive some conclusion but to automate monotonous tasks and determine
replicable patterns for those autonomous tasks. Figure 1-2 shows how
quantitative models operate.
Figure 1-2 Fundamental machine learning model
Figure 1-2 demonstrates the basic machine learning model flow. Initially, we
extract the data from a database, then preprocess and split it. This is followed by
modeling the data by applying a function that receives a predictor variable and
operates it to generate an output value. A variable represents a process that we
can observe and estimate. It is common practice in machine learning to deploy
models as web applications or as part of web applications.
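The flow in Figure 1-2—extract data, preprocess and split it, fit a function that maps a predictor to an output, then predict—can be sketched with plain NumPy. The synthetic data below is an assumption standing in for a real database extract, and a simple least-squares line stands in for the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for data extracted from a database: one predictor, one response
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 5.0 + rng.normal(0, 0.5, 100)  # true relationship plus noise

# Split into training and test sets (80/20)
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Fit a straight-line function on the training data only
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Apply the fitted function to unseen predictor values
y_pred = slope * x_test + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The key machine learning habit on display is the split: the model is estimated on one portion of the data and judged on another, which is what makes its patterns replicable rather than memorized.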

Deep Learning Modeling


Deep learning applies artificial neural networks (a reciprocal web of nodes) that
imitate the human neural structure. Artificial neural networks are a group of
nodes that receive input values in the input layer, transform them to the
subsequent hidden layer (a layer between the input and output layer), which
transforms them and allots varying weights (vector parameters that determine
the extent of influence input values have on output values) and a bias (an offset
term whose input is conventionally fixed at 1). Deep learning is a subclass of
machine learning that combats some difficulties that we encounter with
conventional quantitative models—for instance, the vanishing gradient problem, a
case in which gradients shrink as they are propagated backward through many
layers, so the earliest layers barely update during training. There are several
types of artificial neural networks, i.e., the Restricted Boltzmann Machine—a
shallow two-layer network made up of a visible layer and a hidden layer, the
Multilayer Perceptron—a feed-forward network with one or more hidden layers, the
Recurrent Neural Network—a network for modeling sequential data, and the
Convolutional Neural Network—a network that reduces dimensionality through
convolution and pooling, frequently applied in computer vision. This book covers
the Restricted Boltzmann
Machine and Multilayer Perceptron. Figure 1-3 shows a Multilayer Perceptron
classifier.

Figure 1-3 Example of a Multilayer Perceptron classifier


Figure 1-3 shows that the Multilayer Perceptron classifier is composed of an
input layer that retrieves input values (X1, X2, and X3) and conveys them to the
first hidden layer. That layer then retrieves the values and transforms them by
applying a function (in our case, the Sigmoid function). It conveys an output value,
which is then conveyed to the second hidden layer, which also retrieves the input
values. The process reiterates—it transforms values and conveys them to the
output layer and produces an output value, represented as Y in Figure 1-3. The
training process that networks apply to learn the structure of the data is known
as backward propagation (updating the weights in reverse, from the output layer
toward the input layer). Chapter 8 covers deep learning.
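The forward pass that Figure 1-3 describes—three input values, two sigmoid hidden layers, one output value—can be sketched in NumPy. The weights and biases below are randomly made up for illustration only; a trained network would learn them through backward propagation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Made-up weights and biases, for illustration; training would learn these
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # input (3) -> hidden 1 (4)
W2, b2 = rng.normal(size=(4, 4)), rng.normal(size=4)  # hidden 1 -> hidden 2
W3, b3 = rng.normal(size=(1, 4)), rng.normal(size=1)  # hidden 2 -> output

x = np.array([0.5, -1.2, 3.0])   # input values X1, X2, X3

h1 = sigmoid(W1 @ x + b1)        # first hidden layer transforms the inputs
h2 = sigmoid(W2 @ h1 + b2)       # second hidden layer transforms again
y = sigmoid(W3 @ h2 + b3)        # output value Y

print(y)
```

Because every layer applies the sigmoid function, each intermediate value (and the final output) lands between 0 and 1, which is why this architecture suits the binary classification tasks in Chapter 8.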

Structural Equation Modeling


The structural equation model includes a set of models that determine the nature
of causal relationships among sets of variables. It includes factor analysis, path
analysis, and regression analysis. It helps us investigate mediating relationships,
so we can detect how the presence of other variables weakens or strengthens the
nature of the structural relationship between the predictor variable and the
response variable. Figure 1-4 shows a hypothetical framework that outlines direct
and indirect structural relationships.
Figure 1-4 Fundamental structural equation model
Figure 1-4 demonstrates a hypothetical framework representing the
structural relationship between GDP per capita growth (as an annual percentage),
inflation, consumer price index (as a percentage), and final consumption
expenditure (in current U.S. dollars). In addition, it highlights the mediating
effects of life expectancy on the relationship between GDP per capita growth and
final consumption expenditure. Chapter 10 covers structural equation modeling.
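Written out, the hypothetical framework in Figure 1-4 amounts to two structural equations—one for the mediator (life expectancy) and one for the outcome (final consumption expenditure). The coefficient symbols below are illustrative placeholders, not estimates from the book:

```latex
% Mediator equation: life expectancy regressed on GDP per capita growth
\text{LifeExp} = \gamma_0 + \gamma_1\,\text{GDPGrowth} + u

% Outcome equation: consumption regressed on the predictors directly
% and on the mediator
\text{FCE} = \beta_0 + \beta_1\,\text{GDPGrowth}
           + \beta_2\,\text{Inflation}
           + \beta_3\,\text{LifeExp} + \varepsilon
```

In this notation, the indirect (mediated) effect of GDP per capita growth on consumption is the product of paths, gamma_1 times beta_3, while beta_1 captures the direct effect.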

Macroeconomic Data Sources


There are several libraries that are used to extract macroeconomic data. This
book uses one of the more prominent libraries, called wbdata. This library
extracts data from the World Bank database1. Alternatively, you can extract the
data from the World Bank website directly. In addition, there are other
macroeconomic sources you can use, such as the St. Louis Fed (Federal Reserve
Economic) database2 and the International Monetary Fund database3, among
others.
This book uses wbdata as the principal library because it offers a wide
range of social and economic indicators. Before you proceed, ensure that
you install the wbdata library. This will make the process of
developing quantitative models much simpler, as you will not have to write
considerable chunks of code. To install the library in the Python environment, use
pip install wbdata. Provided you are using the Anaconda environment, use
conda install wbdata. At the time of writing, the version of the
library was v0.3.0. Listing 1-1 shows how to retrieve the macroeconomic data.

import wbdata
country = ["USA"]
indicator = {"NY.GDP.MKTP.KD.ZG": "gdp_growth"}  # GDP growth (annual %)
df = wbdata.get_dataframe(indicator, country=country,
convert_date=True)
Listing 1-1 Loading Data from the World Bank Library
wbdata extracts the data and loads it into a pandas dataframe. Figure 1-5
demonstrates the wbdata workflow.

Figure 1-5 World Bank library workflow

Extracting data from the wbdata library requires that you specify the country
ID. Given that the World Bank includes several countries, it is burdensome to
know the IDs of all of them. The most convenient way to find a country’s ID is to
search for it by name (see Listing 1-2). For this example, we entered China, and
it returned the countries and regions whose names contain "China," along with
their IDs.

wbdata.search_countries("China")
id name
---- --------------------
CHN China
HKG Hong Kong SAR, China
MAC Macao SAR, China
TWN Taiwan, China
Listing 1-2 Searching for a Country ID

Extracting data from the wbdata library requires that you specify the
economic indicator’s ID as well. Given that the World Bank includes several
macroeconomic indicators, it is burdensome to know the IDs of all the indicators.
The most convenient way to find an indicator’s ID is to search for it by name (see
Listing 1-3). For this example, we entered inflation and it returned all
indicators that contain the word “inflation,” including their IDs.
wbdata.search_indicators("inflation")
id                    name
--------------------  -------------------------------------------------
FP.CPI.TOTL.ZG        Inflation, consumer prices (annual %)
FP.FPI.TOTL.ZG        Inflation, food prices (annual %)
FP.WPI.TOTL.ZG        Inflation, wholesale prices (annual %)
NY.GDP.DEFL.87.ZG     Inflation, GDP deflator (annual %)
NY.GDP.DEFL.KD.ZG     Inflation, GDP deflator (annual %)
NY.GDP.DEFL.KD.ZG.AD  Inflation, GDP deflator: linked series (annual %)
Listing 1-3 Searching for Macroeconomic Data
The wbdata library includes several data sources, like World Development
Indicators, Worldwide Governance Indicators, Subnational Malnutrition Database,
International Debt Statistics, and International Debt Statistics: DSSI, among
others. This book focuses predominantly on sources that provide economic data.
It also covers social indicators. Listing 1-4 demonstrates how to retrieve indicator
sources using wbdata.get_source() (see Table 1-1).

sources = wbdata.get_source()
sources
Listing 1-4 Retrieving the World Bank Sources

Table 1-1 World Bank Sources

 ID  Last Updated  Name                                        Code  Data Avail.  Metadata Avail.  Concepts
  1  2019-10-23    Doing Business                              DBS   Y            Y                3
  2  2021-05-25    World Development Indicators                WDI   Y            Y                3
  3  2020-09-28    Worldwide Governance Indicators             WGI   Y            Y                3
  5  2016-03-21    Subnational Malnutrition Database           SNM   Y            Y                3
  6  2021-01-21    International Debt Statistics               IDS   Y            Y                4
...  ...           ...                                         ...   ...          ...              ...
 80  2020-07-25    Gender Disaggregated Labor Database (GDLD)  GDL   Y            N                4
 81  2021-01-21    International Debt Statistics: DSSI         DSI   Y            N                4
 82  2021-03-24    Global Public Procurement                   GPP   Y            N                3
 83  2021-04-01    Statistical Performance Indicators (SPI)    SPI   Y            Y                3
 84  2021-05-11    Education Policy                            EDP   Y            Y                3
Table 1-1 outlines the source ID, name, code, availability, metadata availability,
concepts, and the last date of update. Listing 1-5 shows how to retrieve the topics
(see Table 1-2). Each topic has its own ID.

wbdata.get_topic()
Listing 1-5 Retrieve Topic

Table 1-2 World Bank Topic

ID Value Source Note


0 1 Agriculture & Rural Development For the 70 percent of the world’s poor who liv...
1 2 Aid Effectiveness Aid effectiveness is the impact that aid has i...
2 3 Economy & Growth Economic growth is central to economic develop...
3 4 Education Education is one of the most powerful instrume...
4 5 Energy & Mining The world economy needs ever-increasing amount...
5 6 Environment Natural and man-made environmental resources –...
6 7 Financial Sector An economy’s financial markets are critical to...
7 8 Health Improving health is central to the Millennium ...
8 9 Infrastructure Infrastructure helps determine the success of ...
9 10 Social Protection & Labor The supply of labor available in an economy in...
10 11 Poverty For countries with an active poverty monitorin...
11 12 Private Sector Private markets drive economic growth, tapping...
12 13 Public Sector Effective governments improve people’s standar...
13 14 Science & Technology Technological innovation, often fueled by gove...
14 15 Social Development Data here cover child labor, gender issues, re...
15 16 Urban Development Cities can be tremendously efficient. It is ea...
16 17 Gender Gender equality is a core development objectiv...
17 18 Millennium development goals
18 19 Climate Change Climate change is expected to hit developing c...
19 20 External Debt Debt statistics provide a detailed picture of ...
20 21 Trade Trade is a key means to fight poverty and achi...
Table 1-2 outlines each topic's ID, value (its name), and source note. The
wbdata library encompasses a broad range of topics from fields like health,
economics, urban development, and other social science-related fields.

Context of the Book


Each chapter of this book starts by covering the underlying concepts of a
particular model. The chapters show ways to extract macroeconomic data for
exploration, including techniques to ensure that the structure of the data is
suitable for a chosen model and meets preliminary requirements. In addition, the
chapters reveal possible ways of establishing a hypothetical framework and
testable hypotheses. They discuss how to investigate hypotheses by employing a
quantitative model that operates a set of variables to generate output values.
For each model in this book, there are ways to evaluate it. Each chapter also
includes visuals that will help you better understand the structure of the data and
the results.

Practical Implications
This book expands on the present body of knowledge on econometrics. It covers
ways through which you can apply data science techniques to discover patterns in
macroeconomic data and draw meaningful insights. It intends on accelerating
evidence-based economic design—devising and revising economic policies based
on evidence that we derive from quantitative-driven models. This book is for
professionals who seek to approach some of the world’s most pressing problems
by applying data science and machine learning techniques. In summary, it will
enable you to detect why specific social and economic activities occur, and help
you predict the likelihood of future activities occurring. The book assumes that
you have some basic understanding of key concepts of statistics and economics.

Footnotes
1 Indicators | Data (worldbank.org)

2 Federal Reserve Economic Data | FRED | St. Louis Fed (stlouisfed.org)

3 IMF Data
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
T. C. Nokeri, Econometrics and Data Science
https://doi.org/10.1007/978-1-4842-7434-7_2

2. Univariate Consumption Study Applying Regression
Tshepo Chris Nokeri1
(1) Pretoria, South Africa

This chapter introduces the standard univariate (or simple) linear regression model,
called the ordinary least-squares model, which estimates the intercept and slope while
minimizing the residuals (see Equation 2-1). It applies the model to determine the
relationship between the interest rates that U.S. banks charge for lending and the
market value of goods and services that U.S. households consume annually. It includes
ways of conducting covariance analysis, correlation analysis, model development,
cross-validation, hyperparameter optimization, and model performance analysis.
The ordinary least-squares model is one of the most common parametric methods.
It makes strong assumptions about the data—it expects normality (values of a
variable clustering around the mean value) and linearity (a straight-line association
between an independent variable and a dependent variable). This chapter uses it to
investigate the association between the predictor variable (the independent variable)
and the response variable (the dependent variable). It is based on a straight-line
equation (see Figure 2-1).
Figure 2-1 Line of best fit
Figure 2-1 shows a straight line in red and the independent data points in green—
the line cuts through the data points. Equation 2-1 shows the ordinary least-squares
equation.
ŷ = β̂0 + β̂1X (Equation 2-1)
Where ŷ is the predicted response variable (the expected U.S. final consumption
expenditure in this example), β̂0 represents the intercept—the expected value of
the response variable (the U.S. final consumption expenditure in current U.S.
dollars for this example) when the predictor is zero (with a standardized
predictor, this equals the mean of the response variable), X represents the
predictor variable (the U.S. lending interest rate in this example), and β̂1 is
the slope—representing the direction of the relationship between X (the U.S.
lending interest rate) and the final consumption expenditure (in current U.S.
dollars). Look at the straight red line in Figure 2-1—the slope is positive.
Finally, εi represents the error term (refer to Equation 2-2).

εi = yi − ŷi (Equation 2-2)
Where εi is the error term (also called the residual term)—representing the
difference between yi (the actual U.S. final consumption expenditure) and ŷi (the
predicted U.S. final consumption expenditure).
There is a difference between variables with a hat/caret (which are sample
regression functions) and without one (population regression functions). We estimate
those containing a hat/caret from a sample of the population, rather than from the
entire population.
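To make the estimation concrete, here is a minimal sketch that computes the slope and intercept of Equation 2-1 in closed form and the residuals of Equation 2-2. The numbers are hypothetical, made up for illustration—they are not the World Bank series used later in the chapter:

```python
import numpy as np

# Hypothetical data points: interest rate (x) and consumption (y)
x = np.array([3.0, 4.5, 6.0, 7.5, 9.0, 10.5])
y = np.array([9.0, 8.2, 7.1, 6.0, 5.2, 4.1])

# Closed-form least-squares estimates (Equation 2-1):
# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Residuals (Equation 2-2): actual y minus predicted y
residuals = y - (intercept + slope * x)
print(round(slope, 4), round(intercept, 4))
```

Because the fitted line passes through the point of means, the residuals of an ordinary least-squares fit with an intercept always sum to zero (up to floating-point error).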

Context of This Chapter


This chapter uses the ordinary least-squares regression model to determine the linear
relationship between the predictor variable (the U.S. lending interest rate as a
percentage) and the response variable (final consumption expenditure in current U.S.
dollars). Table 2-1 outlines the macroeconomic indicators for this chapter.

Table 2-1 The U.S. Macroeconomic Indicators for This Chapter

Code Title
FR.INR.LEND Lending interest rate (as a percentage)
NE.CON.TOTL.CD Final consumption expenditure (in current U.S. dollars)

Theoretical Framework
Figure 2-2 shows the relationship that this chapter explores. It establishes the
research hypothesis.

Figure 2-2 Theoretical framework

HYPOTHESES
H0: There is no significant relationship between the U.S. lending interest rate (%)
and the final consumption expenditure (in current U.S. dollars).
HA: There is a significant relationship between the U.S. lending interest rate (%)
and the final consumption expenditure (in current U.S. dollars).

The research hypothesis seeks to determine whether a change in the U.S. lending
interest rate influences the final consumption expenditure (in current U.S. dollars).

Lending Interest Rate


The lending interest rate estimates the rate that private banks charge as interest for
short-term and mid-term loans. We express the estimate as an annual percentage.
Figure 2-3 demonstrates the rate that U.S.-based private banks charged as interest for
short-term and mid-term financing from 1960 to 2020. Before you proceed, be sure
that you have the Matplotlib library installed in your environment. To install the
Matplotlib library in a Python environment, use pip install matplotlib.
Equally, to install the library in a Conda environment, use conda install -c
conda-forge matplotlib. See Listing 2-1.

import wbdata
import matplotlib.pyplot as plt
%matplotlib inline
country = ["USA"]
indicator = {"FR.INR.LEND":"lending_rate"}
lending_rate = wbdata.get_dataframe(indicator,
country=country, convert_date=True)
lending_rate.plot(kind="line",color="green",lw=4)
plt.title("The U.S. lending interest rate (%)")
plt.ylabel("Lending interest rate (%)")
plt.xlabel("Date")
plt.legend(loc="best")
plt.show()
Listing 2-1 The U.S. Lending Interest Rate
Figure 2-3 The U.S. lending interest rate
Figure 2-3 demonstrates that from 1960 to 1980, the rate that U.S.-based banks
charged as interest for private short-term and mid-term financing grew from 1.45% to
13.55% (the highest peak). Following that, in the early 1980s, the rate sharply
declined, then it remained stagnant. It also shows that the lending interest rate
reached its lowest point in 2008. In summary, acquiring debt from U.S.-based banks
was more expensive in the late 1970s, and relatively cheap in 2008.

Final Consumption Expenditure (in Current U.S. Dollars)


The final consumption expenditure is the market value of general goods and services
that households in an economy purchase (see Listing 2-2). Figure 2-4 demonstrates
the U.S. final consumption expenditure from 1960 to 2020.

country = ["USA"]
indicator = {"NE.CON.TOTL.CD":"final_consumption"}
final_consumption = wbdata.get_dataframe(indicator,
country=country, convert_date=True)
final_consumption.plot(kind="line",color="orange",lw=4)
plt.title("The U.S. FCE")
plt.ylabel("FCE")
plt.xlabel("Date")
plt.legend(loc="best")
plt.show()
Listing 2-2 The U.S. Final Consumption Expenditure (Current U.S. Dollars)
Figure 2-4 The U.S. final consumption expenditure
Figure 2-4 shows that there has been an uninterruptible upswing in the market
value of general goods and services that U.S. households purchased since 1970.

The Normality Assumption


A normal distribution is also called a Gaussian distribution. This book uses these
terms interchangeably. In this distribution, data points concentrate around the
mean. Ordinary least-squares regression models assume that data points cluster
around the mean, so you must check for normality before training the model on the
U.S. lending interest rate and the final consumption expenditure data.
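A formal statistical test can complement the visual checks that follow. The sketch below applies the Shapiro-Wilk test from SciPy to simulated data; this test is an editorial addition for illustration and is not part of the chapter's original workflow:

```python
import numpy as np
from scipy import stats

# Simulated sample standing in for the lending-rate series (hypothetical)
rng = np.random.default_rng(2)
sample = rng.normal(loc=6.6, scale=2.4, size=61)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
stat, p_value = stats.shapiro(sample)
print(round(stat, 4), round(p_value, 4))
# A p-value above 0.05 means you fail to reject normality
```

On real macroeconomic series, combine such a test with the box plots and descriptive statistics shown next, since small samples give the test limited power.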

Normality Detection
Normality detection involves investigating the central tendency of the data points.
If the U.S. lending interest rate and the final consumption expenditure data
cluster around the mean, you can summarize the data using the central data points.
An additional implicit assumption is that the errors follow the normality
assumption, which makes them easier to deal with. To detect normality, you must
first estimate the mean (see Equation 2-3).

x̄ = (x1 + x2 + ... + xn) / n (Equation 2-3)

Where x̄ is the mean value, x1 is the first data point, x2 is the second data point,
and so forth, and n represents the total number of data points. You divide the sum
of the data points by the number of data points. Alternatively, you can find the
median data point (the central data point). To determine the dispersion, estimate
the standard deviation (see Equation 2-4).

s = √( Σ(xi − x̄)² / (n − 1) ) (Equation 2-4)

After estimating the standard deviation, you square it to find the variance (see
Equation 2-5).

s² = Σ(xi − x̄)² / (n − 1) (Equation 2-5)
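The estimates in Equations 2-3 through 2-5 can be reproduced in a few lines of NumPy. The sample below is hypothetical and simply stands in for the lending-rate series:

```python
import numpy as np

# Hypothetical sample (not the World Bank data)
x = np.array([4.2, 5.1, 6.3, 7.0, 8.4])

n = len(x)
mean = x.sum() / n                            # Equation 2-3: sum over count
variance = ((x - mean) ** 2).sum() / (n - 1)  # Equation 2-5: sample variance
std = variance ** 0.5                         # Equation 2-4 is its square root

# These match NumPy's estimators with ddof=1 (n - 1 in the denominator)
print(mean, std, variance)
```

Note the n − 1 denominator (degrees of freedom), which makes these the sample, rather than population, estimates.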

Listing 2-3 retrieves data relating to the U.S. lending interest and the final
consumption expenditure data from the World Bank. See Table 2-2.

country = ["USA"]
indicators = {"FR.INR.LEND":"lending_rate",
"NE.CON.TOTL.CD":"final_consumption"}
df = wbdata.get_dataframe(indicators, country=country,
freq="M", convert_date=False)
df.head()
Listing 2-3 Load the U.S. Lending Interest Rate and Final Consumption Expenditure Data

Table 2-2 The U.S. Lending Interest Rate and Final Consumption Expenditure Data

Date lending_rate final_consumption


2020 3.544167 NaN
2019 5.282500 1.753966e+13
2018 4.904167 1.688457e+13
2017 4.096667 1.608306e+13
2016 3.511667 1.543082e+13

Table 2-2 shows that data points are missing from the data. Listing 2-4 substitutes
the missing data points with the mean value.
df["lending_rate"] = df["lending_rate"].fillna(df["lending_rate"].mean())
df["final_consumption"] = df["final_consumption"].fillna(df["final_consumption"].mean())
Listing 2-4 Replacing Missing Data Points with the Mean Value
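Mean substitution is one option; for an ordered yearly series, linear interpolation is a common alternative worth knowing. The sketch below uses a small hypothetical series, not the World Bank frame:

```python
import numpy as np
import pandas as pd

# Hypothetical yearly series with gaps
s = pd.Series([3.5, np.nan, 4.9, np.nan, 3.5],
              index=pd.date_range("2016", periods=5, freq="YS"))

# Mean imputation, as in Listing 2-4
mean_filled = s.fillna(s.mean())

# Linear interpolation fills each gap from its neighbors instead
interpolated = s.interpolate(method="linear")
print(mean_filled.tolist())
print(interpolated.tolist())
```

Interpolation preserves the local trend of a time series, while mean imputation pulls every gap toward the overall average; which is appropriate depends on how the data will be modeled.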

Descriptive Statistics
There are several ways to visualize and summarize the distribution of data. The
simplest way involves the use of a box plot. Box plots can also help detect normality.
This plot confirms the location of the median data point. It also informs you about the
length of the distribution tail, thus adequately supporting you in diagnosing outliers in
the data. Figure 2-5 shows a box plot of the U.S. lending interest rate created by the
code in Listing 2-5.

df["lending_rate"].plot(kind="box",color="green")
plt.title("The U.S. lending interest rate (%)")
plt.ylabel("Values")
plt.show()
Listing 2-5 The U.S. Lending Interest Rate Distribution

Figure 2-5 The U.S. lending interest rate


Figure 2-5 shows that there are three extreme values in the U.S. lending interest
rate data. Listing 2-6 substitutes any outliers with the mean data point and determines
the new distribution (see Figure 2-6). Before you proceed, be sure that you have the
NumPy library installed in your environment. To install the NumPy library in a Python
environment, use pip install numpy. Equally, to install the library in a Conda
environment, use conda install -c anaconda numpy.

import numpy as np
df["lending_rate"] = np.where(df["lending_rate"] > 14.5,
df["lending_rate"].mean(), df["lending_rate"])
df["lending_rate"].plot(kind="box",color="green")
plt.title("The U.S. lending interest rate (%)")
plt.ylabel("Values")
plt.show()
Listing 2-6 The U.S. Lending Interest Rate Distribution

Figure 2-6 The U.S. lending interest rate distribution

Listing 2-7 returns Figure 2-7, which shows the distribution and outliers in the U.S.
final consumption expenditure data.
df["final_consumption"].plot(kind="box",color="orange")
plt.title("US FCE")
plt.ylabel("Values")
plt.show()
Listing 2-7 The U.S. Final Consumption Expenditure Distribution

Figure 2-7 The U.S. final consumption expenditure distribution


Figure 2-7 shows there are no outliers in the U.S. final consumption expenditure
data. The command in Listing 2-8 retrieves a comprehensive report relating to the
central tendency and the dispersion of data points (see Table 2-3).

df.describe()
Listing 2-8 Descriptive Summary

Table 2-3 Descriptive Summary

      lending_rate  final_consumption
Count 61.000000     6.100000e+01
Mean  6.639908      7.173358e+12
Std   2.436502      4.580756e+12
Min   3.250000      8.395100e+11
25%   4.500000      3.403470e+12
50%   6.824167      7.173358e+12
75%   8.270833      1.006535e+13
Max   12.665833     1.753966e+13
Table 2-3 shows that:
The mean value of the U.S. lending interest rate data is 6.639908 and the final
consumption expenditure mean is 7.173358e+12.
The data points of the U.S. lending interest rate deviate from the mean by 2.436502
and the final consumption expenditure data points deviate from the mean by
4.580756e+12.

Covariance Analysis
Covariance analysis involves estimating the extent to which variables vary with respect
to each other. Equation 2-6 shows the covariance formula.

cov(x, y) = Σ(xi − x̄)(yi − ȳ) / (n − 1) (Equation 2-6)

Where xi represents the independent data points of the lending interest rate (%) and
x̄ represents the mean value of the predictor variable. yi represents the independent
data points of the U.S. final consumption expenditure, and ȳ represents the mean
value of the U.S. final consumption expenditure. Listing 2-9 estimates the joint
variability between the U.S. lending interest rate and the final consumption
expenditure (see Table 2-4).

dfcov = df.cov()
dfcov
Listing 2-9 Covariance Matrix

Table 2-4 Covariance Matrix

lending_rate final_consumption
lending_rate 5.936544e+00 -7.092189e+12
final_consumption -7.092189e+12 2.098332e+25
Table 2-4 shows that the U.S. lending interest rate variance is 5.936544e+00 and
that the final consumption expenditure varies by 2.098332e+25. It shows you that the
joint variability between the lending interest rate and the U.S. final consumption
expenditure is -7.092189e+12. The next section provides an overview of correlation
methods and explains which correlation method is used for this problem.

Correlation Analysis
Unlike covariance analysis, which shows how variables vary with respect to each other,
correlation analysis estimates the dependency among variables. There are three
principal correlation methods—the Pearson correlation method, which can estimate
dependency among continuous variables, the Kendall method, which can estimate
dependency among categorical variables, and the Spearman method, which also can
estimate an association among categorical variables. Macroeconomic data is often
continuous, so this chapter uses the Pearson correlation method. Most of the chapters
in this book use this method, except for Chapter 5, which uses the Kendall method.
Equation 2-7 divides the covariance by the dispersion of the predictor variable and
the response variable to retrieve a Pearson correlation coefficient.

rxy = cov(x, y) / (sx sy) (Equation 2-7)

Where rxy is the Pearson correlation coefficient, sx is the standard deviation of
the U.S. lending interest rate, and sy is the standard deviation of the U.S. final
consumption expenditure. You estimate the coefficient by dividing the covariance
between the two variables by the product of their standard deviations. Listing 2-10
retrieves the Pearson correlation matrix (see Table 2-5).

dfcorr = df.corr(method="pearson")
dfcorr
Listing 2-10 Pearson Correlation Matrix
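As a sanity check of Equation 2-7, you can reproduce pandas' built-in Pearson estimate by hand. The two-column frame below is hypothetical, standing in for the World Bank data:

```python
import pandas as pd

# Hypothetical frame: rate rises while consumption falls
demo = pd.DataFrame({"lending_rate": [3.0, 4.5, 6.0, 7.5, 9.0],
                     "final_consumption": [9.1, 8.0, 7.2, 5.9, 5.0]})

# Equation 2-7: r = cov(x, y) / (std(x) * std(y))
cov_xy = demo["lending_rate"].cov(demo["final_consumption"])
r_manual = cov_xy / (demo["lending_rate"].std() *
                     demo["final_consumption"].std())

# Should agree with pandas' built-in Pearson estimate
r_pandas = demo.corr(method="pearson").iloc[0, 1]
print(round(r_manual, 6), round(r_pandas, 6))
```

Both pandas' cov() and std() use the n − 1 denominator by default, so the two numbers match exactly.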

Table 2-5 Pearson Correlation Matrix

lending_rate final_consumption
lending_rate 1.000000 -0.635443
final_consumption -0.635443 1.000000

Table 2-6 interprets the Pearson correlation coefficients outlined in Table 2-5.

Table 2-6 Interpretation of Pearson Correlation Coefficients

Relationship: The U.S. lending interest rate (%) and the final consumption
expenditure (in current U.S. dollars)
Pearson Correlation Coefficient: -0.635443
Findings: There is a strong negative correlation between the U.S. lending
interest rate (%) and the final consumption expenditure (in current U.S. dollars).
Figure 2-8 shows the relationship between the U.S. lending interest rate
and the final consumption expenditure. Before you proceed, be sure that you have the
seaborn library installed in your environment. To install the seaborn library in a
Python environment, use pip install seaborn. Equally, to install the library in a
Conda environment, use conda install -c anaconda seaborn. See Listing 2-
11.

import seaborn as sns


sns.jointplot(x = "lending_rate", y="final_consumption",
data=df, kind="reg",color="navy")
plt.show()
Listing 2-11 Pairwise Scatter Plot
Figure 2-8 The U.S. lending interest rate and the final U.S. consumption expenditure joint plot
Figure 2-8 confirms that there is a negative correlation between the U.S. lending
interest rate and the final consumption expenditure.

Ordinary Least-Squares Regression Model Development Using


Statsmodels
This section covers the most commonly used regression model, called the ordinary
least-squares model, which estimates the intercept and coefficients while minimizing
the residuals (refer back to Equation 2-1).
Listing 2-12 converts the data to the required format. It begins by constructing an x
array (the data points of the U.S. lending interest rate) and a y array (the data points of
the U.S. final consumption expenditure in current U.S. dollars).
It then shapes the data so that the ordinary least-squares regression model better
studies the data, then splits the data into training and test data, by applying the
train_test_split() method. Lastly, it standardizes the data in such a way that
the mean data point is 0 and the standard deviation is 1; it does this by applying the
StandardScaler() method.
To test the claim that the U.S. lending interest rate influences the U.S. final
consumption expenditure, you find the p-value to determine the significance of the
relationship. In addition, you determine how the ordinary least-squares model
expresses the amount of information it lost when estimating the future values of the
final consumption expenditure. To further assess the chosen model’s performance, you
must estimate the R2 score.
Table 2-7 outlines the type of estimator applied to test the significance of the
relationship between the U.S. lending interest rate and the final U.S. consumption
expenditure. It also shows how the model learns the available U.S. macroeconomic
data, and how it generates future instances of the final U.S. consumption expenditure,
including the errors it makes when estimating those instances. In addition, it details
the extent to which the model explains how changes in the U.S. lending interest rate
influence changes in the final U.S. consumption expenditure. In summary, it helps you
decide whether you must accept the existence of the established macroeconomic
phenomenon, including the degree to which you can rely on the model to estimate
future instances.

import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
x = np.array(df["lending_rate"])
y = np.array(df["final_consumption"])
x = x.reshape(-1,1)
y = y.reshape(-1,1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
x_constant = sm.add_constant(x_train)
x_test = sm.add_constant(x_test)
model = sm.OLS(y_train,x_constant).fit()
model.summary()
Listing 2-12 Ordinary Least-Squares Regression Model Development Applying Statsmodels

Table 2-7 Ordinary Least-Squares Regression Model Results


Dep. Variable y R-squared 0.443
Model OLS Adj. R-squared 0.431
Method Least Squares F-Statistic 36.64
Date Wed, 04 Aug 2021 Prob (F-statistic) 2.41e-07
Time 11:06:05 Log-Likelihood -1454.2
No. Observations 48 AIC 2912.
Df Residual 46 BIC 2916.
Df Model 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 7.427e+12 5.12e+11 14.496 0.000 6.4e+12 8.46e+12
x1 -3.102e+12 5.12e+11 -6.053 0.000 -4.13e+12 -2.07e+12
Omnibus: 1.135 Durbin-Watson: 1.754
Prob(Omnibus): 0.567 Jarque-Bera (JB): 1.111
Skew: 0.236 Prob(JB): 0.574
Kurtosis: 2.424 Cond. No. 1.00
Table 2-7 shows that the ordinary least-squares regression model explains 44.3%
of the variability in the data (the R2 score is 0.443). Using findings from this table,
Equation 2-8 shows how changes in the U.S. lending interest rate influence changes in
the final U.S. consumption expenditure.

predicted final consumption = 7.427e+12 − 3.102e+12 × lending rate (standardized) (Equation 2-8)

Equation 2-8 states that, for each unit change in the standardized U.S. lending
interest rate, the final consumption expenditure decreases by 3.102e+12. In addition,
Table 2-7 reveals that the mean value of the predicted U.S. final consumption
expenditure—when you hold the U.S. lending interest rate constant—is 7.427e+12.
Last, because the model explains less than half of the variability in the data, it
struggles to predict future instances of the U.S. final consumption expenditure.

Ordinary Least-Squares Regression Model Development Using


Scikit-Learn
The preceding section revealed a way to develop and evaluate the ordinary least-
squares model using the statsmodels library . Findings show that the expressed
macroeconomic phenomenon does indeed exist. You cannot fully rely on the model to
predict future instances of the U.S. final consumption expenditure, given the poor
performance obtained in the test.
This section studies the same phenomenon, but applies a different machine
learning library to develop and assess the model. It uses a popular open source library
called scikit-learn . Developing this model is similar to the way it was done in the
previous section. This approach has a competitive edge over statsmodels because it
provides an easier way to control how the model behaves and to validate its
performance. In addition, it has a wide range of regression models, e.g., Ridge, Lasso,
and ElasticNet, among others. This chapter does not cover those models. All you need
to know for now is that these models extend the ordinary least-squares model by
introducing a term that penalizes the model for errors it makes in the test. The
primary difference between statsmodels and sklearn is that statsmodels is
better at hypothesis testing and sklearn is better at predictions and does not return
t-statistics. See Listing 2-13.

x = np.array(df["lending_rate"])
y = np.array(df["final_consumption"])
x = x.reshape(-1,1)
y = y.reshape(-1,1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(x_train,y_train)
Listing 2-13 Ordinary Least-Squares Regression Model Development Applying Scikit-Learn

Cross-Validation
Listing 2-14 applies the cross_val_score() method to validate the performance
of the default ordinary least-squares regression model over different subsets of the
same data. It applies R2 to find a score, since sklearn does not calculate the adjusted
R2. It then estimates the mean and standard deviation of the validation score.

from sklearn.model_selection import cross_val_score


def get_val_score(model):
    val_scores = cross_val_score(model, x_train, y_train, scoring="r2")
    print("CV mean: ", np.mean(val_scores))
    print("CV std: ", np.std(val_scores))
get_val_score(lm)
CV mean: 0.2424291076433461
CV std: 0.11822093588298
Listing 2-14 Ordinary Least-Squares Regression Model Cross-Validation
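After cross-validation, it is also worth scoring the model on the held-out test set. The sketch below mirrors Listing 2-13 on synthetic data; the series here are made up for illustration and are not the World Bank indicators:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the macroeconomic series (hypothetical)
rng = np.random.default_rng(1)
x = rng.normal(size=(60, 1))
y = 7.4 - 3.1 * x[:, 0] + rng.normal(scale=1.0, size=60)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)
lm = LinearRegression().fit(x_train, y_train)

# Score the model on data it has never seen
y_pred = lm.predict(x_test)
print(round(r2_score(y_test, y_pred), 3),
      round(mean_squared_error(y_test, y_pred), 3))
```

A large gap between the cross-validation scores and the test-set score is a sign of an unstable or overfit model.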
saying.
"Now listen," I said to Mae. "This is news, not an editorial."
"A Monolithian spokesman said the new arrivals—two hundred of
them, all male—had landed in a second scout ship, at about
midnight, in Central Park, at the northern end of the reservoir.
"The spokesman said in a statement, quote, 'The second contingent
arrived in response to the invitation implicit in the law signed
yesterday giving the Monolithians U.S. citizenship.' Unquote.
"At nine o'clock this morning, when the stores opened, the
Monolithians arrived in a fleet of taxicabs in the midtown area, where
they went in separate groups to the different men's clothing stores—
Bond, Howard, Ripley, Rogers Peet and Brooks Brothers—and to
the men's departments of such department stores as Stern's,
Gimbels and Macy's. Here they outfitted themselves in Earth-style
clothing, which they charged to the Monolithian Embassy, and left by
foot, mingling with the crowds on the sidewalk.
"Dressed like typical New Yorkers, most of them virtually
disappeared—that is, they lost their identity as aliens and became
indistinguishable from the average male New Yorker.
"The Monolithian spokesman said in answer to a question that their
purpose was that of any visitor to New York—to see the sights of the
city and become acquainted with its customs."
"There," I said to Mae. "That doesn't sound quite as bad as Fire-
Eater Fitchburn's account, does it?"
My wife seemed relieved, but she wouldn't admit it. "They're
probably playing it down," she said.
The newscaster said, "Reporters were late on the scene, but if eye
witness accounts of passersby are to be believed, the aliens split up
into groups of two or three and visited such places as Woolworth's,
book stores, movie houses, the Empire State Building, the
Planetarium, and took rides on buses and subways."
Mae said, "I'm not sure I'd like it if one of them sat next to us at the
play."
"How would you tell?" I asked her.
"I'd know," she said. "Somehow. I'm sure I would."
"Well," I said, "you let me know and we'll interview him at
intermission."
We crossed the George Washington Bridge, went down the West
Side Highway and found a place to park on Sixth Avenue in the
upper thirties. We had half an hour before curtain time and I asked
Mae if she would like a drink.
"I think I would," she said. "I seem to have a slight case of the jitters."
We found a quiet place about a block from the theater and sat at the
bar in the air-conditioned dimness. I had a Scotch and soda and Mae
had a gin and tonic.
"Had any aliens for customers?" I asked the bartender as I paid for
the drinks.
"Not so's I noticed," he said. "At least nobody tried to charge it to the
Monolithian Embassy. We got a strictly cash trade here."
He went to serve another customer and a well-dressed young man
came in and sat down on the vacant stool next to Mae.
"Sam," she whispered, nudging me.
"What?"
"Here's one."
"Where?"
"Right next to me," she whispered. "Look at his clothes. They're
brand new."
The bartender went to the new arrival and said, "What'll it be?"
"What do you have?" Mae's neighbor asked.
"Anything you want," the bartender said. "Whiskey, bourbon, Scotch,
gin, vodka. Soda, ginger ale, Seven-up. The combinations are
limitless."
"I'll have a Scotch and Seven-Up," the stranger said.
The bartender didn't blink an eye. "Yes, sir," he said, and proceeded
to blend the two strange ingredients.
"Scotch and Seven-Up!" Mae said to me. "He must be one of them.
Who ever heard of such a thing?"
"That's pretty circumstantial evidence," I said.
"Change seats with me, Sam," she said. "I'm getting nervous again."
"Okay," I said. "Want another drink?"
"Definitely." She swallowed the rest of her first one as she slid onto
my stool.
"Two more of the same," I told the bartender.
"Coming up," he said. "Right after this Scotch and Seven-Up." He
gave me a shrug.
"Say something to him," Mae whispered, meaning my new neighbor
at the bar.
"Like what? Shall I ask him what he thinks of American women?"
"You're the newsman," she said. "You ought to know what to ask
him."
"This is my day off," I reminded her.
"Go on. Ask him."
"Okay."
I waited till his concoction had been served to him, then said:
"Pretty good drink, Scotch and Seven-Up."
He looked at me in what seemed to be embarrassment. "I don't
know, really," he said. "First time I ever had it."
"Stranger in town?"
"Yes, as a matter of fact. Got in only last night."
"Where from?"
"You wouldn't have heard of the place," he said.
("See! I told you!" Mae whispered.)
"I don't know," I said. "I've heard of lots of places: Medicine Hat,
Ephrata, Chestnut Bend, Gallipolis, Moses Lake, Lackawack...."
"None of those," he said, as if he were playing a quiz game. "It's a
little place in Missouri called Joplin."
"That's easy. I got my Signal Corps training near there during the
war."
"You don't say!"
("Ask him where he got the new suit," Mae persisted.)
"Where'd you get the new suit?" I asked him.
"Bond's," he said. "You know, under the waterfall in Times Square? It
looked so cool. They have an artificial waterfall on top of the building.
It used to be Pepsi-Cola's."
("Ask him what time," Mae said.)
"What time?"
"About nine o'clock," he said. "When it opened. Why?"
("Why?" I asked Mae.)
("Ask him if he saw the aliens in there then.")
"Did you see the aliens in there then?"
"I saw a bunch of men come in in bearskins or something like," he
said. "I thought it was an advertising stunt."
("He thought it was an advertising stunt," I told Mae.)
("Doesn't he listen to the radio?" she asked.)
"Don't you listen to the radio?" I asked him.
"The radio?"
"The aliens from Monolithia were getting outfitted in Bond's at nine
A.M., according to the radio," I told him without benefit of Mae.
"Is that who they were? Well, well."
He drank his Scotch and Seven-Up at one gulp, making a face over
it, and said, "I've got to get going. I have a ticket for a show at 2:30."
("What show?" Mae asked.)
"What show?" I asked him.
He mentioned the new Rodgers & Hammerstein musical. "I'm
meeting my wife there. Would you like to see a picture of her and the
kids?" He took out his wallet to show me. In addition to the snapshot
I saw his Missouri driver's license and an old draft card.
"Nice-looking family," I said.
"Thanks. Got to run now. My wife has the other ticket and I'm
meeting her at the seats. Can't get lost that way, I figure. Pleasure
talking to you. You, too, ma'am."
He left and I said to Mae: "Well?"
"Well what?"
"Are you satisfied he's not an alien?"
"I don't know. How come he's wearing his new suit the same day he
bought it? You always have to wait a week or ten days for
alterations."
"Maybe he didn't need any alterations and they cuffed the pants
while he waited. At least he won't be sitting next to you in the
theater."
"How did he get tickets to that? Are you sure you couldn't do any
better than the revival of Where's Charley?"
"Not on short notice. He probably paid scalper's prices on the
expense account. We'd better start."
We left the bar.
"I guess he won't be," Mae said, backing up the conversation in the
way she has. "But for my nerves' sake there'd better not be another
man in a new suit sitting next to me, even if he has got a good
explanation."
"The odds are against it," I said as we stood at the corner of 44th
Street and Broadway and waited at the Don't Walk sign. "Just divide
two hundred into several million."
The Walk sign flashed on. We were in a group of about fifteen law-
abiding pedestrians who started across the street. We had almost
reached the other side when somebody yelled, "Look out!"
A big long convertible with a grinning idiot behind the wheel was not
only failing to yield the right of way to pedestrians but was making an
illegal right turn onto Broadway from the cross street.
I grabbed Mae and hauled her ahead to the curb.
"Damn fool!" I hollered at the driver, who kept on going, blowing his
horn.
Everybody scrambled to safety except one young man who hadn't
seen or heard, or else had supreme faith in his rights as a
pedestrian. The convertible was heading straight at him.
"He'll get hit!" somebody yelled. A traffic cop blew his whistle. A
woman screamed. Mae, unable to look, buried her face in my
shoulder. The pedestrian never broke his casual stride.
The massive chromed bumper was only inches from him when it
began to disintegrate.
First the bumper, then the grille and the oversized fender, then the
right front tire dissolved in a shimmering film.
As the tire disappeared, the momentum of the car sent it ahead into
what was obviously the protective shield surrounding one of the
aliens.
More of the car vanished and it came to a grinding stop, its
underside providing the brake as it plowed into the asphalt.
The front of the car, almost clear back to the windshield, simply
wasn't there any more. The driver's idiot grin had changed to a look
of unbelieving dismay as he stared at the nothingness where his
hood used to be.
The young man, who I now saw was wearing a new suit, stepped
onto the curb near Mae and me. He paused, looked back for just a
moment at the remains of the convertible, and said, as if quoting, "A
driver must yield the right of way to a pedestrian crossing with a
Walk signal," then lost himself in the crowd.
6 (JULY 27, SUN.)
ALIEN, n. An American sovereign in his probationary state.
—Ambrose Bierce

It's pretty complicated to explain why a person who lives in New York
State, as I do, has to go through New Jersey to get home from his
office in New York City. It has to do with (1) the way New York's
border slopes northwest from the city and (2) a straight line being the
shortest distance between two points. People who half-grasp these
phenomena remain convinced that my village, High Tor, N.Y., is a
short drive from any old place in New Jersey.
John Hyatt, demon respecter of facts though he ordinarily is, was
one of those so deluded when he called me on the telephone on
Sunday morning and asked me if I'd mind taking a run over to Middle
Valley, N.J.
"I'm aware it's your day off, Sam," John said, "but this is practically
on your doorstep and I know you'd feel hurt if we didn't ask you to
cover it personally."
This, of course, was the well-known malarky, but I told him, "I'm the
original busman, John, but maybe you'd better fill me in. Just what is
going on in Middle Valley, of all places?"
"It's these damn aliens, Sam. Incidentally, I want to thank you for
phoning in that eyewitnesser yesterday on the jaywalker. I hear you
missed the first act of the play on account of it, but it was a damn
fine story and we appreciate it."
It had been a jaydriver, not a jaywalker, but I didn't correct him.
"Think nothing of it, John. It'll all show up on my overtime slip."
He laughed. Not without pain, it seemed to me. World Wide is in a
perennial economy drive and the word overtime is not one you use
lightly in the business office. "We never boggle where a good story is
concerned," John said. "You know that. And this Middle Valley thing
—well, you're aware, I'm sure, that they've got this local blue law
banning Sunday employment...."
Middle Valley, N.J., is a good hour's drive from High Tor, N.Y. It's less
than twenty minutes via the Lincoln Tunnel from New York City, but I
knew John would think I was being uncooperative if I mentioned it. I
didn't argue with him. I told Mae I was on overtime, got in the
Volkswagen and went.

New Jersey passed a law some years ago aimed at forcing Sunday
closing on a group of merchants who sold used cars and major
appliances in a string of roadside stores along well-traveled Route
17, which runs between New York City and the Catskill Mountain
resorts. The idea was to protect the community merchant from this
competition so he could have a day off. But the legislation was too
broad and bogged down in courts. Its opponents charged, among
other things, that it was discriminatory. What about the Jewish
merchant, they asked, who religiously closed his place of business
on Saturday, his Sabbath? Was he to be penalized by having to
close on two days a week, while the Christian merchant closed only
on one?
While the state law was being appealed, its opponents obtained an
injunction and Sunday business continued. Some communities who
had liked the state law during the brief time it was being enforced
then passed local ordinances. Middle Valley was one such
community with its own Sunday closing law.
Middle Valley is a residential, fairly well-to-do, predominantly
Christian village of about 3,000 people. It has few stores, most of its
residents doing their shopping in nearby towns. It does, however,
have a drug store, a delicatessen, a gas station, a newsstand and a
local milkman. The village fathers decreed that the strict law meant
all these must close on Sunday.
No one objected except the druggist, the delicatessen owner (who
had closed on Saturday for years), the newsdealer and the milkman.
The citizens of Middle Valley found it not too inconvenient to order
extra milk on Saturday to tide them over the week end, and they
rather enjoyed driving a couple of miles to pick up the Sunday
papers. It was a mark of distinction to live in the village that permitted
no paid Sunday employment.
"Middle Valley's shut up tighter'n a drum today," the well-to-do, car-
owning, Christian citizen could remark with pride as he paid for his
paper across the village line.
The few who didn't own cars had to walk as far as two miles to catch
the buses whose drivers were not allowed to stop in Middle Valley.
No one asked them if they enjoyed their walks, especially on rainy
Sundays.
Some of this I knew and some John Hyatt filled me in on. I learned a
lot more after I got there, first having checked my gas to be sure I
wouldn't be marooned there till Monday.
I parked near the center of town, in front of the delicatessen. Down
the block were the newsdealer's, the drug store and a couple of real
estate-and-insurance offices. All were closed.
I introduced myself to the man standing in front of the delicatessen.
He told me his name:
"Simon Dorfman. This is my store. I closed it Friday at sundown.
Religious reasons. I can't open today. Monkey business reasons. I'm
thinking of opening today regardless. I'm considering it this minute.
But I'm also considering ninety days in jail and $200 fine."
"Who would arrest you if you opened?" I asked him.
"Who? The cops. Who else?"
"Middle Valley police?"
"Joe Lyman and Fred Moffat. I've known them since they were boys.
But they'd arrest me. They said so. It's not their fault."
"Then who would arrest them?" I asked Dorfman.
"What do you mean arrest them?"
"Aren't they paid employees? If you can't work on Sunday, how can
they?"
He thought that over. "What's sauce for the goose, eh?"
"Why not?"
"But you're a reporter. You don't care if I get arrested as long as you
get a story. Maybe I'll talk it over with my friend Hirsch the druggist."
"Let me know what you decide, Mr. Dorfman," I said. "I'll be around."
"Good. But listen. You want a real story? Go down two blocks that
way and one to your left."
"What's there?"
"The Middle Valley Congregational Church. I don't want you to think
I'm laughing at somebody else's religion, because I don't do that, but
go down and see for yourself. Those men are there—from the
spaceship. An interesting situation."
So that's where they were. I left him saying to himself, "Sauce for the
gander. Why not?"
John Hyatt had said some Monolithians were in Middle Valley but he
didn't know why. He imagined they were sight-seeing and he
obviously hoped for something better. I was beginning to have the
same hunch he must have had.
There was a crowd of about a hundred outside the Congregational
Church. Most of the people appeared to be parishioners—well-
dressed, upper-middle-class men and women. Their late-model cars
were parked along the tree-shaded street. I squeezed my
Volkswagen in among them.
A separate group of well-dressed people—all young men—stood
outside the main entrance of the ivy-covered stone church. The
minister was with them, talking heatedly. I made my way through
gaps in the crowd of parishioners, who seemed anxious for some
settlement to be reached but unwilling to become involved.
"... blasphemy," the minister was saying. His name, according to the
outside bulletin board, was the Rev. James Lonsway Marchell.
"Not at all, Mr. Marchell," one of the young men said. "It's merely a
question of law."
"God's law has called my flock to worship. Man's law shall not keep
them from their devotions."
"Certainly not," the young man said. He was speaking fluent,
unaccented English. "We have no quarrel with their wish to honor
their deity in whatever way they choose. But you, Mr. Marchell, as a
paid employee of this church, may not, under law, work on Sunday."
"Work!" the minister exclaimed. "It is the Lord's work I do!"
"But for a salary paid by men. You have admitted that to be a fact."
"By what right—" the minister said—"by what abrogation of authority
do you come from millions of miles away to interfere in the affairs of
this quiet, respectable, law-abiding village?"
"The very fact, sir, that you have chosen not to abide by the law has
brought us here," the leader of the Monolithian group said. "We have
solemnly sworn to uphold the laws of this country, and therefore the
laws of each of its parts. We should be shirking our obligations to our
adopted nation if we did less."
"You pervert the law—you mock it. You are heretics. Worse, you are
the devil's henchmen. I have tried long enough to reason with you.
Now stand aside. Again I tell you—I mean to enter my church!"
The minister started for the door but one of the Monolithians was
there ahead of him. I was half afraid I was going to see Marchell start
to disappear, but obviously the aliens had a variation on their
protective weapon. The minister wanted to enter his church, not to
harm anyone, and the shield took the form of a pliant, invisible wall
that prevented Marchell from even hurting himself as he walked into
it, apparently for the second time at least.
