Communications and Control Engineering
Rushikesh Kamalapurkar
Patrick Walters · Joel Rosenfeld
Warren Dixon
Reinforcement Learning for Optimal Feedback Control
A Lyapunov-Based Approach
Communications and Control Engineering
Series editors
Alberto Isidori, Roma, Italy
Jan H. van Schuppen, Amsterdam, The Netherlands
Eduardo D. Sontag, Boston, USA
Miroslav Krstic, La Jolla, USA
Communications and Control Engineering is a high-level academic monograph
series publishing research in control and systems theory, control engineering and
communications. It has worldwide distribution to engineers, researchers, educators
(several of the titles in this series find use as advanced textbooks although that is not
their primary purpose), and libraries.
The series reflects the major technological and mathematical advances that have
a great impact on the fields of communication and control. The range of areas to
which control and systems theory is applied is broadening rapidly with particular
growth being noticeable in the fields of finance and biologically-inspired control.
Books in this series generally pull together many related research threads in more
mature areas of the subject than the highly-specialised volumes of Lecture Notes in
Control and Information Sciences. This series’s mathematical and control-theoretic
emphasis is complemented by Advances in Industrial Control which provides a
much more applied, engineering-oriented outlook.
Publishing Ethics: Researchers should conduct their research from research
proposal to publication in line with best practices and codes of conduct of relevant
professional bodies and/or national and international regulatory bodies. For more
details on individual ethics matters please see:
https://www.springer.com/gp/authors-editors/journal-author/journal-author-help-desk/publishing-ethics/14214.
Reinforcement Learning
for Optimal Feedback
Control
A Lyapunov-Based Approach
Rushikesh Kamalapurkar
Mechanical and Aerospace Engineering
Oklahoma State University
Stillwater, OK, USA

Joel Rosenfeld
Electrical Engineering
Vanderbilt University
Nashville, TN, USA
MATLAB® and Simulink® are registered trademarks of The MathWorks, Inc., 1 Apple Hill Drive,
Natick, MA 01760-2098, USA, http://www.mathworks.com.
Mathematics Subject Classification (2010): 49-XX, 34-XX, 46-XX, 65-XX, 68-XX, 90-XX, 91-XX,
93-XX
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my nurturing grandmother, Mangala
Vasant Kamalapurkar.
—Rushikesh Kamalapurkar
Making the best possible decision according to some desired set of criteria is always
difficult. Such decisions are even more difficult when there are time constraints and
can be impossible when there is uncertainty in the system model. Yet, the ability to
make such decisions can enable higher levels of autonomy in robotic systems and,
as a result, have dramatic impacts on society. Given this motivation, various
mathematical theories have been developed related to concepts such as optimality,
feedback control, and adaptation/learning. This book describes how such theories
can be used to develop optimal (i.e., the best possible) controllers/policies (i.e., the
decision) for a particular class of problems. Specifically, this book is focused on the
development of concurrent, real-time learning and execution of approximate opti-
mal policies for infinite-horizon optimal control problems for continuous-time
deterministic uncertain nonlinear systems.
The developed approximate optimal controllers are based on reinforcement
learning-based solutions, where learning occurs through an actor–critic-based
reward system. Detailed attention to control-theoretic concerns such as convergence
and stability differentiates this book from the large body of existing literature on
reinforcement learning. Moreover, both model-free and model-based methods are
developed. The model-based methods are motivated by the idea that a system can
be controlled better as more knowledge is available about the system. To account
for the uncertainty in the model, typical actor–critic reinforcement learning is
augmented with unique model identification methods. The optimal policies in this
book are derived from dynamic programming methods; hence, they suffer from the
curse of dimensionality. To address the computational demands of such an
approach, a unique function approximation strategy is provided to significantly
reduce the number of required kernels along with parallel learning through novel
state extrapolation strategies.
The material is intended for readers that have a basic understanding of nonlinear
analysis tools such as Lyapunov-based methods. The development and results may
help to support educators, practitioners, and researchers with nonlinear
systems/control, optimal control, and intelligent/adaptive control interests working
in aerospace engineering, computer science, electrical engineering, industrial
1 Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 The Bolza Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.1 Necessary Conditions for Optimality . . . . . . . . . . . . . . . . 3
1.4.2 Sufficient Conditions for Optimality . . . . . . . . . . . . . . . . 5
1.5 The Unconstrained Affine-Quadratic Regulator . . . . . . . . . . . . . . . 5
1.6 Input Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Connections with Pontryagin’s Maximum Principle . . . . . . . . . . . 9
1.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.8.1 Numerical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.8.2 Differential Games and Equilibrium Solutions . . . . . . . . . 11
1.8.3 Viscosity Solutions and State Constraints . . . . . . . . . . . . 12
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 Approximate Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Exact Dynamic Programming in Continuous Time and Space . . . . . 17
2.2.1 Exact Policy Iteration: Differential and Integral
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 Value Iteration and Associated Challenges . . . . . . . . . . . . . 22
2.3 Approximate Dynamic Programming in Continuous Time
and Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1 Some Remarks on Function Approximation . . . . . . . . . . . 23
2.3.2 Approximate Policy Iteration . . . . . . . . . . . . . . . . . . . . . 24
2.3.3 Development of Actor-Critic Methods . . . . . . . . . . . . . . . 25
2.3.4 Actor-Critic Methods in Continuous Time and Space . . . . 26
2.4 Optimal Control and Lyapunov Stability . . . . . . . . . . . . . . . . . . . 26
6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.2 Station-Keeping of a Marine Craft . . . . . . . . . . . . . . . . . . . . . . . . 196
6.2.1 Vehicle Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.2.2 System Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.2.3 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.2.4 Approximate Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
6.2.5 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
6.2.6 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.3 Online Optimal Control for Path-Following . . . . . . . . . . . . . . . . . 213
6.3.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
6.3.2 Optimal Control and Approximate Solution . . . . . . . . . . . 215
6.3.3 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
6.3.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
6.3.5 Experiment Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
6.4 Background and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . 223
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
7 Computational Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.2 Reproducing Kernel Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . . 230
7.3 StaF: A Local Approximation Method . . . . . . . . . . . . . . . . . . . . . 232
7.3.1 The StaF Problem Statement . . . . . . . . . . . . . . . . . . . . . . 232
7.3.2 Feasibility of the StaF Approximation
and the Ideal Weight Functions . . . . . . . . . . . . . . . . . . . . 233
7.3.3 Explicit Bound for the Exponential Kernel . . . . . . . . . . . 235
7.3.4 The Gradient Chase Theorem . . . . . . . . . . . . . . . . . . . . . 237
7.3.5 Simulation for the Gradient Chase Theorem . . . . . . . . . . 240
7.4 Local Approximation for Efficient Model-Based
Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
7.4.1 StaF Kernel Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 242
7.4.2 StaF Kernel Functions for Online Approximate
Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
7.4.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
7.4.4 Extension to Systems with Uncertain Drift
Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
7.4.5 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.5 Background and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . 260
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Appendix A: Supplementary Lemmas and Definitions . . . . . . . . . . . . . . . 265
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Symbols
Lists of abbreviations and symbols used in definitions, lemmas, theorems, and the
development in the subsequent chapters.
1.1 Introduction
The ability to learn behaviors from interactions with the environment is a desirable
characteristic of a cognitive agent. Typical interactions between an agent and its
environment can be described in terms of actions, states, and rewards (or penalties).
Actions executed by the agent affect the state of the system (i.e., the agent and the
environment), and the agent is presented with a reward (or a penalty). Assuming that
the agent chooses an action based on the state of the system, the behavior (or the
policy) of the agent can be described as a map from the state-space to the action-space.
Desired behaviors can be learned by adjusting the agent-environment interaction
through the rewards/penalties. Typically, the rewards/penalties are quantified by a cost.
For example, in many applications, the correctness of a policy is often quantified in
terms of the Lagrange cost and the Mayer cost. The Lagrange cost is the cumulative
penalty accumulated along a path traversed by the agent and the Mayer cost is the
penalty at the boundary. Policies with lower total cost are considered better and
policies that minimize the total cost are considered optimal. The problem of finding
the optimal policy that minimizes the total Lagrange and Mayer cost is known as the
Bolza optimal control problem.
1.2 Notation
Throughout the book, unless otherwise specified, the domain of all the functions is
assumed to be R≥0 . Function names corresponding to state and control trajectories are
reused to denote elements in the range of the function. For example, the notation u (·)
is used to denote the function u : R≥t0 → Rm , the notation u is used to denote an arbi-
trary element of Rm , and the notation u (t) is used to denote the value of the function
u (·) evaluated at time t. Unless otherwise specified, all the mathematical quanti-
ties are assumed to be time-varying, an equation of the form g (x) = f + h (y, t)
is interpreted as g (x (t)) = f (t) + h (y (t) , t) for all t ∈ R≥0 , and a definition of
the form g (x, y) ≜ f (y) + h (x) for functions g : A × B → C, f : B → C, and
h : A → C is interpreted as g (x (t) , y (t)) ≜ f (y (t)) + h (x (t)) for all t ∈ R≥0.

1.3 The Bolza Problem

Consider a dynamical system of the form

\[
\dot{x}(t) = f\left(x(t), u(t), t\right), \tag{1.1}
\]

where t0 is the initial time, x : R≥t0 → Rn denotes the system state and u : R≥t0 →
U ⊂ Rm denotes the control input, and U denotes the action-space.
To ensure local existence and uniqueness of Carathéodory solutions to (1.1), it is
assumed that the function f : Rn × U × R≥t0 → Rn is continuous with respect to
t and u, and continuously differentiable with respect to x. Furthermore, the control
signal, u (·), is restricted to be piecewise continuous. The assumptions stated here are
sufficient but not necessary to ensure local existence and uniqueness of Carathéodory
solutions to (1.1). For further discussion on existence and uniqueness of Carathéodory
solutions, see [1, 2]. Further restrictions on the dynamical system are stated, when
necessary, in subsequent chapters.
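The role of piecewise continuity can be illustrated numerically: a Carathéodory solution under a control signal with a single jump is obtained by integrating on each interval of continuity and chaining the state across the switch. The system and switching time below are hypothetical, chosen only to make the construction concrete:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical piecewise-continuous control: u jumps from +1 to -1 at t = 1.
def u(t):
    return 1.0 if t < 1.0 else -1.0

# xdot = f(x, u, t) = u(t); integrate separately on each interval of
# continuity and chain the state across the switching instant.
rhs = lambda t, x: np.array([u(t)])
s1 = solve_ivp(rhs, (0.0, 1.0), [0.0], rtol=1e-9, atol=1e-12)
s2 = solve_ivp(rhs, (1.0, 2.0), s1.y[:, -1], rtol=1e-9, atol=1e-12)
x_final = s2.y[0, -1]  # analytically x(2) = 0 + 1 - 1 = 0
```

The resulting trajectory is continuous and satisfies the dynamics almost everywhere, which is exactly the Carathéodory solution concept invoked above.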
Consider a fixed final time optimal control problem where the optimality of a
control policy is quantified in terms of a cost functional
\[
J(t_0, x_0, u(\cdot)) = \int_{t_0}^{t_f} L\left(x(t; t_0, x_0, u(\cdot)), u(t), t\right) \mathrm{d}t + \Phi\left(x_f\right), \tag{1.2}
\]

where L denotes the Lagrange cost, Φ denotes the Mayer cost, and t_f and x_f ≜ x(t_f) denote the final time and state, respectively. In (1.2),
the notation x (t; t0 , x0 , u (·)) is used to denote a trajectory of the system in (1.1),
evaluated at time t, under the controller u (·), starting at the initial time t0 , and with
the initial state x0 . Similarly, for a given policy φ : Rn → Rm , the short notation
x (t; t0 , x0 , φ (x (·))) is used to denote a trajectory under the feedback controller
u (t) = φ (x (t; t0 , x0 , u (·))). Throughout the book, the symbol x is also used to
denote generic initial conditions in Rn . Furthermore, when the controller, the initial
time, and the initial state are understood from the context, the shorthand x (·) is used
when referring to the entire trajectory, and the shorthand x (t) is used when referring
to the state of the system at time t.
The two most popular approaches to solve Bolza problems are Pontryagin’s max-
imum principle and dynamic programming. The two approaches are independent,
both conceptually and in terms of their historical development. Both approaches
are built on the foundation of the calculus of variations, which has its origins in
Newton’s Minimal Resistance Problem dating back to 1685 and Johann Bernoulli’s
Brachistochrone problem dating back to 1696. The maximum principle was devel-
oped by the Pontryagin school at the Steklov Institute in the 1950s [3]. The devel-
opment of dynamic programming methods was simultaneously but independently
initiated by Bellman at the RAND Corporation [4]. While Pontryagin’s maximum
principle results in optimal control methods that generate optimal state and control
trajectories starting from a specific state, dynamic programming results in methods
that generate optimal policies (i.e., they determine the optimal decision to be made
at any state of the system).
Barring some comparative remarks, the rest of this monograph will focus on the
dynamic programming approach to solve Bolza problems. The interested reader is
directed to the books by Kirk [5], Bryson and Ho [6], Liberzon [7], and Vinter [8]
for an in-depth discussion of Pontryagin’s maximum principle.
Instead of a single Bolza problem, a family of Bolza problems with cost functionals of the form

\[
J(t, x, u(\cdot)) = \int_t^{t_f} L\left(x(\tau; t, x, u(\cdot)), u(\tau), \tau\right) \mathrm{d}\tau + \Phi\left(x_f\right) \tag{1.3}
\]

is solved, where t ∈ [t0, tf], tf ∈ R≥0, and x ∈ Rn. A solution to the family of Bolza
problems in (1.3) can be characterized using the optimal cost-to-go function (i.e.,
the optimal value function) V∗ : Rn × R≥0 → R, defined as

\[
V^*(x, t) \triangleq \inf_{u_{[t, t_f]}} J(t, x, u(\cdot)),
\]
where the notation u [t,τ ] for τ ≥ t ≥ t0 denotes the controller u (·) restricted to the
time interval [t, τ ].
Define the function V as

\[
V(x, t) \triangleq \inf_{u_{[t, t+\Delta t]}} \left\{ \int_t^{t+\Delta t} L\left(x(\tau), u(\tau), \tau\right) \mathrm{d}\tau + V^*\left(x(t+\Delta t), t+\Delta t\right) \right\}.
\]

Substituting the definition of the optimal value function,

\[
V(x, t) = \inf_{u_{[t, t+\Delta t]}} \left\{ \int_t^{t+\Delta t} L\left(x(\tau), u(\tau), \tau\right) \mathrm{d}\tau + \inf_{u_{[t+\Delta t, t_f]}} J\left(t+\Delta t, x(t+\Delta t), u(\cdot)\right) \right\}.
\]
Thus, V (x, t) ≤ V ∗ (x, t), which, along with (1.6), implies V (x, t) = V ∗ (x, t).
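The argument above is the continuous-time form of the principle of optimality. In a discrete-time, finite-state analogue (the integer dynamics, costs, and horizon below are illustrative, not from the text), the same principle lets the value function be computed by backward recursion, and the result can be checked against brute-force minimization over entire control sequences:

```python
import itertools

# Illustrative discrete problem: integer states 0..4, actions -1, 0, 1.
states = range(5)
actions = (-1, 0, 1)
T = 3                                        # horizon (number of steps)
step = lambda x, u: min(max(x + u, 0), 4)    # saturated integer dynamics
L = lambda x, u: x * x + u * u               # running (Lagrange) cost
Phi = lambda x: x * x                        # terminal (Mayer) cost

# Backward recursion: V_k(x) = min_u { L(x, u) + V_{k+1}(step(x, u)) }.
V = {x: Phi(x) for x in states}
for _ in range(T):
    V = {x: min(L(x, u) + V[step(x, u)] for u in actions) for x in states}

# Brute force: minimize the total cost over every T-step control sequence.
x0 = 3
def seq_cost(us):
    x, c = x0, 0
    for u in us:
        c += L(x, u)
        x = step(x, u)
    return c + Phi(x)

brute = min(seq_cost(us) for us in itertools.product(actions, repeat=T))
# The principle of optimality guarantees V[x0] == brute.
```

Backward recursion visits each state once per stage, while the brute-force search grows exponentially in T, which is precisely the computational advantage dynamic programming provides.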
Under the assumption that V∗ ∈ C¹(Rn × [t0, tf], R), the optimal value function
can be shown to satisfy

\[
0 = -\nabla_t V^*(x, t) - \inf_{u \in U} \left\{ L(x, u, t) + \nabla_x V^{*\top}(x, t)\, f(x, u, t) \right\},
\]
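For the special case of linear dynamics and quadratic costs (a standard result; the matrices below are illustrative assumptions), the quadratic ansatz V∗(x) = xᵀPx reduces the time-invariant, infinite-horizon form of this equation to the algebraic Riccati equation AᵀP + PA − PBR⁻¹BᵀP + Q = 0, which can be verified numerically:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative LQ problem: xdot = A x + B u, cost integrand x'Qx + u'Ru.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# V*(x) = x' P x satisfies the infinite-horizon HJB iff P solves the ARE.
P = solve_continuous_are(A, B, Q, R)

# Riccati residual A'P + PA - P B R^{-1} B' P + Q (vanishes up to round-off),
# and the associated optimal feedback u* = -K x with K = R^{-1} B' P.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
K = np.linalg.solve(R, B.T @ P)
```

For general nonlinear systems no such closed-form reduction exists, which is what motivates the approximate dynamic programming methods developed in the later chapters.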