Option Pricing Using Artificial Neural Networks
DOCTORAL THESIS
Hahn, Tobias
Award date:
2014
November 2013
The thesis addresses the question of how option pricing can be improved using
machine learning techniques. The focus is on the modelling of volatility, the cen-
tral determinant of an option price, using artificial neural networks. Volatility is
modelled explicitly as a forecast and its accuracy evaluated. In addition, its use in
option pricing is tested and compared with a direct option pricing approach.
A review of existing literature demonstrated a lack of clarity with respect to
the model development methodology used in the area. This issue is discussed and
finally addressed along with a consolidation of the various modelling approaches
undertaken previously by researchers in the field. To this end, a consistent process
is developed to guide the specific model development.
Previous research has focused on index options, i.e. a single time series and
some options related to it. The aim of the research presented here was to extend
this to equity options, taking into consideration the particular characteristics of
the underlying and the options.
The research focuses on the Australian equity option market before and after
the global financial crisis. The results suggest that in the market and over the time
frame studied, an explicit volatility model combined with existing deterministic
models is preferable.
Beyond the specific results of the study, a detailed discussion of the limitations
and methodological issues is presented. These relate not only to the methodology
used here but also to the various choices and trade-offs faced whenever machine
learning techniques are used for volatility or option price modelling. Academic
insight as well as practical application depends critically on an understanding of
these choices.
Declaration
Signature Date
Additional Research Outcomes
The following publications and presentations were prepared up to and during the
candidature, albeit not directly related to the research presented in this thesis.
Publications
I would also like to thank my family for their ongoing support and patience.
The research presented in this thesis used data supplied by the Securities Industry
Research Centre of Asia-Pacific (SIRCA) including data from Thomson-Reuters
and the Australian Securities Exchange (ASX).
Contents
1 Introduction 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . 1
1.2 Research Goals and Hypotheses . . . . . . . . . . . . . . . . . . . 3
1.3 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Literature Review 9
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 The Australian Equity and Option Markets . . . . . . . . 9
2.1.2 Principles of Financial Modelling . . . . . . . . . . . . . . 11
2.2 Options Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Forwards and Futures . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Options Characteristics . . . . . . . . . . . . . . . . . . . 16
2.2.3 Pricing of European-style Options . . . . . . . . . . . . . 17
2.2.4 Pricing of American-style Options . . . . . . . . . . . . . . 19
2.2.5 Option Greeks and Additional Considerations . . . . . . . 22
2.3 Volatility Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.1 Overview of Modelling Approaches . . . . . . . . . . . . . 24
2.3.2 Historical Volatility . . . . . . . . . . . . . . . . . . . . . . 25
2.3.3 Stochastic Volatility and the ARCH-Family of Models . . 27
2.3.4 Volatility Adjustments . . . . . . . . . . . . . . . . . . . . 28
2.3.5 Volatility Term Structure and Surface Models . . . . . . . 30
2.4 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 Financial Applications of Machine Learning . . . . . . . . . . . . 37
2.5.1 Option Pricing . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5.2 Volatility Modelling . . . . . . . . . . . . . . . . . . . . . 61
2.6 Open Research Problems . . . . . . . . . . . . . . . . . . . . . . . 69
3 Methodology 73
3.1 Model Development and Experimental Design . . . . . . . . . . . 73
3.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.1.2 Volatility Forecast (Hypothesis 1) . . . . . . . . . . . . . . 75
3.1.3 Option Pricing (Hypothesis 2) . . . . . . . . . . . . . . . . 79
3.2 Data Scope and Sources . . . . . . . . . . . . . . . . . . . . . . . 86
3.3 Model Fitting and Testing . . . . . . . . . . . . . . . . . . . . . . 88
3.3.1 Simulation Implementation . . . . . . . . . . . . . . . . . 88
3.3.2 ANN Training . . . . . . . . . . . . . . . . . . . . . . . . 91
3.3.3 ANN Model Selection . . . . . . . . . . . . . . . . . . . . 92
3.4 Model Evaluation and Comparisons . . . . . . . . . . . . . . . . . 94
3.5 Theoretical and Practical Limitations . . . . . . . . . . . . . . . . 97
4 Analysis of Data 99
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.2 General Characteristics of the Data Set . . . . . . . . . . . . . . . 99
4.3 Volatility Forecast Evaluation . . . . . . . . . . . . . . . . . . . . 101
4.4 Volatility Surface Fitting . . . . . . . . . . . . . . . . . . . . . . . 118
4.5 Option Pricing Evaluation . . . . . . . . . . . . . . . . . . . . . . 134
4.6 Summary and Implications . . . . . . . . . . . . . . . . . . . . . . 149
5 Conclusion 151
5.1 Summary of Results and Implications for the Hypotheses . . . . . 151
5.2 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . 152
5.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Bibliography 159
4.21 𝜎𝑀,𝑇 ANNs Training Record . . . 124
4.22 Error Distribution of Trained Network Architectures for 𝜎𝑀,𝑇 ANNs (In-sample) . . . 124
4.23 Target and Output Values for the 𝜎𝑀,𝑇 ANNs In-sample Data (Without Outliers) . . . 125
4.24 Target and Output Values for the 𝜎𝑀,𝑇 ANNs Out-of-sample Data (Without Outliers) . . . 126
4.25 Comparison of Volatility Surface Errors (In-sample) . . . 128
4.26 Comparison of Volatility Surface Errors (Out-of-sample) . . . 128
4.27 Comparison of 𝜎𝑀,𝑇 HVL Errors Applied to In-sample and Out-of-sample Data . . . 129
4.28 Comparison of 𝜎𝑀,𝑇 HVS Errors Applied to In-sample and Out-of-sample Data . . . 130
4.29 Comparison of 𝜎𝑀,𝑇 GARCH Errors Applied to In-sample and Out-of-sample Data . . . 131
4.30 Comparison of 𝜎𝑀,𝑇 ANNd Errors Applied to In-sample and Out-of-sample Data . . . 132
4.31 Comparison of 𝜎𝑀,𝑇 ANNs Errors Applied to In-sample and Out-of-sample Data . . . 133
4.32 Characteristics of Variable Value Ranges (In-sample) for 𝐶 ANN . 135
4.33 Characteristics of Variable Value Ranges (Out-of-sample) for 𝐶 ANN 135
4.34 𝐶 ANN Training Record . . . . . . . . . . . . . . . . . . . . . . . . 136
4.35 Error Distribution of Trained Network Architectures for 𝐶 ANN
(In-sample) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.36 Target and Output Values for the 𝐶 ANN In-sample Data . . . . . 138
4.37 Target and Output Values for the 𝐶 ANN In-sample Data (Without
Outliers) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.38 Target and Output Values for the 𝐶 ANN Out-of-sample Data . . . 139
4.39 Target and Output Values for the 𝐶 ANN Out-of-sample Data
(Without Outliers) . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.40 Comparison of Option Pricing Errors (In-sample) . . . . . . . . . 141
4.41 Comparison of Option Pricing Errors (Out-of-sample) . . . . . . . 142
4.42 Comparison of 𝐶 HVL Errors Applied to In-sample and Out-of-
sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.43 Comparison of 𝐶 HVS Errors Applied to In-sample and Out-of-
sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.44 Comparison of 𝐶 GARCH Errors Applied to In-sample and Out-of-
sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4.45 Comparison of 𝐶 ANNd Errors Applied to In-sample and Out-of-
sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.46 Comparison of 𝐶 ANNs Errors Applied to In-sample and Out-of-
sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
4.47 Comparison of 𝐶 ANN Errors Applied to In-sample and Out-of-
sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
List of Acronyms
AHE average hedging error
AIC Akaike information criterion
ANN artificial neural network
ANOVA analysis of variance
ASX Australian Securities Exchange (previously Australian Stock
Exchange)
ARCH autoregressive conditional heteroscedasticity
ATM at-the-money option
BIC (Schwarz) Bayesian information criterion
BS Black-Scholes (option pricing model or formula)
BSM Black-Scholes-Merton (option pricing model)
CART classification and regression trees
CEV constant elasticity of variance model
CME Chicago Mercantile Exchange
CRR Cox-Ross-Rubinstein (option pricing model)
CS Corrado-Su (option pricing model)
DAX Deutscher Aktienindex
DM Diebold-Mariano test statistic
DVF deterministic volatility function
EGARCH exponential GARCH (see below for GARCH)
EMU European Monetary Union
ETO exchange-traded option
GARCH generalised autoregressive conditional heteroscedasticity
GFC Global Financial Crisis of 2007–2011
GRNN generalised regression (artificial) neural network
HHL Haug-Haug-Lewis option pricing model
HV historic volatility
ITM in-the-money option
IV implied volatility
Libor British Bankers’ Association London Interbank Offered Rate
LIFFE London International Financial Futures and Options Exchange
¹ 𝑡 as a subscript may be missing if the context is sufficiently clear, i.e. typically at 𝑡 = 0.
Chapter 1
Introduction
The success of any financial service provider, business or major individual client
critically depends on the systems used to perform these functions, including the
development of such computer systems and the work of researchers developing the
models on which they are based.
Sophistication, deregulation and automation have had another effect on the field
of finance: a considerable increase in competition. Banks, which together with
insurance companies used to be the major providers of financial services, no longer
hold such a dominant position in the market. Funds, whether mutual funds
representing individual savers and businesses, pension funds,² or hedge funds, as
well as large individual investors, control significant amounts of capital and exert
considerable influence.
All market participants compete for access to and control of a limited number
of investment opportunities. In their search for return, or more formally the right
risk-return trade-off, they rely on a large number of models to forecast and value
individual cash-flows and measure their associated risks. The competitive nature
and the significant transparency of financial markets when compared to markets
for real assets requires continuous improvement in model accuracy and refinement
of both models and processes.
While there are many benefits to such progress, it has also brought several prob-
lems. It is commonly argued that the Global Financial Crisis (GFC) of 2007–2011
has its roots in a lack of regulatory supervision resulting from deregulation, and in
inappropriate strategies to address principal-agent problems, especially with respect
to the remuneration of finance professionals in several areas of the finance indus-
try. Lastly, an over-reliance on the strictly mathematical models used to value
securities, without consideration of their theoretical and practical limitations, is
understood to have been a factor as well. This is particularly
true with respect to assumptions made on the return process of real estate in the
United States of America (USA) and the risk associated with the investment in
sovereign debt issued by members of the European Monetary Union (EMU).
Regardless of the merits of deregulation and its effects, the market structure
that has emerged resulted, amongst other trends, in the development of a market
for derivatives over the past twenty years that is active, broad with respect to
geography, industries and firms, and accessible to even individual investors. The
² In Australia, these would typically be in the form of superannuation funds. Self-managed superannuation funds would be treated like individual investors for the purpose of this discussion.
of the over-the-counter market limits the research to only a fraction of the total
market in such instruments, it nevertheless restricts it to the more conservative
subset. The organised market is particularly transparent and the (near) absence
of counter-party risk limits valuation to the actual cash-flows, their incidence
(timing), magnitude (amount) and risk.
The valuation is only one part of the pricing process. In addition to determining
the fair value of the instruments, it is also important to consider the market’s
assessment of the value. In that sense the pricing and valuation may, and typically
do, differ.
Another aspect of the research results from a particular feature of option val-
uation. While the value, and consequently the price, of an option depends on a
number of variables, one, the future volatility of the underlying security, is of par-
ticular importance. The valuation problem is therefore also a forecasting problem.
The two cannot be separated.
The final component determining the research direction is the recent development
towards less prescriptive models for valuation purposes. This results from a variety
of trends, such as increasing automation, higher transaction volumes, greater
competition and more complex securities. These have largely led to a greater focus
on data-driven approaches as well as on models that can be developed within
fairly short time frames.³
It is also likely that the experiences gained during the GFC will lead to even
more focus on data-driven modelling. There is greater awareness today than up
until 2007 that theoretical return distributions are insufficient to describe security
returns. An exacerbating factor is the wider availability of data and the faster
and thus easier processing of large data sets in an automated fashion.
As will be discussed in subsequent chapters, the focus of research in this area
has been on artificial neural networks. While alternative machine learning tech-
niques are available to researchers and practitioners, artificial neural networks are
particularly well-suited to the type of data found in financial markets, such as
large data sets of noisy data with non-linear relationships between the variables.
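As a concrete, deliberately simplified illustration of why such networks suit noisy, non-linear data, the following sketch trains a one-hidden-layer feedforward network by gradient descent on synthetic data. The architecture, data and hyperparameters are illustrative assumptions only, not those used in the thesis.

```python
import numpy as np

# Synthetic stand-in for noisy financial data: a non-linear relationship
# between two inputs and one target, plus observation noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (np.tanh(X[:, 0] * X[:, 1]) + 0.05 * rng.standard_normal(200)).reshape(-1, 1)

# One hidden layer: 2 inputs -> 8 tanh units -> 1 linear output.
W1 = rng.standard_normal((2, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.5
b2 = np.zeros(1)

def predict(X):
    return np.tanh(X @ W1 + b1) @ W2 + b2

mse_before = float(np.mean((predict(X) - y) ** 2))

lr = 0.05
for _ in range(500):
    h = np.tanh(X @ W1 + b1)                 # forward pass
    err = (h @ W2 + b2) - y
    g_out = 2.0 * err / len(X)               # gradient of mean squared error
    g_W2, g_b2 = h.T @ g_out, g_out.sum(axis=0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)    # backpropagate through tanh
    g_W1, g_b1 = X.T @ g_h, g_h.sum(axis=0)
    W1 -= lr * g_W1; b1 -= lr * g_b1         # gradient descent step
    W2 -= lr * g_W2; b2 -= lr * g_b2

mse_after = float(np.mean((predict(X) - y) ** 2))
print(f"in-sample MSE: {mse_before:.4f} -> {mse_after:.4f}")
```

The volatility models developed in the thesis involve far richer inputs and a disciplined model selection process; this sketch only shows the basic mechanism by which such networks accommodate noise and non-linear relationships.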
The question resulting from these developments is therefore if machine learning
techniques, notably artificial neural networks (ANNs) can be used to enhance the
models used for pricing Australian equity options or even replace them entirely.
³ Short development periods are largely the result of models becoming irrelevant, in the face of competition, within ever decreasing periods themselves. It is therefore necessary to develop models replacing existing ones to respond to market pressures. While this is largely true for trading systems, these systems typically rely at least in part on valuation models themselves.
The following hypotheses should be seen in the context of this market definition,
i.e. they relate to the Australian equity options market, a point that will not be
reiterated for the purpose of clarity and brevity.
The problem can be approached in two different ways: the option pricing can be
improved by using a better value for the volatility input, or the pricing function
itself can be substituted with an ANN. The former is expressed in the following
working hypothesis:⁴
The hypothesis is based both on prior research as well as the understanding that
volatility is a more complex process than many other time series thus lending
itself to non-parametric estimation and forecasting techniques.
An important aspect of the hypothesis is the focus on option pricing. As will
become clear when discussing the existing volatility forecasting models, there is
no such thing as the best model. Instead, there is at least one suitable model for any
particular intended use of volatility. Since the focus of the research is on option
pricing, the intended use of the volatility is clear. This also has implications for
the performance measure used.
The alternative approach to using machine learning techniques for option pric-
ing is the direct one:
⁴ Corresponding test hypotheses 𝐻0 will be introduced in Chapter 3.
cannot be worse, at least in-sample, than the volatility forecast followed by the
pricing function.
This argument does not necessarily hold out-of-sample as the pricing function
may impose constraints on the ultimate price in a way the ANN cannot.
Equally, Hypothesis 1 is not a prerequisite for Hypothesis 2. If Hypothesis 1
cannot be supported, i.e. its corresponding 𝐻0 cannot be rejected, this would not
necessarily preclude Hypothesis 2 from being true. The pricing model may itself
be misspecified, and an otherwise good forecast may not lead to a better price if
it fails to compensate for, or even exacerbates, the problem.
Finally, a question arises whether the separation of the two problems can be
achieved and if this can combine the various benefits, i.e. to have an explicit
volatility forecast as well as out-performance due to the use of an ANN for option
pricing. This is in contrast to an integrated ANN that does not attempt to model
volatility directly. The issue of separability is further discussed in Chapter 3.
The first approach may be preferable to some users of machine learning tech-
niques, particularly in a transition phase. Given their black-box nature, ANNs
as well as many other techniques applied in various fields of study have long
suffered from a lack of acceptance. Being able to observe at least some of the
internal mechanisms, in this case intermediate results, without a loss of perfor-
mance would be helpful. The fitting of a model so complex is likely to prove
difficult.
On the other hand, the argument provided earlier holds here as well: the pricing
ANN should not systematically do worse than the single fitting model, as it has
access to all components as well as an additional volatility-forecasting ANN, which
can in principle be ignored. This depends on the nature of the volatility-related
input, however, a point elaborated on further below. Given the market pressures,
if a conflict occurs the best-performing model is likely to be used, except for
educational purposes.
To answer these questions the thesis is organised as follows: Chapter 2 reviews
in detail the literature regarding volatility forecasting, option pricing and ma-
chine learning. Some background is given first about the structure of the market
and terminology to provide a context for the discussion that follows. A summary
is provided detailing not only the major research gaps but also some inherent
epistemological limitations of research in this area. The discussion
covers the two main areas, the financial background and the machine learning
• Not only was past research focused almost exclusively on simple time series,
it was also predominantly driven by a singular objective. Research exists on
volatility forecasting and other research on option pricing but the two are
typically dealt with separately. The problem of pricing options, however,
requires a comprehensive analysis of at least those two components, which
is developed further below.
Secondly, contributions are made through empirical results, in particular the fol-
lowing:
• Equally limited is the existing body of knowledge with respect to the trans-
fer of findings on pricing options using machine learning techniques from
the USA to Australia.
Chapter 5 discusses the contributions in greater detail, relating them to the
hypotheses and showing how they are derived from the empirical results.
Chapter 2
Literature Review
2.1 Overview
2.1.1 The Australian Equity and Option Markets
This chapter provides an overview of previous research in the area, with a particular
focus on machine learning models for derivatives pricing. However, a brief intro-
duction to the pricing of futures and options is given as well, together with a brief
discussion of existing volatility models, all without the use of any machine learning
methods. The principal motivation is to clarify terminology and to discuss the
competing, i.e. benchmark, models and methods as they are frequently used. In
addition, this provides a context for the development of the methodology as it is
introduced in this thesis. A summary of the market, its securities and their
particular features will be provided.
Those familiar with option pricing may wish to skip sections 2.1 (this section)
through 2.3, and those familiar with machine learning section 2.4.
As noted before, the objective of the study is an investigation into the use of
machine learning techniques for Australian equity options. The Australian equity
market shares many of its characteristics with the equity markets of other large
developed countries, though it is certainly smaller and traditionally somewhat
more concentrated in certain industries such as mining and agriculture. There
is also considerable concentration within the financial services sector⁵ and some
other regulated industries. The structure and notably the regulatory framework
are not substantially different from those of many other countries.
⁵ This fact will be of some importance in later chapters when discussing interest rates, the impact of the GFC and the data set being used.
equity market and dividing it into a number of strata. Their S&P/ASX index range
is based on criteria such as market capitalisation, liquidity, domicile, as well as
additional consideration regarding the stability of the constituent set. The most
concentrated set is the S&P/ASX 20, while the S&P/ASX 200 and S&P/ASX 300
are the primary investment and broader market index, respectively. The number
in the index name represents the approximate number of companies and thus
securities included at any one time.
As mentioned, the ASX is not only a venue for trading equity securities but
also for trading their derivatives. These include options with fixed terms, flexible
options with negotiated terms, and futures contracts. All such derivatives are
subject to central clearing and as is common for exchange-traded derivatives, the
counter-party risk is reduced through novation, i.e. by structuring the contracts
so that the clearing company is the counter-party to the market participants. Due
to the clearing company's size, contractual arrangements and regulation, this
structure limits the likelihood and impact of a default by any one party.
Among such arrangements is a requirement to post and maintain margins.
While this feature is common in futures markets, it is not as prevalent in the
trading of options, as the exercise of an option and thus the obligation to deliver
the underlying is uncertain, as will be discussed below. Parties to such derivatives
contracts are required to provide sufficient funds (collateral) to satisfy the clearing
company that they will be able to deliver as and when delivery of the security or
cash is due.
longer. The consequence is that trading activity brings about arbitrage and the
no-arbitrage condition can only be met if there are investors looking for and
acting upon actual arbitrage opportunities.
This leaves open the question of how arbitrage opportunities are found and
what they are based on, i.e. how individuals identify situations where arbitrage
opportunities exist.
This has natural implications for the characterisation of the whole market.
Following Roberts (1967) (as cited in Campbell, Lo, and MacKinlay, 1997), a
distinction is made and a hypothesis formulated about actual markets rather than
model assumptions, with respect to the extent to which they permit arbitrage
and the information set that can, or more specifically cannot, be exploited to
generate economic profits. A market is considered to be efficient if no arbitrage
opportunities exist.
A distinction is typically made following Fama (1970) and Fama (1991): A
market is considered weak-form efficient if no economic profits can be made based
on past price series; this includes all information related to prices, such as the
returns series as well. In a semistrong-form efficient market, no economic profits
are achievable by acting on publicly available information. Since price series are
one such source, semistrong-form efficient markets also cover weak-form efficient
ones. This also includes, however, any kind of information about the company’s
accounts, balance sheet, income statement, statement of cash flows, management
or any other communication made by or about the firm in a public forum. The
strongest claim would be that markets are strong-form efficient. Here, even private
information cannot lead to economic profits, as it too has typically been priced in
through arbitrage by insiders.
While no particular view is taken in this thesis regarding the actual level of
efficiency of markets, it is worth noting that the use of econometric models as
well as machine learning techniques is typically based on the implicit, or more
rarely explicit, assumption that markets are not efficient and that information
discovery and resulting arbitrage trade is possible. In a purely efficient market,
no such study would be required nor would it be worthwhile as no additional profit
can be generated to compensate for the additional cost of conducting the research.
Taking a position on this is unnecessary, in part because demonstrating superior
performance of an alternative model could in principle be explained by the fact
that standard assumptions of the default model are not met, and that the market
may still price assets rationally and efficiently under modified (and realistic)
assumptions.
price 𝐾 agreed upon today, i.e. at 𝑡 = 0. Since the price at that future point in
time may, and typically will, be different, the parties benefit or lose out on the
transaction depending on whether they are the buyer or the seller, and whether
the future price is above or below the delivery price. The parties agree to settle
the contract at the future time by paying 𝐾 to the seller, and delivering the
asset(s) to the buyer. Typically a contract covers a larger number of assets of the
same type; this quantity is referred to as the contract size.
While a forward contract (in short: a forward) is any contract of this nature,
a futures contract (in short: a future) is the standardised version. The specific
details of their standardisation are country and asset-specific, although they fol-
low generally the same pattern. The contract size is chosen to be small enough
to be useful to a large number of potential market participants, while being large
enough to make trading economical. The choice of delivery dates is made with a
similar trade-off in mind, however, in the case of physical assets, especially agri-
cultural ones, additional constraints may exist due to the harvesting or breeding
cycle. Similarly, forward and futures contracts on commodities need to specify
the location of delivery if delivery is physical, a significant question considering
the transportation costs involved.
The valuation of a forward is fairly simple given that the contract simply defers
delivery and payment to the future (see Hull, 2008, for a detailed discussion on
which the following notation is based). The price of a forward 𝐹0 today (𝑡 = 0),
based on arbitrage arguments, is:

𝐹0 = 𝑆0 𝑒^{𝑟𝑇} (2.1)
If the equation were not to hold, a trader could buy (sell) the forward contract
and sell (buy) the asset, whose price is 𝑆0, and make a profit from the difference. 𝑟
is the risk-free rate, the rate at which proceeds from a sale can be reinvested and
at which funds can be borrowed. This assumes that no further income is associated
with holding the asset. This is not true in every case; additional income in the form
of interest or dividend payments is a possibility, among other sources. Depending
on whether such income is a discrete payment with present value 𝐼 or a continuous
yield 𝑞, the forward price is one of:

𝐹0 = (𝑆0 − 𝐼) 𝑒^{𝑟𝑇} (2.2)
𝐹0 = 𝑆0 𝑒^{(𝑟−𝑞)𝑇} (2.3)
The value of a long forward contract 𝑓 based on the no-arbitrage assumption is:

𝑓 = (𝐹0 − 𝐾) 𝑒^{−𝑟𝑇} (2.4)

since the value of two forwards with delivery prices of 𝐹0 and 𝐾 has to be the
discounted price difference at maturity (𝐹0 − 𝐾). From 2.1–2.4 it follows that the
value of a long forward contract is:

𝑓 = 𝑆0 − 𝐾 𝑒^{−𝑟𝑇} (2.5)
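The cost-of-carry relations just discussed can be sketched in a few lines. The function names and example numbers below are illustrative assumptions, not taken from the thesis.

```python
import math

def forward_price(S0: float, r: float, T: float,
                  income_pv: float = 0.0, income_yield: float = 0.0) -> float:
    """No-arbitrage forward price in the notation of Hull (2008): S0 spot
    price, r risk-free rate, T maturity in years, income_pv the present
    value of discrete income I, income_yield a continuous yield q.
    Combines the no-income, discrete-income and yield cases."""
    return (S0 - income_pv) * math.exp((r - income_yield) * T)

def long_forward_value(F0: float, K: float, r: float, T: float) -> float:
    """Value of a long forward with delivery price K when the current
    forward price is F0: the discounted difference (F0 - K)e^(-rT)."""
    return (F0 - K) * math.exp(-r * T)

# Illustrative numbers: no income, S0 = 100, r = 5% p.a., T = 6 months.
F0 = forward_price(100.0, 0.05, 0.5)
f = long_forward_value(F0, 100.0, 0.05, 0.5)
print(f"F0 = {F0:.4f}, f = {f:.4f}")
```

With a delivery price equal to the current forward price, the initial value of the contract is zero; here the delivery price of 100 lies below the forward price, so the long position has positive value.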
volatility models. These are almost exclusively traded OTC resulting generally in
a lack of data availability, price transparency and potentially significant counter-
party risk.
A final note on terminology: an option that has value to its holder is called an
in-the-money (ITM) option; in the case of a call (put) this occurs when the current
price is above (below) the strike price. An out-of-the-money (OTM) option is the
reverse: it is an option that currently has no value if exercised. An at-the-money
(ATM) option is the marginal case where strike and current price are equal.
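This moneyness terminology can be captured in a small helper. The function is purely illustrative and not part of the thesis code.

```python
def moneyness(option_type: str, spot: float, strike: float) -> str:
    """Classify an option as ITM / ATM / OTM per the definitions above:
    a call is in-the-money when spot > strike, a put when spot < strike;
    at-the-money is the marginal case where spot equals strike."""
    if spot == strike:
        return "ATM"
    if option_type == "call":
        return "ITM" if spot > strike else "OTM"
    if option_type == "put":
        return "ITM" if spot < strike else "OTM"
    raise ValueError("option_type must be 'call' or 'put'")

# A call struck at 100 with the underlying at 105 is in-the-money,
# while the corresponding put is out-of-the-money.
print(moneyness("call", 105.0, 100.0), moneyness("put", 105.0, 100.0))
```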
risk-free rate is effectively set to 𝑟 = 0 (Haug, 2007; Lieu, 1990). The pricing of
options with futures-style margining was introduced by Asay (1982). Lieu (1990)
shows that the interest rate effectively disappears from the Black-Scholes pricing
formula and argues that early exercise is no longer relevant, resulting in both
exercise styles being priced in the same way (see Lajbcygier et al., 1996, for an
example of using European-style pricing for American-style exercise). Kuo
(1993) extends this model to accommodate actual cash flows resulting from the
marking-to-market inherent in margining and their financing by market partici-
pants. Finally, Chen and Scott (1993) show that the findings by Lieu (1990) apply
more generally and confirm that early exercise is not beneficial.
As White (2000b) points out, however, not only do the authors ignore marking-
to-market (a criticism similar to Kuo's), but the models also only apply
to non-coupon-bearing securities. This focus on transaction cost is particularly
noteworthy given the suggestion by Dumas, Fleming, and Whaley (1996) that
the problems with the Black-Scholes model as it is applied in practice may not
be due to a misspecification but a mismatch between supply of and demand for
certain options after the market ‘crash’ of 1987.
Various authors contributed additional adjustments to the BSM model, in par-
ticular, to account for a number of deviations from the assumptions made therein.
A summary discussion of the most significant adjustments with relevant references
is provided by Haug (2007, chapter 6).
4. simulation-based approaches.
The approach pursued in this thesis falls in the last category but will be treated
separately given the significantly different assumptions and process.
As will be discussed in the following section, early exercise is not optimal for
calls on assets that do not pay income. In the context of this thesis, it would not
be optimal to exercise a call if the company does not, or rather is not expected to,
pay dividends. In such cases, the American call can be treated like a European
one and priced accordingly. In any case, the same parameters are used in the
pricing models: 𝑆0 , 𝐾, 𝜎, 𝑇 , 𝑟, and either 𝑞 or 𝐷𝑡 depending on the nature of the
dividend model. Continuing the use of the notation by Hull (2008), the American
call is referred to as 𝐶 and the corresponding put as 𝑃 .
All alternatives to this approach are ultimately approximations due to either the
nature of their assumptions or due to the way they are implemented. Closed-form
approximations exploit the fact that the early exercise is optimal only just before
the dividend payout, in a practical context the ex-dividend date rather than the
payment date. They further simplify dividend payments as a yield similar to the
yield used in the previous sections on futures and European options.
The model by Barone-Adesi and Whaley (1987) is of practical importance ac-
cording to Haug (2007) and is based on a quadratic approximation after determin-
ing critical values for the underlying prices. An alternative model by Bjerksund
and Stensland (2002) is considered somewhat better by Haug (2007) and uses
two exercise boundaries over time. The approximation models have the substan-
tial benefit of being very fast when calculating prices and when computing the
corresponding derivatives (see the following section). Their major disadvantage
is the use of a dividend yield instead of modelling discrete dividends as they are
expected to occur.
Tree-based models allow for the explicit modelling of discrete dividends and
consequently allow for an accurate pricing of a large variety of options. The
structure and evolution of the trees varies by model but is based on the current
price of the underlying and moving through time. At each node the price is deter-
mined and exercise decisions (if applicable) are made. The values are aggregated
back towards the root of the tree to determine the price of the option. The most
common model is the binomial method by Cox, Ross, and Rubinstein (1979) and
Rendleman and Bartter (1979) and is typically referred to as the Cox-Ross-
Rubinstein (CRR) model. Despite its relative age, it remains among the most
widely used (Haug, 2007). If the steps are chosen sufficiently small and therefore
sufficiently many, the model converges to the BSM model for the European
option.
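The CRR scheme just described can be sketched in a few lines. The function below is an illustrative implementation (names and defaults are not from the thesis) for a European call on a non-dividend-paying asset; its price converges towards the BSM value as the number of steps grows:

```python
import math

def crr_european_call(S0, K, r, sigma, T, n):
    """n-step CRR binomial price of a European call on a non-dividend asset."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))    # up factor
    d = 1.0 / u                            # down factor
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)
    # option values at the terminal nodes (j up-moves out of n steps)
    values = [max(S0 * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    # backward induction towards the root of the tree
    for step in range(n, 0, -1):
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j])
                  for j in range(step)]
    return values[0]
```

With, say, 𝑆0 = 𝐾 = 100, 𝑟 = 0.05, 𝜎 = 0.2, 𝑇 = 1 and 500 steps, the result lies within a few cents of the BSM price of roughly 10.45, illustrating the convergence mentioned above.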
The concept of a binomial tree, i.e. a tree whose nodes have exactly two child
nodes, is also closely related to risk-neutral valuation. It is notable, and entirely a
consequence of the no-arbitrage condition, that the expected return of the underlying
asset does not feature in any of the models mentioned so far, nor in any other model
of this kind. This simplifies modelling and interpretation. Alternatives to the
binomial method exist such as the trinomial tree (Boyle, 1986), which introduces
a no-change state in addition to the up and down movements in the binomial
tree, or implied tree models.
The latter are of great practical importance albeit not for the simple options.
They are worth mentioning, however, due to their particular relationship with
volatility models. As discussed previously, volatility is one of the parameters of the
pricing model and typically the critical one due to the relatively high sensitivity
of the option price to changes in volatility but also because it is the one most
difficult to estimate (𝑆0 , 𝐾 and 𝑇 being known and 𝑟 being either irrelevant or
fairly constant over short periods of time). Implied tree models (Dupire, 1994;
Derman and Kani, 1994; Rubinstein, 1994) use the implied volatility observed
in ETOs to price exotic OTC options. The structure and pricing mechanisms
are similar to the standard binomial and trinomial trees. The benefit is that
the volatility and therefore future expectations from a large number of market
participants are aggregated and used in markets where no such information exists
or is difficult to get. In addition, pricing exotic options and ‘vanilla’ options (i.e.
options with simple exercise patterns such as European-style exercise) using the same
implied volatilities is in itself a case of arbitrage pricing. The price attached to a
particular volatility expectation is the same whether the implied tree model or the
‘vanilla’ options are used, thus extending the modelling approach to within-asset-class
pricing.
While tree-based models are more accurate with respect to dividend payments
if executed correctly, they are computationally expensive and the accuracy can
be compromised due to numerical issues, i.e. rounding errors and stability, as well
as trade-offs between convergence and execution speed.
Lastly, simulation-based models involve the generation of suitable samples from
known (joint) distributions of underlying asset prices. Monte-Carlo methods are
frequently used when needed but they suffer from even more significant computational
problems than the tree-based models. The use of many machine learning
techniques, notably ANNs, also falls in this category.
Δ𝑐 = 𝜕𝑐/𝜕𝑆        Δ𝑝 = 𝜕𝑝/𝜕𝑆

Γ = 𝜕²𝑐/𝜕𝑆² = 𝜕²𝑝/𝜕𝑆²

V = 𝜕𝑐/𝜕𝜎 = 𝜕𝑝/𝜕𝜎
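For the BSM model without dividends these greeks have well-known closed forms. A minimal sketch (helper and function names are illustrative, not from the thesis):

```python
import math

def _norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bsm_greeks(S0, K, r, sigma, T):
    """Closed-form BSM delta (call/put), gamma, and vega; no dividends."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    delta_call = _norm_cdf(d1)                             # ∂c/∂S
    delta_put = delta_call - 1.0                           # ∂p/∂S
    gamma = _norm_pdf(d1) / (S0 * sigma * math.sqrt(T))    # same for call and put
    vega = S0 * _norm_pdf(d1) * math.sqrt(T)               # same for call and put
    return delta_call, delta_put, gamma, vega
```

Note that gamma and vega are identical for calls and puts, as the equations above state.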
Equally, put options cannot be more expensive than the strike price. The argument
is the same: if the put cost more than 𝐾, buying it and exercising could only
result in a loss. If a situation of this kind were to arise, simple arbitrage could
yield a profit by selling the option and taking the opposite position in the stock
today (the direction depending on whether it is a call or a put). This is excluded
by the no-arbitrage assumption, or such prices would disappear from the market
place as a result of arbitrage activity:
𝑐 ≤ 𝑆0 , 𝐶 ≤ 𝑆0 , 𝑝 ≤ 𝐾, and 𝑃 ≤ 𝐾 (2.10)
An additional boundary exists for European puts; these cannot be worth more
than the discounted strike price as this is the only time the option can be exer-
cised. Given that the American put can be exercised at any point until then, the
argument does not apply to an American put:
𝑝 ≤ 𝐾𝑒−𝑟𝑇 (2.11)
Lower boundaries also exist. While Hull (2008) uses an arbitrage argument,
there is an alternative point to make: consider a long (short) future, an agreement
to buy (sell) an asset at a future point in time for price 𝐾 given the current
price of 𝑆0 . Equation 2.5 showed that the price of such a futures contract is
𝑓𝑙 = 𝑆0 − 𝐾𝑒−𝑟𝑇 (𝑓𝑠 = −𝑓𝑙 , respectively). Since an otherwise equal option can
result in the same purchase (sale) but with the additional value to the option
holder of being able to choose whether to exercise or not, the option cannot be
cheaper than the corresponding futures value. The option also cannot be cheaper
than being free. In the absence of dividends (note that equation 2.5 was used, the
no-income case) the lower boundaries are thus:
𝑐 ≥ max(𝑆0 − 𝐾𝑒−𝑟𝑇 , 0) and 𝑝 ≥ max(𝐾𝑒−𝑟𝑇 − 𝑆0 , 0) (2.12)
and since the American option can be exercised immediately, it also follows that
𝑃 ≥ 𝐾 − 𝑆0 (2.13)
Finally, American options cannot be worth less than the corresponding European
ones:
𝐶≥𝑐 and 𝑃 ≥𝑝 (2.14)
𝑐 − 𝑝 = 𝑆0 − 𝐾𝑒−𝑟𝑇 (2.15)
𝑐 − 𝑝 + 𝐷 = 𝑆0 − 𝐾𝑒−𝑟𝑇 (2.16)
𝑆0 − 𝐷 − 𝐾 ≤ 𝐶 − 𝑃 ≤ 𝑆0 − 𝐾𝑒−𝑟𝑇 (2.17)
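The parity relation in equation 2.15 can be verified numerically. The sketch below assumes BSM prices for a non-dividend-paying asset; function names are illustrative:

```python
import math

def _N(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bsm_call_put(S0, K, r, sigma, T):
    """BSM European call and put prices, no dividends."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = S0 * _N(d1) - K * math.exp(-r * T) * _N(d2)
    put = K * math.exp(-r * T) * _N(-d2) - S0 * _N(-d1)
    return call, put

# check c − p = S0 − K·e^(−rT) for arbitrary (illustrative) parameters
c, p = bsm_call_put(100.0, 95.0, 0.05, 0.25, 0.5)
parity_gap = (c - p) - (100.0 - 95.0 * math.exp(-0.05 * 0.5))
```

The gap is zero up to floating-point noise, as parity requires.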
• historical volatility,
• stochastic volatility,
These will be summarised in this and the following sections excluding, however,
non-parametric models, which will be discussed in section 2.5.2.
• Which time series should be used and how is volatility to be calculated, i.e.
what formula to use?
The first two questions are somewhat related given that a sufficient sample
size is required to achieve meaningful estimates, i.e. to minimise the impact of
noise. Treating the two independently, however, the question of the time frame is
common to most approaches and a clear answer is not given in the literature. The
choice appears to be a function of the specific, often implicit, assumptions made by
researchers about what constitutes a sufficient data set.
The common goal is to choose a time frame that reflects the current or future
volatility level and not to include observations so old as to be no longer relevant to
the security, such as would result from corporate restructurings leading to changes
in the risk-level, for example. This question is significant for option pricing as
options available for trading vary in time to maturity. If a functional dependency
between the observation time frame and the forecasting horizon is assumed or
suspected, a single time frame is clearly not feasible.
Similar to choices of the time frame, the choice of frequency appears to depend
on the personal preferences of researchers in addition to the objectives of any par-
ticular study. A clear preference for any one choice does not emerge from the liter-
ature. This is in part due to the conflicting evidence from theoretical research on
the one hand and practitioners’ experiences. Poon and Granger (2003) comment
on this pointing out that “[i]n general, volatility forecast accuracy improves as
data sampling frequency increases relative to forecast horizon [(Andersen, Boller-
slev, and Lange, 1999)].”
It is noteworthy, as discussed by the same authors, that Figlewski (1997) found
that long-term forecasts benefit from aggregation, i.e. from lower sampling fre-
quencies. This in turn is contrary to the theoretical discussion by Drost and
Nijman (1993), who show that aggregation should preserve the features of the
volatility process. Poon and Granger (2003) stress, however, that “it is well known
that this is not the case in practice […and that t]his further complicates any at-
tempt to generalize volatility patterns and forecasting results.” In light of these
findings and the conclusions by these widely-cited authors, the question arises
if any results from previous literature are applicable to a specific problem or if
they are rather of interest with respect to the process they follow. The estimation
and evaluation methodologies rather than the resulting findings are the likely
contribution of volatility modelling research.
The above discussions regarding aggregation, sampling frequency and time
frame are of some importance to the historical volatility models as well as to
the stochastic volatility models. They both typically assume a single observation,
i.e. a single price or return, on which to base the volatility estimate. This single
variable approach is only one of the choices, however. It is commonly applied
to the closing price of a day, week, or month. The volatility is calculated as the
standard deviation of (log-)returns over the sampling period.
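A minimal sketch of this estimator, annualised by √𝑁 under an assumed 252 trading days per year (the function name is illustrative):

```python
import math

def historical_volatility(prices, periods_per_year=252):
    """Annualised sample standard deviation of log-returns."""
    returns = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]
    mean = sum(returns) / len(returns)
    # unbiased sample variance of the per-period log-returns
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var) * math.sqrt(periods_per_year)
```

A constant-growth price series yields (essentially) zero volatility, while any fluctuation produces a positive estimate.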
Alternatively, the trading ranges can be used, a technique used more frequently
prior to the availability of high-frequency data. Parkinson (1980) proposed a
volatility measure based on the high and low observation instead of the closing
price, assuming prior aggregation. Garman and Klass (1980) on the other hand
propose a combination of the trading range and the close where the range is used
as the current observation and the change in closing prices as the reference point.
All these suffer from various problems, however, and none appear to be used
frequently. For a discussion and related literature see Haug (2007).
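Parkinson’s range-based estimator, for instance, can be sketched as follows, assuming per-period high/low observations and 252 trading days per year (names illustrative):

```python
import math

def parkinson_volatility(highs, lows, periods_per_year=252):
    """Parkinson (1980): per-period variance (ln(H/L))^2 / (4 ln 2), averaged."""
    n = len(highs)
    s = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    # annualise the average per-period volatility by sqrt(N)
    return math.sqrt(s / (4.0 * n * math.log(2.0))) * math.sqrt(periods_per_year)
```

A degenerate series with high equal to low gives zero volatility; any intraperiod range produces a positive estimate.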
for the various stylised facts, e.g. Heston (1993), Cox (1975) for the constant
elasticity of variance (CEV) model, or Hagan et al. (2002) for the stochastic
alpha beta rho (SABR) model. More commonly, researchers follow the traditional
approach of a combination of moving averages and autoregressive components.
The ARCH model only uses the latter, the generalised autoregressive conditional
heteroscedasticity (GARCH) uses both components (Bollerslev, 1986), here using
the notation by Poon and Granger (2003):
𝑟𝑡 = 𝜇 + 𝜖𝑡 (2.18)

𝜖𝑡 = √ℎ𝑡 𝑠𝑡 (2.19)

ℎ𝑡 = 𝜔 + ∑𝑘=1…𝑞 𝛼𝑘 𝜖²𝑡−𝑘 + ∑𝑗=1…𝑝 𝛽𝑗 ℎ𝑡−𝑗 (2.20)
ARCH(𝑞) only uses the first two terms of ℎ𝑡 in equation 2.20 while GARCH(𝑝, 𝑞)
follows the full specification as above. Further ARCH-family models exist to re-
flect particular features such as asymmetric responses to shocks.
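A minimal sketch of the GARCH(1, 1) special case (𝑝 = 𝑞 = 1 in equation 2.20, with 𝜇 = 0), simulating a return path together with the one-step-ahead conditional variance; the function name, starting value, and parameter values are illustrative assumptions, and 𝛼 + 𝛽 < 1 is required for the unconditional variance to exist:

```python
import random

def garch11_path(omega, alpha, beta, n, seed=0):
    """Simulate r_t = eps_t with eps_t = sqrt(h_t) * s_t, s_t ~ N(0, 1)."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)   # start from the unconditional variance
    eps_path, h_path = [], []
    for _ in range(n):
        eps = (h ** 0.5) * rng.gauss(0.0, 1.0)
        eps_path.append(eps)
        h_path.append(h)
        # equation 2.20 with p = q = 1: next period's conditional variance
        h = omega + alpha * eps ** 2 + beta * h
    return eps_path, h_path
```

The same recursion, run forward without drawing new shocks, yields the iterative multi-step forecasts mentioned later in this chapter.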
The comprehensive review by Poon and Granger (2003), which is also discussed
by Poon and Granger (2007), revealed that implied volatility usually outperforms
models from the ARCH family (referred to there as GARCH). The difference is
less clear when historical volatility is compared with implied volatility, and smaller
still between historical and ARCH-family models, leading the authors to conclude
“as a rule of thumb, historical volatility methods work equally well compared
with more sophisticated ARCH class and SV models” (Poon and Granger, 2003).
The authors stress that it is critical to understand the objective of the volatil-
ity model and choose selection criteria accordingly. They also emphasised that
different models may work better or worse depending on the asset being studied.
Finally, the review reveals that the GARCH(1, 1) model is among the most com-
mon in the ARCH family (where parameters are given at all), a point also made
across the wider literature.
where 𝑁 is the number of trading days (weeks, months, periods) per year when
sampling at daily (weekly, monthly, any particular) frequency and 𝜏 = 1/𝑁 the
length of a sample as a fraction of a year (Hull, 2008; Haug, 2007).
In addition to frequency adjustments, additional corrections may have to be
made to arrive at an accurate volatility estimate. These additional adjustments
are model-specific in that they are meant to correct for particular assumptions
of the option pricing model rather than for the volatility as such. It is therefore
important to be clear about the purpose of the volatility model. This is true in
more general terms as Poon and Granger (2007) explain in detail.
The issue of dividends in option pricing has been discussed before. As pointed
out above, one possibility is to separate the problem into two components, a riskless
and a risky one. The volatility needs to be adjusted in this case to correct for
the additional cash flow. When dividends are present, the common practice is
to adjust the volatility estimate used in the BSM model by a factor of

𝑆0 / (𝑆0 − 𝐷) (2.22)
where 𝐷 represents the present value of dividends (Hull, 2008, p. 298, footnote
12) and 𝑆0 the price of the underlying. This is the necessary adjustment in the
presence of cash dividends.
The adjustment is needed as the dividend represents an additional benefit to
the owner of the stock (see the discussion at the beginning of the chapter). The
treatment of stock dividends is different as they do not represent additional value
but instead add to the number of shares held but not (if done proportionately) to
the fractional share of ownership. Depending on the nature of the issue and their
source, different adjustments may have to be made to the option price. These
are typically excluded from discussions of option pricing and indeed from their
implementation (including various models in MATLAB). This literature review,
like the thesis as a whole, does not address the question of stock dividends, nor,
for the same reason, warrants.
6 A particular model proposed by the same author and others is also included but it too
requires a numerical evaluation in at least some cases despite not being as expensive as
traditional models.
The time-varying nature of volatility was one of the motivating factors for the
development of ARCH-models. It is for the same reason that a single estimate
cannot be applied to options with varying time to expiry (assuming a constant
strike price). The volatility to be used for pricing needs to match that of the
relevant period. One possibility is to develop a sufficient number of forecasting
models for the various possible forward periods but this is likely infeasible. In the
case of ARCH models, Poon and Granger (2003) suggest an iterative procedure
to arrive at additional forecasts.
In general terms, a way of converting between different forward periods would
be desirable. The relationship between the time to expiry and the volatility is
referred to as the volatility term structure. Recognising this, Haug and Haug
(1996) (as cited in Haug, 2007) show that the implied forward volatility reflects
the informational content of the term structure embedded in the base volatilities.
The corresponding model according to Haug (2007) is:
𝜎𝐹 = √[(𝜎2² 𝑇2 − 𝜎1² 𝑇1)/(𝑇2 − 𝑇1)] (2.23)

𝜎2 = 𝜎1 √(𝑇1/𝑇2) (2.24)
under the usual assumptions and suitable conditions, in particular that 𝑇2 > 𝑇1 .
Since it is only the lower boundary it does not necessarily represent the best
estimate but a useful starting point if only one volatility is given but another
needed (over a longer time frame but fully covering the first).
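Equations 2.23 and 2.24 translate directly into code; a minimal sketch (function names are illustrative):

```python
import math

def forward_volatility(sigma1, T1, sigma2, T2):
    """Equation 2.23: sigma_F = sqrt((sigma2^2*T2 - sigma1^2*T1) / (T2 - T1))."""
    return math.sqrt((sigma2 ** 2 * T2 - sigma1 ** 2 * T1) / (T2 - T1))

def lower_bound_sigma2(sigma1, T1, T2):
    """Equation 2.24: lower bound sigma_2 = sigma_1 * sqrt(T1 / T2), with T2 > T1."""
    return sigma1 * math.sqrt(T1 / T2)
```

With a flat term structure (𝜎1 = 𝜎2), the implied forward volatility equals the base volatility; at the lower bound of equation 2.24, the forward volatility over (𝑇1, 𝑇2] is zero.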
Another option is to use the local volatility. Gatheral (2006) shows that the
implied volatility can then7 be found based on stochastic volatility models and
notes
A similar problem exists in the case of varying strike prices. Here too, a func-
tional relationship is needed between a single volatility estimate and the volatil-
ities to be used for pricing. The latter need to reflect the misspecification of the
pricing model and are thus model-specific. In the case of the Black-Scholes model,
non-normality is one such significant deviation and its implications are discussed
by Hull (2008). This gives rise to the well-known volatility smile. Explicit mod-
elling appears relatively rarely in the literature; Gatheral (2004) proposes one
such model with four parameters in addition to a volatility forecast for a given
time to expiry.
The solution discussed by Hull (2008) is to use an adjustment table. One would
start with a single forecast and find the appropriate volatility for a given time to
expiry and strike level. Formally, one needs:
The table of adjustment factors in Hull (2008) is one such way and may be more
broadly applied once it is stated in terms of the discounted strike or forward price.
In either case a general relationship is assumed to hold over time.
One possible solution is to infer the functional form from other instruments.
This is exactly the previously discussed implied tree method. Here a local volatil-
ity is inferred, local as it is specific to a particular state or node in the tree 𝜎𝐿 (𝑆, 𝑡).
Instead of extrapolating from a single volatility estimate to the whole volatility
surface, the surface is inferred from existing prices in related (vanilla) options and applied to
the instruments to be priced (exotic options). This process is discussed in great
detail by Gatheral (2006). The approach requires an independent security, how-
ever, as the implied volatility from one security cannot be used for pricing of the
same security8 . The local volatility approach and the related models are thus not
particularly useful for the original research questions. Nonetheless, they illustrate
8 It is possible to use the implied volatility but the result would be the same price as observed
in the previous transaction, adjusted only by Δ and the passage of time. If every market
participant acted this way, no additional information would be reflected in the prices. Additional
information needs to be added through the trading activity itself or through parameter updates
by some participants.
the usefulness of such an approach for specific securities. It also demonstrates the
fairly large number of parameters needed even in the case of analytical models.
Given that the Heston model parameters, in particular, move slowly over time
(Gatheral, 2006), the model may in principle be useful even when the condition
of independent securities is not met. A related approach is the combination of
modelling the volatility smile and fitting it across expiration times using splines
as discussed by Gatheral (2006).
Finally, some researchers attempted to model volatility through functional
forms of varying complexity, in particular with attempts at introducing inter-
action terms. While no standard model has emerged so far, a frequently cited
set of functions is the one by Dumas, Fleming, and Whaley (1996), and Peña,
Rubio, and Serna (1999). The authors introduced a number of functions including
quadratic terms to model the curvature of the smile.
What is common to all these models is that the number of parameters is fairly
large and the fitting procedure is not trivial. Gatheral’s concern about the ability
to find a fitted model is of particular significance as it highlights the importance
of the problem as well as the practical limitations. In an environment such as
this, non-parametric models may be more suitable as neither the functional form
nor the parameters need to be determined. Instead, both elements are inferred
from the data. This problem like the earlier ones may thus benefit from machine
learning techniques.
The focus of the thesis is exclusively on how option pricing decisions can be
aided using such techniques, which modelling approaches are suitable, and how
such models are built and fitted to improve results.
Common to all machine learning techniques is their empirical nature. Rather
than making assumptions as is frequently done in academic research, and infer-
ring particular models from them, machine learning focuses on observed data
in various forms and the models are designed so that they maximise the use of
existing evidence for future decisions.
The models can be classified (e.g. Vanstone and Hahn, 2010) by operating
principle, e.g. expert systems, case-based reasoning, swarm intelligence models,
ANNs, or meta-learners such as genetic algorithms. Alternatively, they can be or-
ganised by problem type, e.g. classification and time-series analysis. The former
requires assigning distinct labels presented to the model, the latter are a special
case of regression problems, not unlike those studied in statistics. Both problem
types are of some significance in option pricing. One possible application of clas-
sification is the decision of whether to exercise an (American) option early. This
would require a predictive classification model. Time-series analysis is the central
topic of this thesis and will be discussed in more detail in this and the following
sections.
A large number of learning models exist for regression problems (see, for ex-
ample, Hastie, Tibshirani, and Friedman, 2009). Most, though not all, are also
applicable to the special case of time-series analysis. Among the most frequently
used in financial research are ANNs. This class of techniques has a number of
benefits that, while not unique, are fairly rare; they will be discussed at the end
of this section.
As the name implies, Artificial Neural Networks are an attempt at replicating
a mechanism observed in nature. They were inspired by the structure and functioning
of the brain, including that of humans. In a substantially simplified view,
the brain resembles a very large network whose nodes (neurons, i.e. cells) are
interconnected through links or edges (synapses in the biological context).
Through chemical and electrical processing, signals are passed from one node to
another. Signalling both results from and in turn influences the excitation level
of the cells involved. Through the complex interaction of these signals, thought
and behaviour emerge in the individual.
Replicating a human brain, apart from ethical and other philosophical issues,
is currently not possible even at a basic technological level due to its size and
complexity, i.e. the number of nodes and the high degree of interconnectedness.
Simpler networks can be built, however, and several models have been proposed
in the literature. By far the most common, and in many ways simplest, is the
single-layer perceptron (SLP).
It consists of an input layer, a hidden layer, and an output layer. The inde-
pendent variables represent the input vector and each value is provided to the
network at one of the input layer nodes, each node consequently represents an
explanatory variable. The desired output is determined as the value of the output
layer nodes. In the case of regression, only one such node exists, typically. The
hidden layer represents internal states. As the name implies, the values are not
typically available for inspection nor, as will be discussed further, can they be
interpreted in any meaningful way.
As an extension multiple hidden layers are possible, the general case of the
multi-layer perceptron (MLP). Regardless of the total number of layers, each
node is connected to all nodes in the previous layer and all nodes in the following
layer. This implies the absence of loops, which is not true of all other network types.
The perceptron is a supervised learning technique. In order to build a model
for the set problem, a number of samples need to be given to the network in
learning mode. The inputs and outputs are provided and the network is trained.
During training the network is evaluated: the input values are multiplied by the
weights attached to the edges, which are initialised to random values, and combined.
An activation function, often of the sigmoidal type, is applied to the combined
value, and the result is passed on to the next layer.
In the output layer, the identity function is typically used, i.e. the weighted
sum represents the output value. During learning, the resulting value is compared
to the provided output, the target value, and an error is calculated. This
error is then used to modify the weights in the opposite direction throughout
the network. This process is repeated for each sample, and possibly for several
epochs. The resulting weights reflect a non-linear combination of the inputs of
arbitrary complexity. This is only limited by the architecture of the network (for
a formal discussion see Hastie, Tibshirani, and Friedman, 2009).
Networks of this particular kind are also called feedforward-backpropagation
networks since evaluation is done in only one direction (forward) and weights
are modified by traversing the network in the reverse direction (backpropagation).
Alternative algorithms with better convergence characteristics exist as well.
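The forward and backward passes described above can be sketched for a single hidden layer with sigmoid activation and identity output. This is a minimal illustration using plain per-sample gradient descent on a squared-error loss, not the implementation used in the thesis; all names and defaults are illustrative:

```python
import math
import random

def train_mlp(samples, n_hidden, epochs=5000, lr=0.3, seed=1):
    """samples: list of (inputs, target) pairs for a regression problem."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # randomised initial weights; the extra weight per node acts as a bias
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    for _ in range(epochs):
        for x, t in samples:
            xb = list(x) + [1.0]                                   # forward pass
            h = [sig(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
            hb = h + [1.0]
            y = sum(w * v for w, v in zip(w_out, hb))              # identity output
            err = y - t                                            # error signal
            # gradients are computed before any weight is updated
            grad_out = [err * hb[j] for j in range(n_hidden + 1)]
            deltas = [err * w_out[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden + 1):                          # backpropagation
                w_out[j] -= lr * grad_out[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hid[j][i] -= lr * deltas[j] * xb[i]
    def predict(x):
        xb = list(x) + [1.0]
        hb = [sig(sum(w * v for w, v in zip(row, xb))) for row in w_hid] + [1.0]
        return sum(w * v for w, v in zip(w_out, hb))
    return predict
```

Training the same data several times with different seeds, as suggested below for escaping local minima, only requires varying the `seed` argument.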
Alternative architectures allow for loops in the network design or apply less
strict structural limitations on the architecture. A common deviation is to sub-
stitute the sigmoidal transfer function with a radial basis.
ANNs have a number of benefits but users also face a number of problems
in developing models and applying them successfully in practice. Their principal
advantage is that they are ‘universal approximators.’ The models can fit
any continuous function, including those with non-linear features. This finding
was proven by Cybenko (1989) (see also Irie, 1988; White, 1988; Funahashi,
1989). The author limited the discussion to particular features, a limitation that
Hornik, Stinchcombe, and White (1989) showed is not necessary. Instead, the
universal approximator property is shown to result from the general architecture
and training process. These proofs are highly significant as they show that any
sufficiently large network will approximate a given function arbitrarily well. The
limitations stem only from architecture choices and data quality. Furthermore,
Hornik, Stinchcombe, and White (1990) showed that this does not only apply to
the function itself but also to its derivatives. Given the practical importance of
the option greeks, this insight explains some of the interest in the use of ANNs
in financial research.
The approximation properties make ANNs excellent predictors given sufficient
high-quality data. Despite these benefits, a number of disadvantages exist that
are a consequence of the great flexibility of the technique. Firstly, the models de-
rived from the data have no explanatory power. It is not possible to attribute the
prediction quality to any particular attribute or set of attributes. In fact, it is not
even clear if any particular attribute contributes anything to the final outcome,
i.e. to the classification or regression output. Unlike standard regression mod-
els, which allow for statistical tests of the significance of individual parameters
(variable weights), no such mechanism exists in the case of neural networks. The
resulting model is therefore a black-box. The inability to understand the inner
workings has traditionally contributed to significant scepticism and resulted in
their use largely in environments of high competition and an absence of competing
models. Over time, some methods have been developed for the removal of irrel-
evant inputs and increasing explanatory power, largely through rule extraction
(e.g. Baesens et al., 2003, for financial applications).
Secondly, the proofs by the various authors only demonstrated that the network
can approximate any particular function but no standard architecture or process
for choosing an architecture exists. The choice of input variables, the number and
size of hidden layers, the choice of learning parameters, pre- and post-processing
steps are still left to individual users with little guidance other than by reviewing
successful (rarely failed) implementations in related areas of research or use. This
is a particular problem for the choice of architecture and the learning process. A
similar problem exists in regards to the chosen input variables but since this is
domain-specific, the selection of variables needs to be made for any approach and
is thus not specific to ANNs.
Significant problems arise especially from the sensitivity of the model weights
to the initial weights. The nonconvexity of the error function results in multiple
minima and thus in the risk of getting trapped in one of them during learning.
One suggestion (e.g. Hastie, Tibshirani, and Friedman, 2009) is to train networks
multiple times with different initial weights. This is a common approach in such
situations. Vanstone (2005) does not solve the problem explicitly but combines
the issue with the question of the size of the hidden layer. The author starts
at ⌈√𝑛⌉ hidden nodes, where 𝑛 is the number of input variables, repeating the
process with increasing hidden layer size and stopping when the utility fails to
increase further. It is important to note that the author splits the error measure
in two components, one for training the network and one for model selection. The
stopping is subject to the latter constraint, which in the case of that particular
research is the benefit of the network for trading.
The alternative and textbook approach is to be “guided by background knowl-
edge and experimentation” (Hastie, Tibshirani, and Friedman, 2009) although
the authors provide some basic advice of using between 5 and 100 nodes. Based
on the work by Kolmogorov (1957, among others), which proved the existence of
a universal approximator, a maximum of 2𝑛 + 1 nodes are needed for the hidden
layer (Vanstone, 2005).
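The growing-hidden-layer search attributed to Vanstone (2005), capped at the 2𝑛 + 1 bound mentioned above, can be sketched as follows; the evaluation function is a placeholder for whatever model-selection utility is used (e.g. trading benefit), and all names are illustrative:

```python
import math

def grow_hidden_layer(n_inputs, evaluate, max_nodes=None):
    """evaluate(n_hidden) -> selection-metric utility; higher is better."""
    if max_nodes is None:
        max_nodes = 2 * n_inputs + 1          # Kolmogorov-style cap cited above
    size = math.ceil(math.sqrt(n_inputs))     # start at ceil(sqrt(n)) nodes
    best = evaluate(size)
    while size < max_nodes:
        candidate = evaluate(size + 1)
        if candidate <= best:                 # stop when utility fails to increase
            break
        best, size = candidate, size + 1
    return size, best
```

For instance, with nine inputs and a utility peaking at five hidden nodes, the search starts at three nodes and stops at five.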
also reporting variable and parameter choices. A unified notation is used in the
presentation where possible, which is based on the previous sections rather than
using the symbols as they appear in the various publications. Dates are given
in international format and network specifications are given as 𝑥–𝑦–𝑧 with 𝑥 in-
put nodes, 𝑦 hidden nodes, and 𝑧 output nodes. Training methods, parameters,
and even input variables are frequently missing in the literature but are included
where available.
The review is organised as follows: Early studies are presented first in largely
chronological order as these set the standard for the modelling and statistical
evaluation of option pricing models outside the classical financial literature. This
is followed by a discussion of various modelling approaches and a separate subsec-
tion for exotic options. Finally, Australian studies and research focused primarily
on methodological advances are summarised. Research on option pricing with a
significant focus on the underlying volatility measure is discussed in the subsequent
section on volatility forecasting.
Research into machine learning techniques and option pricing and to an even
greater extent into volatility forecasting does not appear to follow a clear path.
Consequently, most articles are combinations of methodological changes, insights
into particular markets, and technological outcomes. They are organised here by
their major contribution from the perspective of this thesis.
Among the earliest uses of ANNs are those by Malliaris and Salchenberger (1993a)
(see also Malliaris and Salchenberger, 1993b). They find that neural networks
can indeed be used for option pricing. They test this on S&P 100 index options,
using daily observations between 1990–01–01 and 1990–06–30, the 3-month Trea-
sury bill for the risk-free rate, and the ATM implied volatility. In addition to the
usual five parameters of the pricing model, the lagged index value and the lagged
option price are used. Networks with varying sizes are trained for five points in
time with a minimum history of 30 days and a testing time frame of two weeks.
Their rationale is that this allows the capturing of the volatility dynamics. Their
results suggest that there are pricing biases for both approaches but that the
ANN produces generally lower pricing errors. They also note that the results are
sensitive to choices of model architecture and parameters having used a learning
rate of 0.9 and momentum of 0.6 with a sigmoidal activation function.
2.5 Financial Applications of Machine Learning 39
The authors note that combinations and in particular the application of the ANN output to a pricing
function in a next step may be beneficial. An important point made is that they
“would not expect to achieve results that are significantly different than those of
Black-Scholes if many traders are using the Black-Scholes model and the market
prices reflect their strategies.”
Hutchinson, Lo, and Poggio (1994) apply three nonparametric pricing models,
projection pursuit, the radial-basis function network, and the MLP network, to
option pricing and hedging (see also Hutchinson, 1994). The MLP training is
done in on-line rather than batch mode, using gradient descent. As discussed be-
fore, price-homogeneity is an important property and by exploiting this property
and assuming constant volatility and the risk-free rate, the authors reduce the
functional form to only two parameters, 𝑆𝑡/𝐾 and 𝑇, to estimate 𝑐/𝐾. Using 𝑅2,
the average hedging error (AHE), i.e. the present value of the absolute devia-
tions of a replicating portfolio, and a combined measure of mean and variance, 𝜈,
defined analogously to the mean squared error of a predictor in statistics,
they find that the techniques work well for synthetic data, where the MLP is
chosen to have four hidden nodes. Furthermore, they tested the same approach
using real data, S&P 500 index call options between 1987–01 and 1991–12, fit-
ting the models on each of the first nine of ten sub-periods, and testing them in
the subsequent one. Volatility is based on the 60-day historical estimate and the
3-month Treasury bill is used again for the risk-free rate. Here too, they find that
the models outperform the Black-Scholes (BS) formula for pricing and hedging.
An important concern identified is that additional predictors and statistical tests
are needed for future research.
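The homogeneity reduction used here can be checked numerically: the Black-Scholes call price is homogeneous of degree one in (𝑆, 𝐾), so 𝑐/𝐾 depends only on the moneyness 𝑆/𝐾 and 𝑇 for a fixed rate and volatility. A minimal stdlib-only sketch (standard Black-Scholes formula; the parameter values are arbitrary):

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Homogeneity of degree one in (S, K): c(lam*S, lam*K) = lam * c(S, K),
# so c/K is a function of moneyness S/K and T alone (r, sigma fixed).
c = bs_call(100, 95, 0.5, 0.03, 0.2)
c_scaled = bs_call(200, 190, 0.5, 0.03, 0.2)
assert abs(c_scaled - 2 * c) < 1e-9
print(round(c / 95, 6))  # the quantity c/K that the reduced network estimates
```

This is why a network estimating 𝑐/𝐾 from (𝑆𝑡/𝐾, 𝑇) loses no generality under the stated assumptions.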
An early deviation from using call options was the working paper by Kelly
(1994), who investigated the pricing of American put options on four major (by
volume) US companies’ common stock. The study period was fairly short with
only 1369 observations between 1993–10–01 and 1994–04–13. Using a one-year
historical volatility estimate (but reporting similar results for alternative time-
frames) and the 3-month Treasury bill for the risk-free rate, the ANN explains
99.6 % of the variability, considerably more than the competing CRR model.
The author also demonstrates that the ANN can be used for hedging.
Barucci, Cherubini, and Landi (1996) expand on their earlier work (Barucci,
Cherubini, and Landi, 1995) on the use of ANNs to approximate partial differ-
ential equations under the no-arbitrage conditions using the Galerkin technique
(see original articles for details).
40 Chapter 2 Literature Review
They demonstrate how to allow for stochastic volatility and in particular for the modelling of the volatility smile. They also
note, based on analytical considerations, that “the approximating solution […]
can be looked at as a NN with one hidden layer augmented by a linear term,
i.e. a direct input output connection.” This is similar to what was observed ear-
lier empirically by Malliaris and Salchenberger (1993b). Their approximation is
𝑐(𝑆, 𝑇) = 𝑆 − ∑_{𝑖=0}^{𝑁+1} 𝑤_𝑖(𝑇)Φ_𝑖(𝑆), the latter term representing the weighted trial
functions introduced by the authors. The article further demonstrates the ap-
proach using an example “inspired to the S&P 500 options market” but the au-
thors stress that no attempt at learning the critical parameters for stochastic
volatility from market data had been made.
Herrmann and Narr (1997) further explore non-parametric pricing models
in the German market using intraday data. They use put and call options on
the Deutscher Aktienindex (DAX) (European-style German stock index options
traded at the Deutsche Terminbörse) between 1995–01–01 and 1995–12–31 (with
certain observations excluded, see the paper for details). The Frankfurt Interbank
Offered Rate was used for the risk-free rate and the implied volatility index for
the DAX (VDAX). Like Hutchinson, Lo, and Poggio (1994), the authors used
a synthetic data set but only to determine the suitability and MLP network
complexity required. They confirm prior research that ANNs are able to price
options better (more accurately) than the BS formula and are able to implicitly
model the derivatives with some notable deviations from the closed form with
regard to Ρ and V; they speculate that this may be due to correlations between
the variables.
Qi and Maddala (1996) extend earlier work on pricing S&P 500 index options
by including open interest as an input variable. They use only a short period of
time (1994–12–01 to 1995–01–19) of daily data, the three-month Treasury bill as
a proxy for the risk-free rate and the volatility over the past 106 days. Instead
of a simple split, they apply five-fold cross-validation. They train feedforward
networks with a single hidden layer using five input variables (excluding volatility
but including open interest for network training) by first experimenting with
various network sizes, concluding that five nodes are appropriate. The data is first
normalised, the learning rate set to 0.05 and momentum to 0.95. Learning is
stopped after 15 000 epochs with additional constraints on weight updates. They
report small improvements over the BS formula. By analysing the actual weights
in the network, the authors argue that the network reflects the typical economic
relationships and that open interest is an “important factor” in pricing. They also
9 The author retains the network complexity, i.e. the number of hidden nodes, despite increasing the potential model complexity for comparison reasons.
addressed this type of option [, with future-style margining, and that t]he studies
that have pertained to this issue disagree as to whether or not a risk-free rate
should be included in the option pricing model.” The author points out that the
agreement is that if payment is required by both, the rate should be dropped.
Another study on LIFFE-traded options is presented by Raberto et al. (2000).
They introduce as input |𝑆 − 𝐾|/𝑇 to model the smile, in addition to the usual
𝑆/𝐾 and 𝑇, showing graphically a good match to observed prices after training the
estimator on synthetic data.
With assumptions similar to those in the original article by Hutchinson, Lo,
and Poggio (1994), Yao, Li, and Tan (2000) limit input variables to moneyness,
time-to-expiry and the risk-free rate. Thus volatility is strictly implied and as-
sumed constant for the training, validation, and test set. Various network sizes
are tested starting at half the number of inputs with a step-wise increment of
1. Using Nikkei 225 index call options trading at the Singapore International
Monetary Exchange between 1995–01–04 and 1995–12–29, they find, as did pre-
vious researchers, that the ANNs can outperform the BS formula in particular for
non-ATM options. They also find that time-indexing improves performance (with
respect to normalised mean squared error (NMSE)) but suggest that volatility
does not need to be modelled separately.
Healy et al. (2002) study LIFFE FTSE 100 index futures options between 1992
and 1997. They deliberately do not assume homogeneity and find that the bid-ask
spread as a proxy for transaction cost and open interest have explanatory power
while volume does not. In particular, the bid-ask spread has greater explanatory
power than the risk-free rate, using the 90-day Libor rate. They use implied
volatility and eleven hidden nodes for the network but despite the use of market
rates and volatility conclude that performance is not constant over time thus
raising the question of how long a model derived from market prices is useful.
Adding to the international evidence, Amilon (2003) tests the usefulness of
ANNs in the Swedish stock index option market, focusing on calls between
1997–06 and 1999–03, excluding April and May each year due to concerns regarding
dividend-adjustments made to the index during that period. The risk-free rate is
the 90-day local treasury bill. In addition to the usual parameters, time-to-expiry
is measured in trading and calendar dates separately. As mentioned above, var-
ious adjustments to closed-form pricing models exist, including corrections for
differing day-counts between the risky asset and the risk-free one. The adjust-
ment is, however, made to the rate itself rather than introduced as an additional
parameter. Homogeneity is assumed and the bid and ask prices are estimated sep-
arately. Furthermore, five lagged index values are used as inputs instead of only
the current one as well as two historical volatility estimates based on 10 and 30
past returns respectively. The hyperbolic tangent is used for the hidden layer ac-
tivation functions, and the logistic function for the output layer. Another unusual
choice concerns the partitioning: for each year, the first four months are used
for training, the following two for validation, and one more month for testing;
this is repeated by moving forward by a month while keeping the starting point of the
training period the same, thus expanding the training set. This ultimately yields
non-overlapping test sets, however.
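Amilon's expanding-window partitioning can be sketched as a generator over month indices (an illustrative reconstruction; the actual study works on dated observations):

```python
def expanding_splits(months, train0=4, valid=2, test=1):
    """Yield (train, validation, test) month blocks. The training window always
    starts at the first month and grows by one month per step, so the test
    sets never overlap."""
    start_test = train0 + valid
    while start_test + test <= len(months):
        yield (months[:start_test - valid],            # expanding training set
               months[start_test - valid:start_test],  # validation months
               months[start_test:start_test + test])   # test month(s)
        start_test += test

months = list(range(1, 13))  # one year of monthly blocks
for tr, va, te in expanding_splits(months):
    print(f"train 1..{tr[-1]}, validate {va}, test {te}")
```

For a twelve-month year this yields six splits, with test months 7 through 12 each appearing exactly once.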
Carverhill and Cheuk (2003) study S&P 500 index futures options (calls and
puts). Departing from prior studies, homogeneity is assumed but the ratio is re-
versed (𝐾/𝑆), the risk-free rate interpolated using market data (Libor and Eu-
rodollar futures) and several networks trained using weighted average past data
with sampling at weekly frequency to avoid day-of-week effects. The modelling is
approached by separating the pricing from the estimation of the derivatives, the
greeks. Three hidden nodes are used in either case but the network estimating the
greeks has two outputs, Δ and V. The results show improvements with respect
to the volatility in hedging, i.e. better hedging, than the reference model (CRR).
10 This approach is suitable in the risk-neutral valuation framework.
in the near future. Their results show that the networks are better than both
the traditional Black-Scholes formula as well as one adjusted for higher moments
(especially skewness and kurtosis). This holds for pricing and hedging errors.
A similar approach is taken by Healy et al. (2007). Studying LIFFE FTSE 100
put options (European and American), the authors infer the risk-neutral densities.
Using a 5–11–1 architecture option prices were estimated permitting differentia-
tion to arrive at the densities. The results are found to be unbiased with respect
to the price at expiration but biased with respect to realised volatility.
Zapart (2002) approaches the problem not by estimating the risk-neutral densities
but by modelling the evolution of prices. Using wavelets and a neural network, the author
models local volatility over time, which is then used in a binomial tree. Studying
the options on three United States of America (US) companies, hedging risk
is determined and shown to be lower for the combined wavelet-ANN model (five
hidden nodes) than the BS formula between 2001–08 and 2002–01. Zapart (2003a)
extends this data set to 2002–07 and 53 companies again in the US demonstrating
how such a network can be used for trading, in this particular case to create
delta-hedged portfolios with a single threshold parameter. Finally, Zapart (2003b)
extends the original approach (Zapart, 2002) by using genetic algorithms to find a
suitable network, a particularly time-consuming task considering that each time
series is to be modelled by a separate network (not unlike GARCH volatility
modelling).
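The binomial-tree building block used by Zapart can be illustrated with a standard Cox-Ross-Rubinstein pricer. The sketch below uses a constant volatility for brevity, whereas Zapart's wavelet-ANN supplies a local volatility estimate; the parameter values are arbitrary:

```python
from math import exp, sqrt

def crr_price(S, K, T, r, sigma, n=500, american=False, put=False):
    """Cox-Ross-Rubinstein binomial tree for a vanilla option."""
    dt = T / n
    u = exp(sigma * sqrt(dt))           # up factor
    d = 1.0 / u                         # down factor
    p = (exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = exp(-r * dt)
    payoff = (lambda s: max(K - s, 0.0)) if put else (lambda s: max(s - K, 0.0))
    # option values at expiry
    values = [payoff(S * u ** j * d ** (n - j)) for j in range(n + 1)]
    # backward induction through the tree
    for i in range(n - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:  # early-exercise check at each node
                cont = max(cont, payoff(S * u ** j * d ** (i - j)))
            values[j] = cont
    return values[0]

print(round(crr_price(100, 95, 0.5, 0.03, 0.2), 4))
```

For European calls the tree converges to the Black-Scholes value as 𝑛 grows; the American flag adds the early-exercise comparison that matters for the put options studied here.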
Montagna et al. (2003) use a synthetic data set of European and American
options. The authors study a path-integral algorithm for pricing first and apply
radial basis function (RBF) networks to the prices derived using this algorithm.
The authors find that both approaches can be very useful for pricing options and
that the ANN results depend “as expected” on the choice of the spread (training)
parameter and the number of observations used for training. They generally find
a very low deviation of the ANN-based results from the reference model. Morelli
et al. (2004) extended this research, focusing on the differences between the RBF
and MLP models with respect to pricing and the estimation of option greeks.
They find that the RBF approach is much faster and thus more suitable for
“preliminary checks” while the MLP performs better in the long run but is more
the years 2001–2003. The data consisted of the usual stock and option data as
well as the “market to market,” high and low prices. The authors applied cross-
validation, though it is not clear whether a separate out-of-sample set was used, what
the option terms were, or whether the volatility parameter was implied volatility.
Focusing on the technical aspects the authors found that a circular committee
approach leads to the best performance with respect to the error metric while the
best RBF and MLP networks performed relatively well, too. Importantly, they
note that the performance of the MLPs tends to be more stable across models.
Given the difficulties of finding a well-performing network, this insight is partic-
ularly relevant to ANN-related research in option pricing as local minima and
highly sensitive results are generally of little value.
Using the same data set and limiting investigation to the futures and options
of the All Share Index, Pires (2005) (see also Pires and Marwala, 2004; Pires and
Marwala, 2005) compared Bayesian MLPs and support vector machines (SVMs),
and the Bayesian approach to the maximum likelihood method. Based
on the ME and the maximum error, the author concludes that the SVMs outper-
form the MLPs regardless of the method and the differences can be very large.
Promising results were also reported by Hamid and Habib (2005) using S&P 500
index options between March 1983 and June 1995. Since the underlying assets were
index futures, the Black (1976) model was used as a reference point and to extract
implied volatilities. The authors compare prices and implied volatilities for the
nearest call option at specified expiry dates with respect to the reference model
and report t-test, MAE, and root (of) mean squared error (RMSE) for the differ-
ent models. Unlike other studies, they find that the ANN has some difficulties at
very short maturities. As they do not provide the futures volatility as an input
they speculate that the data may be too noisy to infer the function in that re-
gion for this data set. Note that many studies that do exclude data because of
fitting problems do so at the very long maturities and typically find a good fit
for shorter periods.
Another comparison of models is given by Liang, Zhang, and Yang (2006). The
authors limit their investigation to four stocks of the Hong Kong market during
March through July 2005. The authors propose a hybrid model, which does not
use the usual option pricing inputs but rather four different pricing functions (bi-
nomial model, the BS formula, the finite difference, and the Monte-Carlo model).
The actual input is the difference between those functions and the mean pricing
result on a particular day. The resulting hybrid model is therefore an ensemble
Quek (2006) (for more detail see also Teddy, Lai, and Quek, 2008) proposes an
alternative model inspired – as the ANN itself – by the human brain. The proposed
model uses the weighted Gaussian neighbourhood output for activation. The new
model fails to outperform a 3–8–1 MLP model using the RMSE of the mispricing
of British Pound–US Dollar foreign exchange futures call options traded on the
Chicago Mercantile Exchange (CME). The authors stress, however, that the new
model allows for the extraction of discrete pricing rules, which they claim is very
difficult for traditional MLPs. Furthermore, they show how the model (without
comparison to alternatives) can produce risk-free portfolios from a trading system
using mispriced options.
Kakati (2008) makes a similar point regarding an alternative model that has
at least some explanatory power. In this article an adaptive neuro-fuzzy system
(commonly abbreviated ANFIS) is proposed and compared to the BS model. In
line with the existing body of knowledge, the model outperforms not only the traditional
pricing model but also a simple ANN. This is demonstrated using 40 American-
style call options on seven Indian stocks. Volatility is estimated using GARCH
models, 60-day historical volatility as well as implied volatility. The author also
adopts the homogeneity hint.
The results of the study are interesting in several ways. Firstly, the ANN is
only rarely the best model, which is surprising given past evidence; even
the BS model outperforms it when used with implied volatility. Given how implied
volatility is derived, this may not be surprising. However, the ANFIS approach
generally outperforms the competing models under all volatility measures.
Quek, Pasquier, and Kumar (2008) also consider an alternative network struc-
ture, an RBF network using Monte-Carlo Evaluation Selection to determine the input set. In
addition to the usual steps, the authors also use the predicted values to compute
a theoretical portfolio returns series from a trading model (delta hedging). Of the
input variables tested, open, close, high, low and the previous two open prices are
reported to be relevant. Two instruments are investigated, gold and the British
Pound–US Dollar futures and options, covering the periods 2000–2002 (gold)
and 2002–10 to 2003–06 (currency). Unusually for option pricing, the
authors develop the networks to make a directional forecast and a price prediction
using an MLP, an Elman recurrent network, a special fuzzy-neural network, and
the innovation model. The latter is a modified recurrent network with a modi-
fied learning algorithm. The Elman network performed better than the MLP; the
fuzzy-neural network had generally poor prediction ability. The methods were
chosen due to the limited amount of data and the authors conclude that their
proposed model outperforms the MLP and the Elman network leading to an ac-
curacy of up to 90 %. This work is further extended by Tung and Quek (2011),
who introduce an evolving fuzzy rule set and related trading system for trading
volatility based on the novel approach.
Hybrid Models
eters are given. Finally, they conclude that hedging profits depend on the level of
transaction costs and suggest that some inefficiencies may still exist in the market.
Andreou, Charalambous, and Martzoukos (2010) extend prior work on hybrid
networks and the use of the deterministic volatility function (DVF) for reference
models to price European call options on the S&P 500 index between 2002–01
and 2004–08. Splitting the data into twelve months for training, two for validation,
and one for testing, with rolling partitions but non-overlapping test sets,
allows for continuous re-estimation of parameters (both the parametric reference
models and the non-parametric innovation models). The ANNs are further mod-
ified by adding a parametric pricing function as a fourth layer to a standard
single-hidden-layer perceptron so that “the network structure embeds knowledge
from the parametric model during estimation (thus resulting in a semi-parametric
option pricing method).” The resulting models, which use the dividend yield and
in the case of ANNs a dividend-adjusted moneyness measure, are shown to be
comparable to models of stochastic volatility with jumps with respect to the cho-
sen error measure (notably RMSE but also reporting MAE and MdAE). They do
not generally and unconditionally outperform those but do outperform the con-
ventional pricing formulas (BS and CS). The authors reiterate the importance
of choosing functions for hedging based on their hedging results and not merely
their pricing performance, i.e. the performance metric needs to be aligned with
the intended use of the model. The specific enhancements within the model are
with respect to volatility, skewness, and kurtosis.
Departing from prior research into ANN use, Andreou, Charalambous, and
Martzoukos (2009) use SVM regression for option pricing and find positive re-
sults in the case of hybrid models, which use not average volatility but volatility
from a DVF. The data used consists of the typical observations with an inter-
polated risk-free rate and daily option prices (using bid-ask midpoints) between
2003–02 and 2004–08. The use of a DVF, however, is important since it allows for
contract-specific volatility estimates.
Charalambous and Martzoukos (2005) study the applicability of ANNs to two
option pricing problems using a hybrid approach similar to the one proposed
in the methodology section of this thesis. The authors use synthetic data sets
for financial and real options and build a simple 20-hidden-node ANN as well
as a hybrid network that uses the numerical option pricing model as the baseline
while the ANN fits the difference to it. Studying the MSE, MAE, and the maximum
absolute error of the networks as well as the numerical pricing model as a reference
point, they conclude that the hybrid approach is superior, especially for the very
large network, and slightly better even for a very small 2-hidden-node network
compared to the reference model.
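The hybrid construction recurring in these studies (the final price equals the baseline model plus a learned correction) can be sketched as follows. An ordinary least-squares model stands in for the ANN, and the "market" prices are synthetic (Black-Scholes plus a smile-like deviation), so only the structure, not any empirical result, is illustrated:

```python
import numpy as np
from math import erf, exp, log, sqrt

def bs_call(S, K, T, r, sigma):
    # Baseline (reference) pricing model: Black-Scholes.
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return S * N(d1) - K * exp(-r * T) * N(d2)

rng = np.random.default_rng(1)
S, r, sigma = 100.0, 0.03, 0.2
K = rng.uniform(80.0, 120.0, 200)
T = rng.uniform(0.1, 1.0, 200)
baseline = np.array([bs_call(S, k, t, r, sigma) for k, t in zip(K, T)])
# Synthetic "market" prices: baseline plus a smile-like systematic deviation.
market = baseline + 0.5 * np.abs(K / S - 1.0) + 0.1 * rng.normal(size=200)

# Hybrid model: learn the residual (market - baseline), not the raw price.
X = np.column_stack([np.ones_like(K), K / S, (K / S) ** 2, T])
coef, *_ = np.linalg.lstsq(X, market - baseline, rcond=None)
hybrid = baseline + X @ coef

mse_base = float(np.mean((market - baseline) ** 2))
mse_hybrid = float(np.mean((market - hybrid) ** 2))
print(mse_base, mse_hybrid)
```

The appeal of the construction is visible even in this toy setting: the learner only has to absorb the systematic deviation from the baseline, a much smaller and smoother target than the option price itself.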
A similar approach is taken by Blynski and Faseruk (2006). Again, a hybrid
model pricing the difference between the target value and the baseline model is devel-
oped. The authors use a wider range of performance metrics, however: 𝑅2, ME,
MAE, mean absolute percentage error (MAPE), NMSE, and MSE. They split
the Chicago Board of Exchange OEX index call options data between 1986 and
June 1993 into 60 % training, 20 % validation, and 20 % testing data and removed
non-representative data. Like previous authors, including the research discussed in
the next section, they find some support for the use of ANNs. A two-stage hy-
brid model, which estimates implied volatility and applies Lajbcygier’s hybrid
approach discussed below, fails to improve the outcome. At least for this market,
the researchers conclude that the market appears to be efficient with respect to
pricing such options. In particular they find that using the implied volatility, the
BS model is fairly good by comparison.12
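The error metrics recurring in these comparisons can be collected compactly. Conventions vary across studies; the sketch below normalises the NMSE by the target variance, one common choice, under which 𝑅2 = 1 − NMSE:

```python
import numpy as np

def pricing_errors(y, yhat):
    """Error metrics common in the option-pricing literature."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    e = yhat - y
    mse = float(np.mean(e ** 2))
    var = float(np.var(y))
    return {
        "ME": float(np.mean(e)),                 # signed bias
        "MAE": float(np.mean(np.abs(e))),
        "MAPE": float(np.mean(np.abs(e / y))),   # assumes no zero prices
        "MSE": mse,
        "NMSE": mse / var,                       # normalised by target variance
        "R2": 1.0 - mse / var,                   # under this NMSE convention
    }

m = pricing_errors([10.0, 12.0, 8.0, 11.0], [10.5, 11.5, 8.5, 10.5])
print(m["MAE"], m["NMSE"])
```

The signed ME is worth reporting alongside MAE and MSE because it separates systematic over- or under-pricing from dispersion, a distinction several of the studies above rely on.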
A similar model is developed by Amornwattana, Enke, and Dagli (2007). Two
ANNs are trained: one estimating the implied volatility, the second pricing the
difference between the observed price and the reference BS model. For compari-
son, a model using historical volatility instead of the first network is used, which
has a window size of 90 days. The model is tested on five stocks that are members
of the DJIA from 2002–07–01 to 2002–10–15, where the October data was used
for out-of-sample evaluation. Unusual for option pricing evaluation, the authors
calculated various non-parametric tests and scores using the MAE and MSE
series: the Wilcoxon test, median, Van der Waerden and Savage scores. These
demonstrate that the lower errors (compared to the reference model using BS
and historic volatility (HV), and the network learning the difference between the
observation and the BS value) are also statistically significant in many cases.
Saxena (2008) follows the standard methodology for hybrid models, using the
ANN to learn the additional price relative to the baseline BS model. They
test this on the S&P CNX Nifty index (India) options between 2005–11–01
and 2007–01–25, using a one-year historical volatility and exploiting price-
homogeneity. Following the splitting of data into 40 % for training, 30 % for vali-
dation, and the remainder for testing, the usual statistics 𝑅2 , ME, MAE, MPE,
12 The authors do not discuss methodological issues arising from the use of implied volatility derived from the same model to which it is applied in a later step, however.
and MSE are calculated and the hybrid model outperforms the baseline model
with respect to each of them.
A different approach to hybrid models is the one proposed by Ko (2009). Instead
of combining the machine learning technique with an underlying pricing function,
a (neural) regression model is developed, which combines various input networks
in a single functional form similar to committee models. The input variables are
the BS inputs and the experiments were conducted using index future options
from the Taiwan Futures Exchange. The options are European-style and data
was collected for the period 2005–01–03 to 2006–12–31, of which 80 % was used
for training, the remainder for testing. A learning rate of 0.1 and two hidden
layers with six neurons each were used for each network. The study concludes
that the model is superior to the BS model with respect to the average absolute
delta-hedging error.
Not all evidence is supportive of the use of ANNs. Gradojevic and Kukolj (2011),
for example, conclude that a parametric model based on a fuzzy rule-based system
is no worse than a non-parametric feedforward network using European-style
S&P 500 index call options for the period of 1987 to 1993 (by expiry date). No
model can be said to be better than the other using the Diebold-Mariano test
statistic. This is despite using a network with a hint similar to prior research,
which typically found hints to lead to improved performance.
Extending the use of ANNs to more complex options, Lu and Ohta (2003a) and
Lu and Ohta (2003b) generate a synthetic data set based on a number of as-
sumptions informed by NYSE options. The authors study the applicability of
machine learning to the pricing of complex power and rainbow options and mod-
ify the standard approach of training the networks using the contract inputs and
volatility parameters by supplementing the input data set with a pricing hint
based on digital contracts, i.e. pricing based on the binomial model. They show
using Monte-Carlo simulations that the hinting improves pricing performance
measured as the RMSE and stress that the approach can be generalised to a
variety of complex options.
Using LIFFE data as well, Xu et al. (2004) generate a synthetic data set of
barrier option data points based on actual European-style observations. The
authors build two models, which differ in the exclusion vs. inclusion of the
trading date as an input; including it did not appear to lead to better results. They
find based on the 𝑅2 and paired t-tests that ANNs can be a valuable tool for
option pricing and that no additional steps had to be taken to address the barrier
feature of the options.
Another application outside the equity (index) options is presented by Leung,
Chen, and Mancha (2009). The authors compare MLP and general regression
neural network as well as a number of projection models and the BS formula.
These are applied to foreign exchange futures options for the British Pound,
the Canadian Dollar and the Japanese Yen as traded on the CME covering the
period of 1990–01 to 2002–12 (in sample) and 2003–01 to 2006–12 (out-of-sample).
Subject to transaction cost, the general regression model is generally best. This
is also supported by pairwise comparisons of trading returns using the bootstrap
method. Furthermore, an ensemble (i.e. a committee) model combining the two
machine learning techniques shows a lower coefficient of variation of returns than
each model individually, which is not surprising given past statistical literature.
This is, however, economically important as variability is risk in this context.
Chen and Sutcliffe (2012) (see also Chen and Sutcliffe, 2011) investigate the use
of hybrid ANNs and compare them to the model by Black (1976). The data set covers
NYSE LIFFE tick data for short sterling (British Pound) call and put options and
the relevant futures for the period 2005–01–04 to 2006–12–29. The authors
test a simple model pricing the option and one pricing the difference between
the reference model and the observation separately. In addition, two different
strategies of calculating the hedging requirements are proposed, calculated based
on the priced option, or as the target variable of either of the networks. Paired
t-tests are used to compare the pricing differences. Based on the MSE, MAE, ME
of pricing and hedging, the authors conclude that ANNs can be used successfully
for interest rate options with the hybrid model strictly better, and the simple
network better or no worse than the traditional model.
Lajbcygier et al. (1996) focused on the use of ANNs for pricing American-style
index futures options (SPI futures options) in the Australian market. They use end-
of-day option prices between January 1993 and December 1994, the latest prior
intraday underlying price, the 90-day bank bill as a proxy for the risk-free rate,
and a volatility estimate similar to Hutchinson, Lo, and Poggio (1994). Regarding
the latter, the authors point out that the original article (Hutchinson,
Lo, and Poggio, 1994) contained an error, which, according to the authors, was
acknowledged in private communication but was not present in the actual imple-
mentation. The resulting networks assume price-homogeneity and are evaluated
using 𝑅2 , normalised root mean squared error (NRMSE), and MAPE. Similar to
Hutchinson, Lo, and Poggio (1994), the authors (dealing with only one underly-
ing) use two configurations, one with a reduced set of input variables (moneyness
and time to expiry), the other with the full specification of all four parameters.
The data set was further split into a training and test set (20 %), with the test set
formed randomly from all observations. The authors conclude that the neural net-
works can be beneficial in at least some cases, notably in the reduced region (10 %
around ATM and for expiry no longer than one fifth of a year) but they do not
show substantial improvements over the benchmark methods, the Black-Scholes
and the Barone-Adesi/Whaley formulas. They also note the surprising finding
that the Black-Scholes model fits the Australian data extremely well when com-
pared to the US case (Hutchinson, Lo, and Poggio, 1994) especially considering
that it is not directly applicable to American-style options.13
Lajbcygier and Connor (1997a) concentrate on a small data set of intraday SPI
options in 1993 (see also Lajbcygier and Connor, 1997b). Using the first six
months for training with again 20 % of data reserved for cross-validation, the
authors train a 3–15–1 network similar to the networks before with two notable
innovations: Following their prior research (Lajbcygier and Flitman, 1996) they
train not a direct pricing network but a hybrid model, which generates the price as
the difference between the modified BS formula and a trained ANN model. In do-
ing so they implement the suggestion by Malliaris and Salchenberger (1993a) and
a functional form common in forecasting problems (e.g. volatility forecasting as
pointed out by Poon and Granger, 2003) and not dissimilar from the one used by
Barucci, Cherubini, and Landi (1996). Furthermore, they use a weighted implied
volatility (IV) scheme to derive a volatility estimate acknowledging problems with
this approach in general terms (see the article for details or Poon and Granger,
2003, for a more general discussion). Finally, they use bootstrapping to infer
confidence intervals, and bootstrapping and bagging to address model bias.14 In
doing so they address earlier concerns by researchers, given that the error terms
of such pricing models often do not meet the assumptions of standard statistical
tests. The objective is to infer points of mispricing and thus profitable trades.
They conclude that bootstrap bias reduction is superior to bagging in the case of
option pricing and that the errors appear greater near the boundaries, notably at
the very short maturities.
13 Note that early exercise is not meaningful for non-dividend paying securities in any case, as per the earlier discussion.
14 A detailed discussion of the theoretical foundations and implementation notes regarding the two techniques can be found in Hastie, Tibshirani, and Friedman (2009).
2.5 Financial Applications of Machine Learning 55
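The bootstrap and bagging procedures used by Lajbcygier and Connor can be sketched generically; the mean-predictor in the usage example below is a stand-in for an ANN fitting routine, and the interface (`fit` returning a predict function) is an assumption for illustration:

```python
import random

def bootstrap_predictions(train, fit, x, n_boot=200, seed=1):
    """Refit a model on bootstrap resamples of the training data and collect
    its predictions at x. The empirical quantiles give a naive confidence
    interval; the average of the resampled predictions is the bagged estimate."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        # Resample the training set with replacement.
        sample = [train[rng.randrange(len(train))] for _ in train]
        preds.append(fit(sample)(x))
    preds.sort()
    bagged = sum(preds) / n_boot
    ci = (preds[int(0.025 * n_boot)], preds[int(0.975 * n_boot) - 1])
    return bagged, ci
```

For example, with a trivial "model" that always predicts the mean target, the interval reflects only the sampling variability of that mean.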
In a later paper (Lajbcygier, 2003a), the author re-examines the same data set
as in Lajbcygier et al. (1996) adding statistical tests to determine which models
are better than others using dependent t-tests on the test set, which again is a
random 20 % of the data in both the reduced and the full region. Independent
t-tests are used to compare the distributions (rather than pair-wise comparisons)
of the absolute errors between the reduced and full regions in the test set to
determine any differences between them. Their statistical findings support the earlier
evidence that ANN-based models lead to significantly improved pricing in the
economically important (restricted) region. In the full region only the largest
network appears competitive. Furthermore, the networks are not substantially
different from one another. The author highlights again the surprisingly good
fit of the Black-Scholes formula compared to the Barone-Adesi/Whaley model
suggesting that the argument by Lieu (1990) holds in Australia.
Finally, Lajbcygier (2004) attempts to address pricing biases found in earlier
research (Lajbcygier and Connor, 1997a) near the boundary (see also Lajbcy-
gier, 2003b). Some of those were discussed earlier in section 2.2.5. In particular,
Lajbcygier (2004) modifies the learning algorithm such that the hybrid model
(Lajbcygier and Connor, 1997a) yields meaningful results as expiry 𝑇 approaches
0, moneyness approaches 0, or volatility (again using the weighted implied
standard deviation) approaches 0. In all three cases, the neural network should
return 0 (recall that it represents a penalty over the closed-form modified Black
model). For the three years studied, 1993–1995, the authors find statistically sig-
nificant and much improved behaviour of the pricing function near the boundaries
as a result from the constraints. The author suggests further research studying
the use of bootstrap methods in combination with constraints, the combination
with alternative models, such as SV, and exogenous variables.
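The constraint idea (the correction network must vanish at the boundaries) can be illustrated with a smooth multiplicative damping factor; this is a sketch of the principle only, not Lajbcygier's actual modification of the learning algorithm:

```python
import math

def constrained_correction(nn_output, moneyness, T, sigma):
    """Damp the hybrid model's ANN correction so it vanishes as time to
    expiry, moneyness, or volatility approaches zero, where the closed-form
    reference price should be returned unchanged."""
    damp = ((1.0 - math.exp(-T))
            * (1.0 - math.exp(-moneyness))
            * (1.0 - math.exp(-sigma)))
    return nn_output * damp

# At any of the three boundaries the correction is exactly zero.
assert constrained_correction(2.5, 1.0, 0.0, 0.3) == 0.0  # T = 0
assert constrained_correction(2.5, 0.0, 0.5, 0.3) == 0.0  # moneyness = 0
```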
Methodological Advances
Anders, Korn, and Schmitt (1998) demonstrate how the network architecture
can be derived based on a two-step process of slowly increasing the number of
nodes and then decreasing the connections based on statistical tests. They use
DAX index call options (European-style) from 1994, with all transactions
between 11:00 and 11:30 sampled at one-minute intervals. Furthermore, unusual
observations (violating boundary conditions, or situations near the boundary) were
excluded. The risk-free rate is derived by interpolation of interbank lending rates
at various maturities. Two volatility estimates, 30-day HV, and IV (using the
VDAX), are investigated. The resulting networks (without a closed-form model as
a hint) are not fully connected and have three hidden nodes. Once the BS result
is provided to the output node, the network collapses further leaving only money-
ness and time-to-expiry as additional variables.15 The authors conclude that the
statistical techniques used for configuration can help with the pricing accuracy
preserving the usual relationships between variables as suggested by theoretical
considerations (option greeks). Notably, the index level has predictive power when
combined with historical volatility.
A related problem is the dynamic structure of the network. Instead of deter-
mining the network structure at the time of development, the structure is fixed
but the parameters are allowed to change over time (non-stationarity). Ormoneit
(1999) shows that Kalman filters can be used to influence the weight updates
in ANNs and applies this to option pricing. The use of Kalman filters is not
new; Niranjan (1996), for example, demonstrated how they can be used to model
volatility and the risk-free rate. In contrast, Ormoneit allows for continuous up-
dates of network weights but imposes a penalty (regularisation) using Kalman
filters. The strategy is tested using DAX index call options between 1997–03 and
1997–12, all of which expire at the end of the period. A constant risk-free rate
of 𝑙𝑛(1.05) is assumed, and various volatility estimates (fixed, previous implied,
or historical volatility of the futures contract) are used in combination with the
reference model, the
Black-Scholes formula. They find that the networks perform very well with re-
spect to the hedging error but not as well for pricing. The problem appeared to
be near expiry, which the author attributes to the limited capacity of the model to
accommodate the boundary condition. Like Lajbcygier (2004), Ormoneit modifies
the network but in this case by switching to an activation function and related
evaluation along the network path that are consistent with the risk-neutral pric-
ing approach and reflect the boundary condition. This results in improved pricing
and hedging.
15 Note that this is identical to the standard specification of the volatility surface.
In another study regarding network design and its relationship to the option
pricing problem, Galindo-Flores (1999) builds 13 920 candidates to determine the
characteristics of several machine learning techniques, namely classification and
regression trees (CART), feedforward ANNs, 𝑘-nearest neighbour, and ordinary
least squares (OLS) regression, and their parameter choices. The author suggests
that it is difficult to decide a priori which technique should be used and
how the models are to be constructed (i.e. parameter choices). In particular, the
neural networks used for option pricing use three input and one output node (as
in Hutchinson, Lo, and Poggio, 1994) with six and 18 hidden nodes in two models
generated for comparison. Focusing on structural risk (Vapnik, 1995) and using a
synthetic pricing data set under the usual assumptions of constant interest rate
and volatility, it is found that ANNs offer the best performance especially when
the sample size is larger, OLS being preferable when the sample size is restricted.
The ANN-specific finding is that the modified Newton method in combination
with the larger network (3–18–1), and a large number of iterations (450) is pre-
ferred. The author cautions against generalisation, however, as the data set is
simplified, ‘less nonlinear.’
Garcia and Gençay (2000) show how to exploit one of the assumptions normally
made in option pricing, and in pricing securities generally. As discussed before,
the pricing function is considered homogeneous with respect to the price and
strike of the underlying.16 Using European-style S&P 500 index call options (and
initially a synthetic data set) from 1987–01 to 1994–10, they demonstrate that
splitting the pricing function (the homogeneity hint) into two components, one
driven by moneyness 𝑆/𝐾 and one by time-to-maturity, leads to better out-of-sample
pricing. They train networks of varying complexity (up to 10 hidden nodes) in the
first half of each year, select the network based on the third quarter, and test the
results in the last quarter. Diebold-Mariano (DM) statistic and the mean squared
prediction error (MSPE) are used for pricing evaluation. They also point out that
the hedging performance needs to be considered in the network selection process
rather than being evaluated only after the best network with respect to pricing
has been chosen.
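The homogeneity property being exploited is easy to verify numerically for the Black-Scholes case: scaling the underlying price and strike by the same factor scales the call price by that factor, so the price per unit of strike depends only on moneyness and time to maturity. A standard-library Python sketch:

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

# Degree-one homogeneity in (S, K): C(lam*S, lam*K) = lam * C(S, K).
lam = 2.5
base = bs_call(100.0, 95.0, 0.5, 0.03, 0.25)
scaled = bs_call(lam * 100.0, lam * 95.0, 0.5, 0.03, 0.25)
assert abs(scaled - lam * base) < 1e-9
```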
16 Note that at least one study (Anders, Korn, and Schmitt, 1998) indicates that the price level of the underlying (an index) has some limited predictive power.
An alternative methodological improvement is suggested by Gençay and Qi
(2001). The authors test the effect of regularisation techniques, in particular
Bayesian regularisation, early stopping, and bagging, in the context of non-parametric
option pricing. They use S&P 500 call options between 1988–01 and 1993–12.
In addition to the pricing data, the three-month T-bill is used as a proxy for the
risk-free rate, and a three-month moving sample of volatility is used. The
results are compared to a neural network without special considerations during
training. Each year is split into training set (first half), validation (third quarter),
and test set (fourth quarter). The authors stress the shortcomings of splitting the
data set in this way. To test whether improvements were made, the mean squared
prediction error and the average hedging error are presented along with a measure
combining mean and variance, 𝜈 = 𝜇² + 𝜎²; the authors also apply the DM test
statistic and the Wilcoxon signed rank (WS) test.
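The combined measure 𝜈 = 𝜇² + 𝜎² can be computed directly from the pricing errors; with the population variance it coincides with the raw second moment of the errors (the mean squared error around zero), which is why it penalises both bias and variability. A short standard-library sketch:

```python
import statistics

def nu(errors):
    """Combined error measure: squared mean plus population variance."""
    mu = statistics.fmean(errors)
    return mu ** 2 + statistics.pvariance(errors, mu)

errors = [0.5, -0.2, 0.1, 0.4, -0.3]
# Identical to the mean of the squared errors.
assert abs(nu(errors) - statistics.fmean(e * e for e in errors)) < 1e-12
```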
Gençay and Salih (2003) further investigate these results using S&P 500 index
options between 1988–01 and 1993–10, also adding the Bayesian information
criterion (BIC) as an alternative to the regularisation techniques and studying,
in particular, the relationships between input variables and mispricing between
ANN and BS models. They find that much of the bias found in the BS model is
eliminated by the use of ANN. They follow the partitioning scheme of the pre-
vious article discussed and use historical volatility matching time-to-expiry but
not less than 22 observations.
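A matching historical-volatility input of this kind can be sketched as follows (Python, standard library only; the 252-day annualisation factor and close-to-close log-returns are conventional assumptions, not taken from the article):

```python
import math
import statistics

def historical_volatility(closes, horizon_days, min_obs=22, periods=252):
    """Annualised close-to-close volatility over a window matching the
    option's time to expiry, but using at least `min_obs` returns."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    window = max(horizon_days, min_obs)
    return statistics.stdev(rets[-window:]) * math.sqrt(periods)
```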
Dugas et al. (2001) (see also Dugas et al., 2009) further show that, in a process
similar to the two previous studies, alternative approximators can be used, which
benefit from known properties of the pricing function's derivatives. While the approach is different from the ones seen
before (Lajbcygier, 2004; Choi et al., 2004; Ormoneit, 1999), it relies on a different
functional form and the neural network is used as a benchmark. They apply this
to S&P 500 European index call options between 1988 and 1993. They find that
the use of such knowledge is beneficial for forecasting, especially for generalisation,
in reducing the MSE.
Focusing on synthetic European call options data in the presence of dividends
(as a continuous yield), Le Roux and Du Toit (2001) confirm that the pricing
function can be learned but are unable to find an optimal network architecture
despite testing a fairly large range of designs, one or two hidden layers, and for
the single hidden layer between six and 24 hidden nodes.
An alternative to the use of Kalman filters above is given by Ghosn and
Bengio (2002). Instead of building a single network and allowing for parameter
drift in some way, a number of models are built simultaneously with somewhat
different parameters. Those parameters are however constrained and those con-
straints form the “domain-specific bias.” Using European call options related to
the S&P 500 index between 1987 and 1993, the multi-task method (i.e. the in-
novation model) almost always outperforms the traditional single network. The
models assume homogeneity and vary in terms of the time frame used for training.
Unrelated to the main question, the authors point out, however, that very long
timeframes (1250 days) lead to poor performance. Instead a two-input network
is used subsequently, which is the solution Hutchinson, Lo, and Poggio (1994)
arrived at by assuming constant volatility over the whole period.
An alternative to the bootstrapping for inference (Lajbcygier and Connor,
1997a) is the three-phase process introduced by Healy et al. (2003) as well as
Healy et al. (2004). The authors demonstrate how confidence limits can be inferred
by training an additional network on the data serving as the validation set for
the primary network under analysis. They demonstrate its use on LIFFE FTSE 100
option pricing using ANNs.
Huang and Wu (2006b), and Huang and Wu (2006a) (see also Huang, 2008)
compare a number of hybrid models but, rather than combining machine learning
techniques and the reference model only, they add filters as an additional model
component, i.e. Monte-Carlo filters and Unscented Kalman filters, respectively.
Using the same data set, call and put options from the Taiwan Futures Exchange
between 2004–09–16 and 2005–06–14, they find that the hybrid
model using Monte-Carlo filters (and Unscented Kalman filters, respectively) and
SVMs outperform competing hybrid models, including a combination of an ANN
with the respective filter. Very little information is given on the fitting procedures,
however. The authors focused on minimising the RMSE.
Traditional models also underperformed relative to a ‘hyperparameterised’
model proposed by Jung, Kim, and Lee (2006). The model selects parameters
using a neural network kernel for pricing and implied volatility as the relevant
input parameter in addition to the usual set. The risk-free rate was not used as
it changed little during the study period. Testing the model on the Korea Stock
Exchange KOSPI 200 index for the year 2000, the model results in lower errors
than the competing MLPs with various learning algorithms as well as the BS
model and an RBF network. This was true in the training and test subset.
Another way of deriving the network configuration is through the use of evolu-
tionary algorithms. Wang (2006) tests this approach, where the network size and
connectedness are determined by an evolutionary algorithm followed by training.
The resulting network is tested in Taiwan using two warrants with the same un-
derlying security and a two-month sample period of 2000–02–10 to 2000–04–06
using daily data. Using a delta-hedging strategy, the proposed model nearly dou-
bles the profit from the trading compared to the BSM model.
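A delta-hedging evaluation of this kind rebalances a position in the underlying to the model's delta at each observation; a minimal sketch using the Black-Scholes delta for concreteness (an illustration, not Wang's evolved network):

```python
import math

def bs_call_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European call, N(d1)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

def hedge_positions(path, K, T, r, sigma, dt=1.0 / 252.0):
    """Daily holdings of the underlying that delta-hedge a written call."""
    return [bs_call_delta(S, K, T - i * dt, r, sigma)
            for i, S in enumerate(path)]
```

Replacing the delta function with one derived from a trained network gives the model-specific hedging strategy whose trading profit is compared.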
Lee et al. (2007) also report positive results regarding the use of particle swarm
optimisation (similar to Dindar and Marwala, 2004; Dindar, 2004). Unlike Wang
(2006), it is not the neural network but the model itself that is determined. The
authors find it preferable to genetic algorithms (conceptually similar to Wang,
2006) when estimating the implied volatility of Korean KOSPI 200 call options
between 2005–02–27 and 2005–03–10.
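Whatever optimiser is used, estimating implied volatility amounts to inverting the pricing formula for the volatility parameter; a simple bisection sketch for the Black-Scholes case (in place of particle swarm optimisation):

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Implied volatility by bisection; the call price is strictly increasing
    in sigma, so the root is unique when it lies in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```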
One of the few to address the specific question of network configuration without
introducing a new training algorithm or optimisation technique are Thomaidis,
Tzastoudis, and Dounias (2007). For the specific purpose of option pricing, the au-
thors suggest various processes a researcher can follow. In particular they suggest
that the process should be informed by statistical tests or metrics. The process is
principally based on Lagrange multiplier tests and information criteria such as
the Akaike information criterion (AIC) for the simple-to-complex version. Neurons
are added until their marginal contribution falls below a particular (probability
or information criterion) threshold.
Alternatively, they suggest starting with a large model and tentatively removing
neurons, testing for their marginal contribution. In essence the process is similar
to step-wise regression with the exception that the tests are intended to allow
for non-linearity and the steps taken are specific to the architecture of neural
networks.
The authors test their methodologies using European-style S&P 500 equity op-
tion contracts for the period of 2002–05–17 to 2002–07–29. They find that the
simple-to-complex models perform better in terms of fit and generalisation.
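The simple-to-complex procedure can be sketched as a loop that grows the hidden layer while an information criterion keeps improving; `fit_model` is a hypothetical interface (returning a log-likelihood and parameter count for a given size), not the authors' implementation:

```python
import math

def grow_until_aic_stops(fit_model, max_neurons=20):
    """Add hidden neurons while the AIC improves; stop as soon as the
    marginal contribution of another neuron no longer justifies its cost."""
    best_h, best_aic = None, math.inf
    for h in range(1, max_neurons + 1):
        log_lik, n_params = fit_model(h)
        aic = 2 * n_params - 2 * log_lik
        if aic >= best_aic:
            break  # marginal contribution fell below the threshold
        best_h, best_aic = h, aic
    return best_h
```

The complex-to-simple variant runs the same comparison in reverse, tentatively pruning neurons instead.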
Quek, Pasquier, and Kumar (2008) provide some additional guidance on the
topic. Citing prior research, the authors note that some heuristics do not work
and raise an issue that may affect the prior article's methodology, namely that
pruning may work only if all variables are normalised. This may not apply in all
cases, and care needs to be taken, since a variable is redundant at 0 only if it
has been normalised accordingly.
Despite considerable research in this area, it is noteworthy that little progress
has been made on the two points of interest identified by Hutchinson, Lo, and
Poggio (1994): additional pricing information (variables) and performance metrics.
Instead, the rather technical details of the model development process and
data-specific concerns have remained the primary focus of attention.
It may well be that the former cannot be addressed until a better understanding
of the latter has been achieved.
Option Pricing Models with Special Consideration for the Volatility Input
Meissner and Kawano (2001) combined the pricing networks with a (modified)
GARCH(1, 1) volatility estimate assuming homogeneity. Importantly, they test
whether a single network for various underlying securities, and individual net-
works (one per underlying) outperform the reference model (BS). They confirm
both using options on 10 US stocks from 1999–05–01 to 2000–01–31, excluding
various observations based on a number of criteria. The results show, among oth-
ers, that the MLP and GRNN outperform the reference model. As other studies,
they too report large differences in architectures, ranging from 8 to 18 hidden
nodes for per-underlying networks (in one case with two hidden layers).
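For reference, a plain GARCH(1,1) volatility forecast (not the modified variant used by Meissner and Kawano) iterates the conditional-variance recursion over the return history; the parameters here are assumed to have been estimated beforehand, e.g. by maximum likelihood:

```python
import math

def garch11_vol(returns, omega, alpha, beta):
    """One-step-ahead volatility from the GARCH(1,1) recursion
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2,
    started at the unconditional variance omega / (1 - alpha - beta)."""
    var = omega / (1.0 - alpha - beta)
    for r in returns:
        var = omega + alpha * r * r + beta * var
    return math.sqrt(var)
```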
Pande and Sahu (2006) approach the volatility estimation problem embedded
in ANN-based option pricing differently. Rather than limiting the choice to
historical volatility or using implied volatility, a principal components analysis is
conducted. A number of potential candidate models are provided; the results of
this step are not reported. The output vector is then an input to the option pric-
ing problem using fairly small networks of three to five hidden nodes. These are
compared using correlation, ME, and MSE. Using data provided by BSE India
on Satyam Stock, two thirds of which were used for training, the authors report
results consistent with findings in other markets, i.e. that the BS formula is fairly
good, and here better, at pricing ITM options, and ANNs are better in cases of
OTM options.
Tzastoudis, Thomaidis, and Dounias (2006) improve the fit through two
approaches: firstly, they simplify the functional form by converting part of the
problem to the forward price (note that this conversion was discussed in chapter 2
in the context of futures pricing and the dividend adjustment sometimes made).
Secondly, the study infers the volatility surface in the form of a price multiplier
as a function of moneyness and time-to-expiry on two distinct days and uses a
45-day historical volatility as a proxy. The data set consists of European-style
options on the S&P 500 index on 2002–05–08 (in-sample) and 2002–07–19
(out-of-sample). The resulting networks, which are modest in size, having
between one and five hidden nodes, captured some of the prominent features of
the options. In addition, a weighting scheme for the MSE gives greater
weight to options near ATM levels. Interestingly, the best networks out-of-sample
were those with very few hidden nodes. The resulting hybrid models outperform
the BS model as had been reported previously.
As discussed before, Lee et al. (2007) apply particle swarm optimisation to
the implied volatility estimate using Korean data.
Tseng et al. (2008) compare two different GARCH models in combination
with artificial neural networks, the standard EGARCH and a modified Grey-
EGARCH. These are applied to Taiwan Futures Exchange index options (2005–01–03
to 2006–12–29), of which 70 % was used for training (the authors term it a
“forecasting model”). The results are mixed, depending on the error measure, though
they favour the proposed Grey-EGARCH model.
Similar results on the same data set are reported by Wang (2009a) with
three competing underlying volatility models: GARCH, GJR-GARCH, and
Grey-GJR-GARCH. Here too, the quality varies across error measures but the
GARCH-based model generally underperforms. The authors conclude on balance
that the Grey-GJR-GARCH volatility leads to better results in the context of
ANNs.
A similar study was conducted by Wang (2011), who compared a novel hybrid
model, combining stochastic volatility with jumps and support vector regression,
with several competing models. These included support vector regression
with stochastic volatility, with GARCH-based volatility estimates, the Garman-
Kohlhagen pricing model, and an ANN. The options being studied were currency
options on Australian Dollar, Euro, Japanese Yen and British Pound (all vs. the
US Dollar) during 2009. The author concludes that the proposed model outperforms
the others, but also that the ANN is preferred over the traditional model.
The review of volatility forecasting literature using machine learning techniques
is less exhaustive given the large number of publications. Instead, the review aims
to cover a wide range of different approaches with respect to variables, modelling
parameters, and design choices. As in the previous section, the notation is again
largely standardised.
Gonzalez Miranda and Burgess (1995) study European-style Ibex 35 options,
forecasting hourly changes in implied volatility. Using an integrated modelling
approach, they expected the ANN to outperform on the full data set, given the
nature of the learning process, without this necessarily holding true for the
reduced set, an argument similar to the one made in the introductory section of
this thesis. They also find that momentum is reflected and that changes in the
implied volatility appear to be a function of the strike state without, however,
providing additional details.
Carelli, Silani, and Stella (2000) approach the option pricing problem in a
way very similar to this thesis, focusing, however, on network selection. The
authors introduce a complex procedure to guide in the (feedforward) network
design. Using USD/DEM call and put options, they model the volatility surface
as a local volatility surface. They conclude that the process is helpful especially
in the presence of limited data and, as the discussion of local volatility above
showed, that the pricing assumptions made by market participants could be used
for pricing related instruments (the no-arbitrage argument).
Wang et al. (2012) (see also Wang, 2009b) offer one of the most comprehensive
studies of volatility modelling for option pricing albeit with the aim of forecast-
ing a price. They study historical volatility (30 trading days), implied volatility, a
deterministic volatility function using a quadratic form similar to Dumas, Flem-
ing, and Whaley (1996), and Peña, Rubio, and Serna (1999) (this is a function
of moneyness only instead of the volatility surface), GARCH and GM-GARCH
models (Grey-Model GARCH). Based on these models backpropagation networks
are fitted to the intraday call and put option prices for TXO (TAIEX options).
In addition to the various volatility models, an adjustment is made by using the
index future instead of the spot value due to the difference in dividend handling.
The time period included the full years 2008–2009 thus specifically including the
GFC. Networks are trained for various sub-periods and the total period, though
testing errors across the various time periods are not reported; only the authors’
own performance ratio metric (for RMSE, MAE, and MAPE) is given by sub-period. The
authors conclude that the choice of activation function has no significant impact
on performance, and that more neurons (tested up to 4) are preferable to fewer. They also
add that there was a drop in performance from 2008 to 2009 suggesting that
characteristics of the time-series changed over the course of the GFC. There is
insufficient data presented in the article, however, to understand the impact of
the GFC on generalisation and more broadly model fit across a range of models.
The deterministic volatility function is second best; implied volatility is preferred. As
suggested in the econometrics literature, the GARCH-style models fail to outper-
form historical volatility in this study as well. These results are not consistent
with those by Wang (2009b), who compares a number of volatility models using
2003–2004 data, concluding that the best models are of the GARCH family.
The methodology is otherwise similar if less complex.
Among the earlier studies, Malliaris and Salchenberger (1996), and Donaldson
and Kamstra (1997) stand out. Malliaris and Salchenberger (1996) use a similar
approach as in the case of option pricing and study the implied volatility of
S&P 100 index futures options for 1992. The explanatory variables are thus
the lagged volatility, current volatility, closing price, time to expiry, additional
volatility at different forward times, put open interest, as well as combined prices
of options and market prices. The latter implicitly models the relationships
between option prices and the prices of the underlying securities discussed above.
Donaldson and Kamstra (1997) compare various models as well as an ANN
using S&P 500, Toronto Stock Exchange Composite Index, Japan’s NIKKEI, and
London’s FTSE index between 1969–01–01 and 1990–12–31. The authors report
that neural networks with lagged unexpected returns as inputs can be useful in
forecasting volatility. Importantly, they find that there “may be important differ-
ences between the processes driving returns volatility in the four countries [they]
study.” This is a conclusion Lajbcygier also reached with respect to ANN-based
option pricing in Australia compared with the US.
Hu and Tsoukalas (1999) compare various ARCH-based models and simple
statistical models to an ANN combining the former as well as an averaging model,
i.e. creating an ensemble learner in a way. They find that it does not improve
results generally though it does perform better with respect to mean absolute
error (instead of the square residuals) and “behaves well in the crisis period [of
1993].”
Consistent with previous results and using option straddles for evaluation pur-
poses, Dunis and Huang (2002) find that recurrent neural networks outperform
traditional models as well as model combinations at forecasting volatility. The
study covers British Pound and Japanese Yen exchange rates (each against the
US Dollar) between 1993–12 and 2000–05. The input vector covers “exchange rate
volatilities […], the evolution of important stock and commodity prices, and, […],
the evolution of the yield curve”. The usual RMSE, MAE, Theil’s U, and the
percentage of correct directional forecasts are evaluated to assess forecasting
accuracy, along with the Wald test statistic. The trading strategy
using straddles assumed fixed holding periods and included transaction cost.
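Evaluating a volatility forecast with straddles works because a long straddle pays off only when the realised move exceeds the combined premium paid; a minimal payoff sketch (ignoring the fixed holding period and transaction costs used in the study):

```python
def long_straddle_pnl(S_T, K, call_premium, put_premium):
    """Expiry P&L of buying a call and a put at the same strike: profitable
    when realised volatility beats the volatility priced into the options."""
    intrinsic = max(S_T - K, 0.0) + max(K - S_T, 0.0)
    return intrinsic - (call_premium + put_premium)

# Profitable only if the underlying moves more than the combined premium.
assert long_straddle_pnl(110.0, 100.0, 4.0, 3.0) == 3.0
assert long_straddle_pnl(100.0, 100.0, 4.0, 3.0) == -7.0
```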
Using lagged returns of US Dollar/Deutsche Mark (DM) exchange rates between
1980–01–01 and 2000–01–01, Gavrishchaka and Ganguli (2003) study the use of
SVMs. The authors conclude that the SVM can be used to model the long memory
of the volatility processes in particular, and outperforms traditional GARCH
models.
Dash, Hanumara, and Kajiji (2003) use a larger data set including also the US
Dollar rates against the Japanese Yen and the Swiss Franc and hourly quotes
during 1999 (and a limited set for Deutsche Mark quotes). The inputs to the
networks include currency data, yield curve data, and GARCH data, which also
serves as a reference model. The actual subset used for any model is found using
genetic optimisation or a special algorithm. The results are largely consistent
with prior literature, i.e. that improvements above the GARCH model can be
made using specific networks. The actual performance, and here the inclusion of
specific inputs, is to some degree specific to particular securities.
Positive results are reported by Hamid and Iqbal (2004) with respect to fore-
casting S&P 500 index future volatility (several contracts) between 1984–02–01
and 1994–01–31 when compared to the implied volatility of the Barone-Adesi
and Whaley model. The study implements multiple forecasting horizons and uses
non-overlapping data. The explanatory variables are two index values, lagged
futures on the index and seven commodities, an exchange rate (Japanese Yen),
and several points on the yield curve. The selection is based on correlation and
the “relative contribution coefficient.” In addition to the usual MAE and RMSE
statistics, the Mann-Whitney test is used for comparison.
Kim, Lee, and Lee (2006) use an alternative method for training RBF networks.
Two networks are trained first to model the local volatility, then to minimise the
pricing error of the first stage network. Lower pricing and hedging errors are
reported for 1995 S&P 500 index European call options.
While not using ANNs, Audrino and Colangelo (2010) model the implied volatil-
ity surface using a semi-parametric technique, regression trees. They use
OptionMetrics’ Ivy database with S&P 500 index options between 1996–01–04 and
2003–08–29. Given that a surface is evaluated, they suggest using not only the
daily and overall SSE but also a weighted metric, “the daily and the overall aver-
aged empirical criteria.” They also experiment with a number of predictor vari-
ables, notably the yield curve, and option-price related data based on a discrete-
time reference pricing model. For the specific technique applied, moneyness and
time to expiry are the most important factors, with leading and lagging factors
playing a smaller role. The authors conclude that the approach leads to the best
model of volatility surface dynamics and that it may even be used when there are
structural breaks.
According to Mantri, Gahan, and Nayak (2010), ANNs are no different from
GARCH, EGARCH, IGARCH, and GJR-GARCH models in their ability to forecast
stock market volatility. Using the open, high, low, close values (or
only the close) of two Indian index series (BSE and NIFTY), the ANN forecasts
are less volatile but using analysis of variance (ANOVA) to compare the annual
observations, no difference could be found. The data covered the years 1995 to
2008 and only the annual observations were used for the statistical analysis.
A combination of SVM and GARCH is introduced by Chen, Härdle, and Jeong
(2010). This approach is found in later discussions on the topic. An SVM is
fitted to the modified returns series and conditional standard deviation in an
iterative way. This is conceptually similar to traditional recurrent networks or
time-delayed feedforward networks. Using MAE, directional accuracy measures,
and the Diebold-Mariano test for differences in MAE on daily British Pound
(against the US Dollar) exchange rates, NYSE composite index forecasts (both
2004 to 2007), and a synthetic data set, the authors conclude that MLP with an
additional feedback connection and recurrent SVM GARCH models outperform
alternative ones in one-period ahead forecasts.
Ou and Wang (2010a) compare standard GARCH, EGARCH and GJR models
to least-squares SVMs. They term this a hybrid approach, though it is structured
differently from what is considered ‘hybrid’ here. The authors estimate the
GARCH parameters using SVMs. Apart from finding that these combined mod-
els perform better generally, the authors stress that they are more robust to the
changes brought about by the GFC. The data covers the stock market index of
each of Singapore, the Philippines and Kuala Lumpur comparing the years 2007
and 2008 directly.
Ou and Wang (2010b) and Hossain and Nasser (2011) compare such support
vector machines and related models for forecasting volatility. Studying the Shang-
hai Composite Index between 2001 and 2006, and the Bombay Stock Exchange
(2006–10–05 to 2010–11-01) and NIKKEI 225 (2001–01–04 to 2010–11–01) re-
spectively, the authors compare the standard GARCH model to support vector
machines (for regression) and relevance vector machines (a probabilistic version
of SVM). Importantly, for the methodology developed below, the models are
structured similar to the GARCH-model. They use past observations of returns
and residuals of the GARCH-form model to predict the next observation. Ou and
Wang (2010b) finds that the relevance vector machine performs best and the stan-
dard GARCH model worst, the SVM models are relatively good as well. Hossain
and Nasser (2011) add that the ARMA-GARCH model is superior to GARCH
but more importantly that the vector machine models are the only ones meeting
robustness criteria. This is surprising given the strong theoretical foundation of
GARCH.
Ahn et al. (2012) employ a strategy of repeated training and testing using
overlapping periods, and option implied volatility and greeks, to forecast future
implied volatility for the KOSPI 200 (Korean index) ATM options successfully.
squared volatility or similar estimates depending on the model, i.e. the last term
varies with the model that is combined to form the hybrid network. The resulting
hybrid model (a combination of ANN and exponential GARCH (EGARCH)) is
found to have greater predictive power with respect to volatility and directional
forecast.
Aragones, Blanco, and Estevez (2007) simplify the inputs to an RBF network,
and consequently to an equivalent MLP, to the implied volatility and 11-day
momentum of the IBEX–30 index and its futures for a number of sub-periods. To
determine the quality of the forecasts, the authors used a linear regression of
predicted volatility on observed volatility. They find that the RBF model adds value
above the implied volatility estimate and that this is true even in the presence of
external shocks.
A different form of hybrid model, at least in the context of volatility modelling
and option pricing, is that proposed by Chang (2006), and Chang and Tsai (2008).
In both cases two competing models are built, one of which is a non-parametric
one, and their results are combined using a weighting learned from data with
a machine learning technique. The models are a fuzzy neural system and a
non-linear GARCH model combined using support vector regression (SVR), and
a combination of an SVR and the Grey Model and GARCH modelling using
an ANN, respectively. Using index data of four major equity markets and data
from London International Financial Futures and Options Exchange (LIFFE),
they find that the use of machine learning leads to better predictions while the
combined model performs especially well.
Andreou, Charalambous, and Martzoukos (2010) study the fitting of deterministic
volatility regression functions using neural networks. These are functions
modelling the volatility surface. Similar to Dumas, Fleming, and Whaley (1996),
or Peña, Rubio, and Serna (1999), a number of functions are suggested, the most
complex one being quadratic with an interaction term. An ordinary ANN is used
with an additional layer containing the parametric pricing function. This allows
for the training of the network to configure the parametric model, i.e. to sup-
ply the parameters accordingly. Using S&P 500 index call options for 2002–01 to
2004–08, the authors conclude that the proposed models are preferable. They also
note that if the model is chosen according to its hedging performance, it performs
better at this function than one chosen according to pricing accuracy. This
supports earlier results and implies that the decision criterion is critical when
selecting a model.
2.6 Open Research Problems
• Despite (or possibly because of) the general difficulties of attribution and
the lack of explanatory power of ANNs, relatively little prior research has
been conducted with regard to the benefits of a volatility model as opposed
to a direct pricing mechanism. Some exceptions to this were discussed in
section 2.5.2.
The research throughout this thesis aims to address these shortcomings of the
prior literature by focusing on the specific hypotheses stated in chapter 1.
In addition to these questions, others will be left for future research, further
discussed in 5.3, but they impact the methodology design to some degree and are
thus stated explicitly:
• Despite extensive research over the past few decades, surprisingly little
progress has been made with respect to the design of ANNs, their architecture
and their learning process (except for regularisation, bagging and boosting,
which are now fairly well understood). It is therefore difficult to replicate
designs or form a justifiable view of whether a particular design is good,
especially when it fails to perform as expected. In the absence of clear general
principles, it is difficult to tell whether a failure is due to the user or a
result of the data.
• A reference model for the volatility surface is also missing from the literature
unless one is willing to use the local volatility models. This is particularly
true for vanilla options where it is not possible to find a source from which
to infer the volatilities.
Chapter 3
Methodology
[Figure 3.1: Overview of models used including implied volatility models, which
are used for evaluation only. The σ, σ_{M,T}, and C superscripts indicate the
significant model difference, i.e. the first model that is different in the
process. The figure shows: the long-run historical volatility model (σ^HVL) and
the short-run historical volatility model (σ^HVS), each feeding the fitted
volatility surface model (σ^fit_{M,T}) and the reference parametric pricing
model for an American call option, giving C^HVL and C^HVS against C^ref; the
ANN-based volatility surface model (σ^ANNs_{M,T}) feeding the same reference
pricing model, giving C^ANNs against C^ref; and the ANN-based pricing model for
an American call option, giving C^ANN.]
Market data used as input variables is discussed in greater detail in
subsequent sections of this chapter.
A brief description of the data collection process follows along with a more detailed
explanation of the implementation, especially with respect to the architecture
and learning process of the ANNs. An overview of evaluation metrics is given and
the chapter concludes with some critical remarks and epistemological questions
that remain.
Figure 3.1 gives an overview of the methodology outlining the essential models,
explanatory and explained variables for each model. The ultimate goal is to iden-
tify whether the benefits of using ANNs exist in the current Australian equity
options market insofar as they are applied to pricing. This notably excludes the
analysis of decision support for whether and when to exercise options, specific
trading, investment, hedging, or market making strategies, etc. The focus is prin-
cipally on the general applicability of the technique to pricing and therefore on
their average performance with respect to some metric. Of particular interest is
whether and how ANNs can improve the option pricing process: through a better
volatility forecast as the most significant input, through a better pricing
mechanism when presented with a volatility forecast, or through the combined
effects of forecasting and pricing.
where ‘name’ is the model name, 𝑦 refers to the desired output, 𝜃 to the model
specification, and 𝑥⃗ to the input vector. Three particular types of model classes
are introduced; using the above notation, each takes one of the following forms:
[Figure: ANN-based (daily) volatility forecast model σ^ANNd, with inputs:
historical volatilities σ110, σ60, σ20, and σ5, and the return r_t.]
the network provides the annualised volatility forecast (see equation 2.21) based
on the past 𝑛 days of historical volatility, the last return observation, and its
squared value. The inclusion of the previous five-day volatility is largely due to
what appears to be common practice among professionals of using a single week's
worth of observations. As Haug (2007) points out, this is a very unreliable
measure: the number of observations is far too low to yield a meaningful result.
A transition to intraday data is an alternative, though whether such a move would
be beneficial is unclear. Here the goal is to build a simple reference ANN-based
model without any particular attempt at optimisation. The high noise ratio in the
last component typically results in the output being largely independent of that
particular input variable.
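The construction of such an input vector can be sketched as follows; this is a minimal illustration only, assuming daily log returns and a 252-trading-day annualisation factor, with variable and function names that are not taken from the thesis:

```python
import numpy as np

def ann_inputs(prices, windows=(110, 60, 20, 5), trading_days=252):
    """Build the input vector (sigma_110, sigma_60, sigma_20, sigma_5,
    r_t, r_t^2) from a price history, annualising each historical
    volatility estimate (sigma = sigma_daily * sqrt(trading_days))."""
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(np.log(prices))        # daily log returns
    r_t = returns[-1]                        # last return observation
    vols = [returns[-n:].std(ddof=1) * np.sqrt(trading_days) for n in windows]
    return np.array(vols + [r_t, r_t**2])

# Example: a synthetic random-walk price series (illustrative only)
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300)))
x = ann_inputs(prices)
print(x.shape)  # (6,)
```

Any window length can be supplied, so the same helper covers the 110-, 60-, 20-, and five-day estimates used throughout.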
As is apparent from the review by Poon and Granger (2003), the evaluation is
based on a variety of proxies for the volatility including realised volatility over a
range of forward periods. The argument made with respect to the wide confidence
interval in the 𝜎5 input also applies to the target value. This is true both for
the values presented during learning and for the error calculated during model
evaluation.
A four-week time frame is used for the forward period. There is no particular
reason for the choice, as there are typically no particular reasons for such choices
in past literature. Four weeks is sufficiently long to allow for a reasonable amount
of data – albeit not the conventional 30 observations used in statistics – but
short enough to test the response of the model to temporary fluctuations.
Alternative choices include the average time to maturity of the options considered,
measured from the time of creation, the time of first sale, or indeed any other
point in time of significance (by volume, open interest, etc.). In the
absence of additional criteria or more specific objectives of the decision maker,
the shortest period is used for consistency with the reference models. Furthermore,
time-weighted volatility or volatility weighted by option “greeks” could be used
but they too are most useful if a specific objective is pursued.
As with traditional econometric models, the forecast at 𝑡 is based on the pre-
vious data only and is thus applicable to any pricing and evaluation at 𝑡 and not
𝑡 + 1. This means that no additional delay needs to be used for a valid model.
This is also true for the networks trained, although the actual implementation differs
slightly. Due to the way time is represented, the implementation uses all data up
to and including the current day, i.e. including the closing price of the current
day. When synchronising time series across models, the one day (specifically one
business day) lagged volatility measure is used for pricing options. This ensures
that volatility estimates are known at the beginning of the trading day and based
on the previous close and earlier information.
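The one-business-day lag can be sketched as follows (an illustrative pandas fragment; the series values and dates are made up):

```python
import pandas as pd

# Daily volatility estimates, indexed by business day (illustrative values).
dates = pd.bdate_range("2007-06-25", periods=5)
vol = pd.Series([0.21, 0.22, 0.20, 0.23, 0.19], index=dates)

# Lag by one business day: the estimate computed from Monday's close
# becomes the volatility used for pricing options on Tuesday.
vol_for_pricing = vol.shift(1)
print(vol_for_pricing)
```

Shifting by one index position on a business-day index means the estimate computed from one close is only used for pricing on the following trading day, so it is known at the beginning of that day.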
The literature review demonstrated that past studies used a number of ad-
ditional variables for the input. Given the focus on index volatility, these often
referred to related indices. No such information is included here. There are a num-
ber of methodological reasons for their omission. Firstly, determining a related
time series is less clear for an equity security. While a related (benchmark) index
could be chosen, it is not clear why any or a particular index should be a suitable
input.
Secondly, unless the same inputs are presented to the competing models, it
is not clear if the different model form or the additional inputs are the driver
for varying performance. Such comparisons would have to include ARCH-style
models with exogenous variables and regression models using historical volatility
along with the enhanced ANNs. This is a question left for future research (see
section 5.3).

[Figure 3.3: the long-run historical volatility model (returns r⃗_{t−110,…,t} →
forecast σ^HVL) and the short-run historical volatility model (returns
r⃗_{t−20,…,t} → forecast σ^HVS). Figure 3.4: the GARCH(1, 1)-based volatility
model (observed returns r⃗ → forecast σ^GARCH).]

The use of the risk-free rate, i.e. a government bond yield as a proxy,
is particularly difficult. It is certainly justifiable on the grounds of being a proxy
for investors' risk preferences.
Thirdly, using it in the context of volatility forecasting implies its use in the
subsequent pricing step, where it is – under the assumptions made here – incorrect
due to the margining regime.
The choice of forward period is only relevant with respect to the evaluation of
volatility models on their own. Once a volatility estimate is needed for option
pricing, the period would have to match that of the time to expiry. As explained
before, an individual model for each such period may have to be built, or the one
period model applied iteratively as was suggested for the GARCH model. This is
feasible in the case of GARCH models as these are combined forecasts of return
and volatility. This approach fails in the case of direct volatility forecasts using an
ANN. This requires particular attention when choosing the time frame for which
to measure realised volatility and when comparing different models for selection
purposes.
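The iterative application of a GARCH(1, 1) forecast can be sketched as follows, using the standard multi-step recursion E[h_{t+k}] = ω + (α + β)E[h_{t+k−1}]; the parameter values are illustrative and a 252-day annualisation is assumed:

```python
import numpy as np

def garch_iterated_vol(omega, alpha, beta, eps_t, h_t, horizon, trading_days=252):
    """Iterate the GARCH(1,1) one-step variance forecast over `horizon`
    days and return the annualised volatility for that forward period.
    E[h_{t+1}] = omega + alpha*eps_t^2 + beta*h_t, and for k > 1:
    E[h_{t+k}] = omega + (alpha + beta) * E[h_{t+k-1}]."""
    h = omega + alpha * eps_t**2 + beta * h_t
    total = h
    for _ in range(horizon - 1):
        h = omega + (alpha + beta) * h
        total += h
    avg_daily_var = total / horizon
    return np.sqrt(avg_daily_var * trading_days)

# Example: a stationary parameterisation (alpha + beta < 1), four-week horizon
sigma_4wk = garch_iterated_vol(omega=2e-6, alpha=0.08, beta=0.90,
                               eps_t=0.01, h_t=1e-4, horizon=20)
print(round(sigma_4wk, 4))  # ≈ 0.1587
```

The variance forecast converges towards the unconditional level ω/(1 − α − β) as the horizon grows, which is why the iterated scheme is feasible for GARCH but has no analogue for a direct one-period ANN forecast.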
The models are built for panel data, i.e. training includes data over the in-
sample period and across several securities. This is benchmarked against standard
econometric models, specifically long- and short-term historical
volatility 𝜎HVL and 𝜎HVS (see Figure 3.3, both corresponding to the long and
near-short-term specification of the ANN-based model), and the GARCH(1, 1)
model 𝜎GARCH (see Figure 3.4). Those were chosen as representative of frequently
used models in research and practice based on the review literature. They, and
the GARCH model in particular, do not necessarily represent good models in
absolute terms but sufficiently well-understood models to serve as benchmarks.
They are also used in prior ANN volatility forecasting research.
Note that the historical and GARCH(1, 1) models return volatility for the
specific sampling frequency 𝜎𝑠 requiring the usual adjustment (see equation 2.21)
to arrive at annualised volatility 𝜎. This is done implicitly in the case of the
neural networks. For this reason, the networks are also presented with annualised
volatility but not annualised returns.
Hypothesis 1 can then be answered by comparing the competing models against
the realised volatility over the fixed time frame specified in the ANN-based
models (including applying the GARCH model repeatedly to arrive at a comparable
time frame). Days of historical volatility refers to the number of deviations
from the mean evaluated; the number of underlying returns is thus greater by one.
σ^fit_{M,T}(M, T, ·), where the last parameter represents the volatility forecast as per
any of the competing models above:

β0 + β1 M + β2 M² + β3 M T + β4 T² + β5 T    (3.3)
The parameters are found by OLS regression. For practical purposes, the volatility
will be set to 0 (similar to the lower bound in the literature mentioned above)
where the regression leads to negative values. Unlike those authors, the question
that arises here is how to integrate the reference volatility forecast. Apart from
adding it as another explanatory variable and fitting more parameters, including
those of additional interaction terms, two options are available, which do not
require a larger set of parameters:
• the surface is the relevant scaling function, i.e. points on the surface repre-
sent factors, which are to be applied to the reference volatility forecast.
The former resembles the – usually successful – hybrid learning approaches, which
use a reference model as a base and learn the difference. The latter is similar to
adjustments made by practitioners, who use tables to modify point forecasts for
use in parametric models. Since, furthermore, scaling preserves the shape of the
surface when its level changes, the latter is chosen:
σ^fit_{M,T}(M, T, σ): β0 + β1 M + β2 M² + β3 M T + β4 T² + β5 T = σ_{M,T} / σ    (3.4)
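The OLS fit of this scaling surface can be sketched as follows; the data here is synthetic and the function names are illustrative, not from the thesis:

```python
import numpy as np

def fit_scaling_surface(M, T, sigma_MT, sigma_ref):
    """OLS fit of b0 + b1*M + b2*M^2 + b3*M*T + b4*T^2 + b5*T to the
    ratio sigma_MT / sigma_ref (the scaling-factor form of eq. 3.4)."""
    X = np.column_stack([np.ones_like(M), M, M**2, M * T, T**2, T])
    y = sigma_MT / sigma_ref
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def surface_vol(beta, M, T, sigma_ref):
    """Evaluate the fitted surface and rescale by the reference forecast,
    flooring negative values at zero as described in the text."""
    factor = beta @ np.array([1.0, M, M**2, M * T, T**2, T])
    return max(factor * sigma_ref, 0.0)

# Illustrative synthetic data (not from the thesis)
rng = np.random.default_rng(1)
M = rng.uniform(0.8, 1.2, 200)
T = rng.uniform(0.05, 1.0, 200)
true = np.array([1.1, -0.2, 0.05, 0.02, 0.01, -0.05])
ratio = np.column_stack([np.ones_like(M), M, M**2, M * T, T**2, T]) @ true
sigma_ref = 0.20
beta = fit_scaling_surface(M, T, ratio * sigma_ref, sigma_ref)
print(np.round(beta, 3))
```

On this noiseless synthetic data the regression recovers the generating coefficients; with observed implied volatilities, the same fit yields the scaling factors applied to the reference forecast.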
There is no valid reason for limiting the ANN to forecasting volatility for a single
time to expiry or level of moneyness, however. Given the flexibility they offer, it
is equally valid, and potentially beneficial, to derive the volatility surface directly
from the ordinary volatility forecasting inputs and the additional parameters of
moneyness and time-to-expiry. The resulting model specification is:
σ^ANNs_{M,T}: (M, T, σ110, σ60, σ20, σ5, r_t, r_t²) ↦ σ_{M,T}    (3.5)
[Figure: ANN-based volatility surface model σ^ANNs_{M,T}, with inputs: moneyness
M, time to expiry T, historical volatilities σ110, σ60, σ20, and σ5, and the
return r_t.]
chosen here too for consistency reasons. This allows for a direct comparison of
the resulting option prices to those of the direct pricing network.
Hypothesis 2 could then be answered by comparing these models with regard to
their fit to the implied volatility surface, or, at the money but with varying
time to expiry, with regard to realised volatility.
The selection criteria and the underlying error measures and statistical inference
methods used are discussed later in this chapter. In regards to the option pricing
model, it shall suffice to note here that the appropriate choice between the two
sets depends on the objective of the user of such a model. If the goal is trading
highly-liquid options, the former will be sufficient as this limits the modelling to
ATM options, or those relatively close to it, in most cases. If the question is of a
more general nature, as is the case in this thesis, the second approach is needed as
it covers ATM, ITM, and OTM options equally. The comparison of implied
volatility surfaces, rather than observations on them, is thus the valid choice for
this thesis.
• The Haug-Haug-Lewis (HHL) model (Haug, Haug, and Lewis, 2003), which
is also applicable in the presence of discrete dividends, is even more costly
with respect to processing time.
• If the discrete dividends are considered proportional to the price, i.e. they
can be expressed as a yield but paid at discrete intervals, the method by
Villiger (2006) can be used to approximate the price of an American call.
With the consideration of dividends being one of the contributions of the re-
search, the first of the options is not a suitable choice. Among the remaining
options, the choice depends partially on what assumptions and simplifications
should be made. While the CRR approach is the most realistic, it is only benefi-
cial if the payments are known. While this can be assumed for research purposes,
especially in countries and for industries that offer comparatively stable dividend
streams, the model by Villiger (2006) may be preferable if dividend payments are
not known or preference is to be given to the assumption of a constant yield. On
balance, and in large part due to the comment on its frequent practical use,17 the
Bjerksund and Stensland model is used for this thesis. This requires a conversion
of the dividend payments to a cost-of-carry rate.
The notation is modified to reflect the choice resulting in 𝐶 ref referring to the
Bjerksund and Stensland approximation for the price of an American call option
with the usual parameters including the dividend yield.
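One possible form of such a conversion, shown here only as an illustrative sketch (the thesis does not prescribe this exact formula), is to turn the present value of the discrete dividends expected before expiry into an equivalent continuous yield q, from which the cost-of-carry rate follows as b = r − q:

```python
import math

def equivalent_dividend_yield(spot, dividends, r, T):
    """Convert discrete dividends [(time, amount), ...] paid before expiry T
    into a continuous yield q such that S*exp(-q*T) = S - PV(dividends)."""
    pv = sum(d * math.exp(-r * t) for t, d in dividends if t <= T)
    return -math.log((spot - pv) / spot) / T

# Illustrative numbers only: one $0.60 dividend in three months,
# a six-month option, and a 5% risk-free rate.
S, r, T = 30.0, 0.05, 0.5
q = equivalent_dividend_yield(S, [(0.25, 0.60)], r, T)
b = r - q  # cost-of-carry rate for the Bjerksund and Stensland formula
print(round(q, 4), round(b, 4))
```

Other conversions are possible (e.g. a simple dividend sum over the spot price); which one is appropriate depends on the assumptions made about the dividend stream.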
Finally, Figure 3.6 shows the additional model required for Hypothesis 2. It is
designed to answer the question whether there is a benefit to splitting the problem
into two parts, one for volatility modelling (as above) and the other for option
pricing, or whether the two are best integrated given a parsimonious specification:
The innovation is the model replacing C^ref (in Figure 3.1). Like the previous
models, it represents an option pricing model. Instead of supplying a volatility
forecast, the underlying inputs are presented to the pricing network directly.
17 The additional benefit and direct consequence of its popularity is that the model is already implemented in MATLAB.
[Figure 3.6: ANN-based pricing model for an American call option, with inputs:
historical volatilities σ110, σ60, σ20, and σ5, return r_t, squared return r_t²,
moneyness M, time to expiry T, and dividend q; output: option price C^ANN.]
response to price changes for the purpose of informing participants about margin
requirements and not for pricing purposes.
The latest reference data and transaction information was retained and con-
verted from text format to binary representation as explained below.
Equity data was retrieved from two sources but both through Securities Indus-
try Research Centre of Asia-Pacific (SIRCA) (2010–2012). Current prices were re-
trieved at 5-minute intervals from Thomson Reuters TickHistory including addi-
tional data relating to the securities and their symbology. Although the database
also provides the dividend and corporate action data, it was decided to use the
CRD database from Securities Industry Research Centre of Asia-Pacific (SIRCA)
(2010–2012) and use the pre-calculated dilution factors to adjust returns. The use
of the existing formula and implementation allows in principle for replication, and
it is assumed that the research character of the database implies additional prior
scrutiny of the adjustment methodology as well as the underlying data.18
The equity data was restricted to the members of the Standard and Poor's
(S&P) ASX 20 index on June 30, 2007, excluding the special Telstra
equity security but including the standard Telstra shares (TLS). The data covers
the sampling period of July 2000 to June 2011. During processing, the data was
synchronised, i.e. equity and options data aligned. Since the equity data consti-
tutes the most constrained set, it also defines the overall data set resulting in
samples only over that period and only for members of the index on the given
date.
Unlike some research conducted in the past, there has been no attempt at
fitting models repeatedly. Consequently, no rolling-window subsets of data were
created and no update to the candidate set, i.e. the investment universe to which
the models are applied, needed to be made. It should be noted that the indices
published for Australia are designed, among other criteria, to have few constituent
changes (Standard and Poor's, 2011).
18 In research unrelated to this dissertation, the author's supervisor and the author investigated pricing differences resulting from varying adjustments or base data across databases to decide on a particular source of data for research. Pricing differences did exist but they were usually quite small and did not raise concerns in that context.
[Figure 3.7: Schematic of the processing steps, from parsing and synchronising
the data through to additional post-processing of results.]
The following steps are necessary to process data, implement, and run or sim-
ulate the models (see Figure 3.7 for a schematic illustration):19
4. Synchronisation of daily and intra-day data through the use of lookup in-
dexes;
The first two steps are largely self-explanatory and mainly aimed at enabling
further steps and improving processing time in later stages. It is critical that
during synchronisation no bias is introduced. Time-series offsets need to be chosen
such that, at any point in time a decision is made (or is simulated to be made),
it is based only on information available at that time; this implies strict time
precedence.
19 The steps do not represent a strict sequence due to varying dependencies of the selected models and some steps were done out of the specified order. For example, comparisons between in-sample and out-of-sample results can be made without reference to competing models and were thus partially done at the time the models were fitted.
There is one exception in the case of option pricing and specific to derivatives
markets. Given that theoretical pricing models are based on arbitrage arguments,
the process is conceptually symmetrical. The no-arbitrage situation can be cre-
ated by the equity price moving in response to the option price or vice versa.
Given the high liquidity of the equity market and the more restrictive regulatory
environment in derivatives trading, the standard time-ordering was used regard-
less. The price of the underlying was observed and the price of the option based
on it. The reverse was not investigated or simulated. Under the assumption that
equity markets drive derivatives markets, no look-ahead bias results from this
treatment.20
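This standard time-ordering, i.e. matching each option observation only with information available strictly before it, can be sketched with pandas (the observations are made up):

```python
import pandas as pd

# Intraday option observations and daily volatility estimates (illustrative).
option_obs = pd.DataFrame({
    "time": pd.to_datetime(["2007-06-26 10:00", "2007-06-26 14:30",
                            "2007-06-27 11:15"]),
    "option_price": [1.25, 1.30, 1.28],
})
vol_series = pd.DataFrame({
    "time": pd.to_datetime(["2007-06-25 16:00", "2007-06-26 16:00"]),
    "sigma": [0.21, 0.23],
})

# allow_exact_matches=False enforces strict precedence: each option
# observation is matched only with information strictly before it.
merged = pd.merge_asof(option_obs, vol_series, on="time",
                       direction="backward", allow_exact_matches=False)
print(merged)
```

Both observations on June 26 pick up the estimate from the June 25 close, and the June 27 observation the estimate from the June 26 close, so no look-ahead bias is introduced.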
The data was then split into in-sample and out-of-sample data. The same cut-off
point was used for the split as for the index membership date. This treatment
is equivalent to assuming that the decision to use the ANNs was made on June
30, 2007 and that all simulations, as well as the investment universe they are
based on, were run from that point in time. The date is in all likelihood the most
conservative of cut-off points as it splits the data set not only into in-sample
versus out-of-sample but also marks the end of a particularly good and stable
economic period and the beginning of the GFC.
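The fixed cut-off split can be sketched as follows (illustrative data only):

```python
import pandas as pd

# A fixed cut-off date splits the panel into in-sample and out-of-sample sets.
cutoff = pd.Timestamp("2007-06-30")
dates = pd.bdate_range("2007-06-25", "2007-07-06")
data = pd.DataFrame({"ret": range(len(dates))}, index=dates)

in_sample = data[data.index <= cutoff]        # used for fitting only
out_of_sample = data[data.index > cutoff]     # held back for evaluation
print(len(in_sample), len(out_of_sample))
```

Because the cut-off is a single fixed date rather than a rolling window, every model sees exactly the same in-sample history, which keeps the comparison between models fair.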
Using the in-sample data set only, the GARCH models were fitted using stan-
dard parameters and a specification of GARCH(1, 1). The GARCH volatility
model is then synchronised again with the main data set. The resulting set is a
full in-sample set.
An adjustment is needed with respect to volatility. As pointed out in the lit-
erature review, the volatility in the pricing model needs to be stated in terms of
annual volatility requiring the usual adjustment. This can be done during volatil-
ity modelling or during pricing. As a matter of convention and convenience, the
adjustment is made before reporting volatility forecasts, i.e. volatility forecasts
represent annualised 𝑝-period forward volatility in all tables and figures.
The simulation, time series creation and synchronisation steps were then re-
peated using the fixed model parameters for all models (including GARCH, IV,
volatility surface, ANN). The results were stored for later retrieval, statistics cal-
culated (see 3.4 for evaluation methodology and Chapter 4 for results) and tables
as well as figures created.
20 This approach is not sufficient if it is assumed that there is an interaction, i.e. a mutual influence, and it is not heavily biased in favour of the equity market.
[Figure: an example feedforward network with four input nodes, two hidden nodes,
and one output node.]
The problem is avoided to some degree by following the process discussed be-
fore. The approach already provides for iterations and thus for opportunities to
escape local minima. This comes at the price of choosing a less parsimonious model
than needed. An additional iteration may not improve the results because of the
additional hidden node but rather because of the different starting point. This
network would be chosen even though the same or similar result could have been
found with a smaller number of hidden nodes. The process does, however, provide
an opportunity to arrive at fairly parsimonious models by its design compared
with some other alternatives.
The Levenberg-Marquardt back-propagation algorithm was used in conjunction
with the MSE performance metric. The hidden layer used the sigmoidal transfer
function and the output layer the linear one.
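This architecture, a sigmoidal hidden layer with a linear output layer evaluated under MSE, can be illustrated with a small numpy sketch; the weights here are random rather than trained, and the Levenberg-Marquardt update itself (which blends Gauss-Newton and gradient-descent steps on the MSE) is omitted for brevity:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One hidden layer with sigmoidal transfer and a linear output layer,
    matching the architecture described in the text."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # sigmoid hidden layer
    return h @ W2 + b2                         # linear output layer

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                   # 4 illustrative inputs
y = X[:, :1] ** 2                              # toy target
W1, b1 = rng.normal(size=(4, 2)) * 0.5, np.zeros(2)  # 2 hidden nodes
W2, b2 = rng.normal(size=(2, 1)) * 0.5, np.zeros(1)

pred = forward(X, W1, b1, W2, b2)
mse = float(np.mean((pred - y) ** 2))          # the training criterion
print(mse)
```

Training adjusts W1, b1, W2, and b2 to minimise this MSE; Levenberg-Marquardt does so using second-order curvature information, which is why it converges faster than plain gradient descent on small networks like this one.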
The methodology by Thomaidis, Tzastoudis, and Dounias (2007) discussed be-
fore was considered as well but ultimately rejected for the following reasons.
Firstly, there appears to be little research following this specific process. Sec-
ondly, it is not clear where to start. While the different steps are outlined, it is
not clear if starting with one neuron or variable instead of another one leads to
different results and how such problems can be detected or corrected for.
Thirdly, the process does not address the issue of local minima and thus addi-
tional training is required in any case. Finally, it is so far unclear if the methodol-
ogy is sufficient to account for non-linearities both in the process and in the statis-
tics and metrics used. It should be noted that the use of well-defined statistics and
a clear process is extremely attractive from both a research and a practitioner’s
perspective and should be investigated further.
resulting model needs to be constructed. If multiple networks are trained for any
particular design, this choice is a dual one of choosing between network designs
and between within-design networks. Since the simple process by Vanstone (2005)
is used, this complication does not apply.
The same process also requires choosing the best network from the various designs.
Assuming the error metric is appropriate for the application domain, the
best network provides the best fit to the data. It is also possible, however, that
such an approach leads to overfitting and that the error metric used for stopping
and evaluation is not consistent with the ultimate use of the model. The metric for
design choice may, therefore, be different from the one used for training. Following
Vanstone (2005) and other researchers as discussed in the literature review, the
network design with the best performance is used and no additional distinction
is made with respect to error metrics.
These issues are common and not unique to ANNs but are potentially more
significant given the universal approximation ability. In principle, the best-
performing network may also be the one most sensitive to structural changes
and may perform particularly poorly out-of-sample. It is therefore not a foregone
conclusion that the best network should be used if such an effect is anticipated. Further, the problem
is not eliminated through regularisation or similar approaches as these are still
within the given data set. It may thus be beneficial to determine, where possible,
such sensitivity and use it in determining the best of the chosen networks, best
not with respect to an error metric but best with respect to the wider set of
performance constraints.
Even in the presence of such analysis, it is not clear that a single network even
should be sought or whether a combination of networks along the development
path may be superior. Combining forecasts is a common approach in statistics
and econometrics and Poon and Granger (2003) point out one application of a
learner regression in a purely econometric framework.
No previous attempts have been made to analyse such sensitivity or to combine
networks in option pricing, and the number as well as the interdependency of
choices require a separate analysis of the issue. Instead of introducing
several innovations at the same time, only a single network is chosen, the best
with respect to the same error metric as is used for training using the process by
Tan. However, the total number of parameters will be reported along with the er-
ror metrics to enable at least a basic discussion of the complexity of the networks,
which is the principal cause of overfitting. This allows for the determination of
measures of parsimony and aids in the second step, the question of whether to
choose the ANN or a standard model as discussed next.
• The regression error of the network needs to be determined for each sample
presented to the network during training, in order to determine how the
weights need to be modified to minimise the average error (see 3.3.2).
Common to all is the need to determine what metric to use and the data it is
applied to, i.e. which error of what. Table 3.1 shows the standard definitions of
common error formulas as found in the literature (see Chapter 2).
Due to the frequent use in (linear) regression and generally favourable statisti-
cal and numerical properties, it is not surprising that the MSE is frequently used
during ANN training and as discussed above for network selection as well. It does
have two main problems, however: it penalises large errors significantly (due to
the squaring), and it is difficult to interpret with a squared unit. This is similar to
a problem found in expressing volatility as 𝜎2 and for the same reason, the posi-
tive root is often reported instead. While MSE is used for network development
purposes, the results reported in the following chapter will therefore represent the
RMSE instead.
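These definitions can be stated compactly; the sketch below follows the standard formulas (Table 3.1 itself is not reproduced here, so details such as the percentage scaling of MAPE are assumptions):

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Standard average error metrics; MAPE is undefined where y_true == 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),                           # reported in place of MSE
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),  # assumes percentage scaling
    }
```

The RMSE is simply the positive root of the MSE, so reporting one in place of the other changes the unit but not the model ranking.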
The use of the absolute measures and in particular of MAPE is frequently
found as it is not sensitive to the level of the estimate nor does it focus as much
on large errors, which may be due to outliers. Neither issue is likely to exist in the
methodological framework used in this thesis: the use of the homogeneity hint
combined with the data preparation steps is expected – and indeed is designed
– to result in a data set that is not sensitive to price levels. In the case of volatility
forecasting, the principal variables are rates and while large differences may exist
within as well as between time series, they are not expected to be so frequent and
so significant as to cause problems. It can also be argued that the greater penalty
for large errors is beneficial in the case of option pricing: this is a security class
typically used for hedging, so large unexpected events are of particular concern.
All average error metrics are reported, principally to allow for comparisons
with prior studies, but only the MSE is used for decision making in the presence
of competing models. The model with the lowest error is the preferred one.
Even with the measure decided, the question remains which error is to be
calculated and used. In the case of the first and last hypotheses, no distinction
need be made. The variable being forecast (estimated) is also the variable of
interest, the error is thus between the forecast (estimation, respectively) and the
actual observation.
When evaluating the remaining hypothesis, the question is subtler: does the
volatility forecast result in a better option price? The question thus is whether
the goal should be to minimise the forecasting error (all forecasting
errors), to minimise the pricing error resulting from the forecasts (all pricing
errors), or if the forecasting error is to be minimised during training but the
model chosen based on pricing errors (mixed metrics).
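To make the mixed-metrics choice concrete, a minimal sketch follows; the Black-Scholes pricer stands in for the option pricing model, and the candidate names and inputs are illustrative rather than those used in this thesis:

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def choose_by_pricing_error(candidates, s, k, t, r, market_price):
    """Mixed metrics: the candidates were trained on forecasting error, but
    the winner is the one whose implied price is closest to the market."""
    return min(candidates,
               key=lambda name: (bs_call(s, k, t, r, candidates[name])
                                 - market_price) ** 2)
```

Here `candidates` maps model names to their volatility forecasts for the option in question; training remains driven by the forecasting error, only the final model choice uses pricing errors.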
The first allows for a consistent treatment of error metrics between the first
and the second hypothesis while the second allows for a consistent treatment be-
tween the second and the third (with respect to error measures). The third choice
appears particularly suited to accommodate the competing goals. While super-
21 This point implies that either approach is valid depending on one’s perspective. If the focus
is on outcome rather than process, the use of all-pricing-errors is likely beneficial as it offers
the opportunity to learn the ‘right’ forecasting model strictly for pricing. The view taken here
is that the volatility forecast drives the pricing and the regression is thus biased towards the
beginning of the process rather than its end.
neously for multiple models, rather than pairwise, to reduce the compounding of
errors.
ANOVA results are reported but not used for model choice. Instead, they are
used to draw conclusions at the end, not to guide the development.
Analysis of Data
4.1 Overview
In this chapter the intermediate statistics of the simulations and their results are
reported. The presentation is structured along the development process of the
various models, i.e. from left to right in Figure 3.1.
The chapter is organised as follows: general data characteristics are presented,
including a number of pre-tests required for analysis. This is followed by training
and testing results of the various networks, and by the error terms of the para-
metric models used for comparison, for volatility forecasts, volatility surface
models, and option pricing models.
For each stage, the data characteristics and the fitting of the data are reported. This is
followed by a comparison of models and a comparison of performance between
in-sample data and the out-of-sample set.
When individual securities are reported, their latest symbol in the sampling
period is used.
Table 4.1: Descriptive and Test Statistics for Underlying Equity Securities for the
In-sample Period. Significance at the 5% (1%) level is indicated by a (b).
The data covers the period 2000-01-01 to 2011-06-30 and is split into three
parts. The first year, 2000, is used to compute cumulative dividends and the
resulting yield. It is also used to compute initial historical volatility if applica-
ble. Furthermore, the first year’s 252 trading days were used as the basis for the
annualisation in all subsequent years. The following period up to and including
2007-06-30 was used to fit models or train the networks, respectively. The
remaining data, including the GFC period, was used for out-of-sample testing.
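For reference, a sketch of the conventional close-to-close historical volatility estimator with 252-day annualisation follows; the thesis's exact estimator and window length are specified elsewhere and may differ:

```python
import numpy as np

def historical_volatility(prices, window, trading_days=252):
    """Annualised rolling historical volatility from closing prices,
    using the sample standard deviation of daily log returns."""
    logret = np.diff(np.log(np.asarray(prices, dtype=float)))
    out = [logret[t - window:t].std(ddof=1) * np.sqrt(trading_days)
           for t in range(window, len(logret) + 1)]
    return np.array(out)
```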
For the purpose of volatility forecasting, the problem consists of a set of time se-
ries. Each needs to be analysed separately for the historical volatility and GARCH
model. The latter is only useful in the presence of anomalies. In particular, En-
gle’s ARCH test and the Ljung-Box-Q test are applied to the series to test for
heteroscedasticity and residual auto-correlation, respectively. The results can be
found in Table 4.1.
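Both tests can be computed directly from a return series; the numpy sketch below implements the Ljung-Box Q statistic and Engle's ARCH LM statistic (standard library implementations would normally be used in practice, and the lag choices are illustrative):

```python
import numpy as np

def ljung_box_q(x, lags):
    """Ljung-Box Q statistic for autocorrelation up to the given lag:
    Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x * x)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(x[k:] * x[:-k]) / denom   # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

def engle_arch_stat(returns, lags):
    """Engle's ARCH LM statistic: n * R^2 from regressing squared
    returns on their own lags (plus an intercept)."""
    e2 = np.asarray(returns, dtype=float) ** 2
    n = len(e2) - lags
    y = e2[lags:]
    X = np.column_stack([np.ones(n)]
                        + [e2[lags - k: -k] for k in range(1, lags + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return n * r2
```

Under the respective null hypotheses, both statistics are asymptotically chi-squared distributed with degrees of freedom equal to the number of lags.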
4.3 Volatility Forecast Evaluation
CGJ, RIN, and SGB. For these securities, changes affecting the whole market
do not affect their average forecasting errors if the security was not yet or is no
longer traded at that time. The identification of regime changes and the resulting
forecastability is beyond the scope of this research. Therefore, no exclusions or
adjustments were made in this regard.
Following the application of the historical volatility model, the GARCH mod-
els were fitted. Any missing values were removed in this instance as well. Since
historical volatility models and GARCH models treat missing values somewhat
differently, percentage errors are undefined (due to a target value of 0) for differ-
ent observations. The resulting models (see Appendix A for details on the model
specifications) were applied to the time series in-sample and out-of-sample. Tables
4.6 and 4.7 show the in-sample (out-of-sample) results for individual securities.
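For reference, the conditional-variance recursion underlying these GARCH(1,1) models can be sketched as follows; the parameter values used below are placeholders, not the fitted values of Appendix A:

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional-variance recursion of a GARCH(1,1) model for given
    parameters: h[t+1] = omega + alpha * r[t]^2 + beta * h[t].
    The last element is the one-step-ahead variance forecast."""
    r = np.asarray(returns, dtype=float)
    h = np.empty(len(r) + 1)
    h[0] = r.var()                     # initialise at the sample variance
    for t in range(len(r)):
        h[t + 1] = omega + alpha * r[t] ** 2 + beta * h[t]
    return h
```

With zero shocks the recursion converges geometrically to omega / (1 - beta), which illustrates the mean-reverting behaviour that distinguishes GARCH forecasts from a rolling historical volatility.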
The network training followed the process described in the previous chapter.
Prior to training, all variables are modified to fit into a fixed interval. Figures 4.1
and 4.2 show the original values and their ranges, outliers are compressed at the
extremes.22 These plots, like all other box plots in this chapter, show the central
half of the data inside each box. It is unsurprising that the value range of volatility
observations is different for the out-of-sample data. As would be expected, the
GFC led to an increase of volatility. The general characteristics of the inputs are,
however, similar.
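The interval mapping can be sketched as a min-max scaler fitted on in-sample data only; the target interval is an assumption, and, as the figures show, out-of-sample values may fall outside it:

```python
import numpy as np

def fit_interval_scaler(x, lo=-1.0, hi=1.0):
    """Fit a linear map taking the in-sample min/max onto [lo, hi].
    Applied to out-of-sample data, values may land outside the interval."""
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    def apply(v):
        return lo + (np.asarray(v, dtype=float) - xmin) * (hi - lo) / (xmax - xmin)
    return apply
```

Fitting the scaler on the in-sample data alone is essential: using out-of-sample extremes would leak information from the test period into training.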
The validation set was used to choose the best configuration. The best network
configuration for volatility forecasting was the one with four nodes in the hidden
layer, whose training record is shown in Figure 4.3. Figure 4.4 shows the error
terms of the various models in-sample. This includes not only the validation but
also the training set, i.e. the complete in-sample data set. While the mean error
shown in the plot was lower, this was the result of the network’s tendency to
overfit, and the error in the validation set was slightly higher than that of the
next smaller design. As a result, the network with four nodes was chosen. However,
it is evident that the differences between the network designs are very small.

22 Plots in this chapter typically treat the lower and upper 2.5 % value ranges as extreme
values, which are compressed (with markers) or removed (without markers). Plots showing the
full data set are so marked.
An interesting observation can be made when examining the relationship be-
tween target values and network outputs. Figure 4.5 shows the two variables and
a regression line is included. Figure 4.6 shows only the central 95 % of the value
range (Figures 4.7 and 4.8 show the same for the out-of-sample set).
In both instances, the network appears to average the forecast, overestimating
at low target values and underestimating at high values. Whether this is due to a
larger than expected number of outliers, whether this is a feature of the data set,
or whether it is the lack of additional explanatory variables is not the focus of this
research. It should be noted, however, that this behaviour may not be desirable
for all applications. The distribution of observations in Figures 4.7 and 4.8 is
particularly interesting as it not only shows this effect but also appears to show a
distinctive shape. It may thus be possible to improve performance by transforming
data or changing the modelling parameters. Since the use of the out-of-sample set
for modelling decisions would change its character as an out-of-sample set, this
line of research was not pursued further. It may be of general interest for future
research, however.
Figure 4.5: Target and Output Values for the 𝜎ANNd In-sample Data
Figure 4.6: Target and Output Values for the 𝜎ANNd In-sample Data (Without
Outliers)
Figure 4.7: Target and Output Values for the 𝜎ANNd Out-of-sample Data
Figure 4.8: Target and Output Values for the 𝜎ANNd Out-of-sample Data (With-
out Outliers)
The first hypothesis suggests that ANNs can forecast volatility more accurately.
This appears to be the case for the combined data set (all securities) in the
in-sample period as shown in Table 4.8. The ANN shows lower errors regardless
of the measure. However, the results of the out-of-sample period (Table 4.9) are
not consistent with this view. The network either overfits the data or a structural
shift caused a change in model ranks. Two other aspects are noteworthy. Firstly,
the GARCH model is among the worst of those tested regardless of the error
measure and period, which is broadly consistent with past literature. Secondly,
the out-of-sample period favours the model with the shorter lookback period,
which could indicate a more unstable environment.
Source SS df MS F p
Groups 22.73 3 7.5761 442.5410 4.9288 × 10−286
Error 2138.94 124 941 0.0171
Total 2161.67 124 944
To determine the statistical significance of these results, Table 4.10 reports the
ANOVA results. The difference between at least one pair is significant, and Table
4.11 shows the pair-wise model comparisons indicating significant rows. The table
shows the lower end of the confidence interval, the mean of group differences, and
the upper end of the confidence interval. The null hypothesis of no difference, i.e. a
difference of 0, cannot be rejected when zero is inside the confidence interval. The
tests show that only 𝜎HVS and the ANN are not significantly different from each
other at the 5 % level. From a practical perspective, however, the differences are
very small, as illustrated in Figure 4.9. The large sample size allows for
the detection of relatively minor effects. It is important to note, in particular when
ranking the models, that the above tables use the mean squared error while the
ANOVA tables and box plots use the observed forecasting (and later pricing)
errors, their means, and the difference between their means. The MSE is useful
as it does not allow for one error to reduce the impact of another in the opposite
direction and is thus used for network training and model estimation. That is also
the reason it was used as the performance measure during development. However,
the question in the following analysis is what the characteristics of an average
model prediction are.
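The quantities in these ANOVA tables follow the standard one-way decomposition, which can be sketched as:

```python
import numpy as np

def one_way_anova(groups):
    """One-way ANOVA: between-group SS, within-group SS, and the F
    statistic, matching the Source/SS/df/MS/F layout of the tables."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = sum(len(g) for g in groups) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f
```

With the very large group sizes used here, even small mean differences produce large F statistics, which is why the practical size of the differences is assessed separately via the box plots.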
In the out-of-sample period, the differences are larger, showing significance
between all groups. Tables 4.12 and 4.13 report the equivalent statistics for this
subset, and Figure 4.10 provides a visual comparison.
Source SS df MS F p
Groups 42.37 3 14.1234 546.6610 0
Error 1788.89 69 241 0.0258
Total 1831.26 69 244
Based on the in-sample results, one would have used the ANN and in this
regard the hypothesis was correct. It does not hold, however, when tested in a
new set and in changed circumstances. This raises the question of stability of
the models generally. When comparing the in-sample and out-of-sample errors
for each model, the user of a model would prefer to see no significant difference
between them, i.e. the model applies equally well in either period.
The more complex models, 𝜎GARCH (Table 4.16 and Figure 4.13) and 𝜎ANNd
(Table 4.17 and Figure 4.14), show a significant difference between in-sample and
out-of-sample errors. The same is not true for the simple models 𝜎HVL (Table 4.14
and Figure 4.11) and 𝜎HVS (Table 4.15 and Figure 4.12), which suggests that their
forecasting characteristics have not changed significantly in the transition from
the in-sample to the out-of-sample set. All models show a much wider range of
forecasting errors in the out-of-sample set, which was expected considering the
period it covered as well as the nature of any out-of-sample testing.
Source SS df MS F p
Groups 0.00 1 0.0012 0.0630 0.8019
Error 953.63 48 465 0.0197
Total 953.63 48 466
Source SS df MS F p
Groups 0.01 1 0.0111 0.5313 0.4661
Error 1020.95 48 645 0.0210
Total 1020.96 48 646
Source SS df MS F p
Groups 32.03 1 32.0283 1408.1100 8.0446 × 10−304
Error 1107.33 48 683 0.0227
Total 1139.36 48 684
Source SS df MS F p
Groups 35.04 1 35.0449 2004.6300 0
Error 845.94 48 389 0.0175
Total 880.98 48 390
Value SE t p
(Intercept) 3.0965 0.0274 113.01 0
𝑀 −5.6255 0.0523 −107.52 0
𝑇 0.9824 0.0298 32.94 2.1668 × 10−235
𝑀𝑇 −0.7727 0.0282 −27.43 2.2345 × 10−164
𝑀2 3.6312 0.0288 126.29 0
𝑇2 −0.0282 0.0029 −9.64 5.6076 × 10−22
Value SE t p
(Intercept) 0.4480 0.0370 12.10 1.1325 × 10−33
𝑀 −0.6262 0.0707 −8.86 8.2597 × 10−19
𝑇 1.2770 0.0403 31.69 3.0362 × 10−218
𝑀𝑇 −1.0653 0.0381 −27.99 4.9112 × 10−171
𝑀2 1.3446 0.0388 34.62 2.0880 × 10−259
𝑇2 −0.0329 0.0040 −8.33 8.1324 × 10−17
Value SE t p
(Intercept) 5.1333 0.0247 207.94 0
𝑀 −9.5323 0.0471 −202.22 0
𝑇 0.5940 0.0269 22.11 8.9661 × 10−108
𝑀𝑇 −0.5157 0.0254 −20.32 2.2516 × 10−91
𝑀2 5.4243 0.0259 209.39 0
𝑇2 0.0018 0.0026 0.70 0.4846
buyer since it would be cheaper to buy the underlying security rather than buy-
ing the option for the same price and having to pay for the underlying again later.
This removed 20 in-sample and 45 out-of-sample observations from
each model. After the removal of observations missing implied volatility, 48 846
in-sample records were applied and the models tested on 41 897 out-of-sample
records. ANN models further exclude incomplete observations, which result from
the lagged historical volatility time series.
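The upper-bound filter described above reduces to a simple comparison per observation; the record field names below are illustrative:

```python
def drop_upper_bound_violations(records):
    """Remove call observations priced above the underlying: such a price
    admits arbitrage, since buying the stock outright would dominate
    buying the option and paying for the underlying again later."""
    return [r for r in records if r["call_price"] <= r["spot"]]
```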
Once the data was prepared, the model parameters were estimated. Tables 4.18
to 4.21 show the model parameters and probability values.
Value SE t p
(Intercept) 5.4782 0.0215 255.39 0
𝑀 −10.3374 0.0410 −252.38 0
𝑇 0.7543 0.0233 32.30 1.4992 × 10−226
𝑀𝑇 −0.6066 0.0221 −27.50 2.8361 × 10−165
𝑀2 5.9548 0.0225 264.54 0
𝑇2 −0.0147 0.0023 −6.41 1.4416 × 10−10
Figures 4.15 to 4.18 show the fitted surface models. The range is chosen near
price-strike equality and for a relatively short time frame. The surface represents
correction factors to the underlying volatility forecast. Considering that the same
implied volatility values were used in fitting all models, they show significant
differences in shape. It is important to note that the value of each of these surfaces
has to be considered in the context of the underlying volatility model. For this
reason, the evaluation includes volatility modelling errors, rather than errors in
factor levels.
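Each fitted surface is a quadratic in moneyness M and time to maturity T with the six terms listed in the parameter tables (Intercept, M, T, MT, M², T²); evaluating a surface at a given point is then simply:

```python
def surface_factor(m, t, coef):
    """Evaluate f(M, T) = b0 + b1*M + b2*T + b3*M*T + b4*M^2 + b5*T^2,
    the correction factor applied to the underlying volatility forecast.
    `coef` holds the six fitted coefficients in table order."""
    b0, b1, b2, b3, b4, b5 = coef
    return b0 + b1 * m + b2 * t + b3 * m * t + b4 * m ** 2 + b5 * t ** 2
```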
4.4 Volatility Surface Fitting
Using the same implied volatility estimates as the target values, an ANN was
trained. The variable value ranges are shown in Figures 4.19 and 4.20. They show
similar characteristics of the input values except for the generally higher volatility
levels in the out-of-sample set (see above), and a number of outliers in the lower
range of the moneyness values. The network training terminated very early (see
Figure 4.22) when the network with four hidden nodes failed to improve results.
Thus the smallest network with three nodes was chosen. The training record is
shown in Figure 4.21, which shows the expected decrease in the error terms.
The plots of actual compared to output values show a similar pattern as before.
Focusing on the central data range, Figures 4.23 and 4.24 already show a greater
difference and tendency towards an average value. This further demonstrates
the need for additional research; the same argument against basing modelling
decisions on out-of-sample results applies, however.
Similar to volatility forecasting, the volatility surface models need to be com-
pared to each other and with respect to their ability to generalise beyond the
training set.
It is evident that the network models are preferable compared with the
alternative models (see Tables 4.22 and 4.23), not only with respect
to their MSE and other metrics but also with regard to the location and dis-
tribution of the prediction error (see Tables 4.24, 4.25, 4.26, and 4.27). More
important than the fact that the volatility forecasting network rather than the
surface model appears strongest in the out-of-sample period is the fact that the
volatility forecasting model combined with a regression model offers multipliers
such that the errors of implied volatility are in a narrower range (see Figure 4.25).
This remains true even in the out-of-sample data set (see Figure 4.26).
Source SS df MS F p
Groups 4.93 4 1.2333 28.9261 4.4733 × 10−24
Error 10 407.30 244 104 0.0426
Total 10 412.20 244 108
Source SS df MS F p
Groups 287.81 4 71.9512 2387.0200 0
Error 6314.28 209 480 0.0301
Total 6602.09 209 484
As in the previous section, volatility surface models perform worse and signifi-
cantly differently out-of-sample. Evidence can be found in Figures 4.27 to 4.31,
as well as Tables 4.28 to 4.32. The difference between the two sample periods and
the sample size are together large enough to reject the null hypothesis that they
are the same across all models.
Source SS df MS F p
Groups 28.22 1 28.2238 710.7330 5.5602 × 10−156
Error 3601.85 90 702 0.0397
Total 3630.08 90 703
Source SS df MS F p
Groups 21.96 1 21.9621 439.6230 2.2289 × 10−97
Error 4532.93 90 737 0.0500
Total 4554.89 90 738
Source SS df MS F p
Groups 26.61 1 26.6070 633.1610 3.0924 × 10−139
Error 3813.16 90 741 0.0420
Total 3839.77 90 742
Source SS df MS F p
Groups 21.27 1 21.2677 625.2450 1.5860 × 10−137
Error 3085.23 90 702 0.0340
Total 3106.50 90 703
Source SS df MS F p
Groups 60.08 1 60.0767 3227.3600 0
Error 1688.40 90 702 0.0186
Total 1748.48 90 703
reference pricing model and the volatility surface model. Tables 4.33 and 4.34
summarise the error statistics.
Figure 4.38 reveals that it is a certain type of option that particularly affects
the results. These options are largely characterised by their low option value relative
to their strike (the plots show the data after multiplying by the strike again)
and they are typically overpriced by the network; the effect is not visible in the
core set (see Figure 4.39) suggesting it is an outlier effect. This only applies to
the out-of-sample data (see Figure 4.38), however, and does not explain the poor
results more generally (see Figures 4.36 and 4.37 for comparison).
The remaining networks, which use the same input data, do not show poor
performance. Rather, they perform very well in both subsets. It is noteworthy,
however, that all models perform much worse out-of-sample.
Figure 4.36: Target and Output Values for the 𝐶 ANN In-sample Data
Figure 4.37: Target and Output Values for the 𝐶 ANN In-sample Data (Without
Outliers)
4.5 Option Pricing Evaluation
Figure 4.38: Target and Output Values for the 𝐶 ANN Out-of-sample Data
Figure 4.39: Target and Output Values for the 𝐶 ANN Out-of-sample Data (With-
out Outliers)
Source SS df MS F p
Groups 108.02 5 21.6035 32.4450 3.3945 × 10−33
Error 193 677 290 873 0.6658
Total 193 785 290 878
Source SS df MS F p
Groups 30 478.10 5 6095.61 1258.1000 0
Error 1.23 × 106 254 700 4.85
Total 1.26 × 106 254 705
As in the previous stages, here too the comparison of errors reveals significant
differences, again in part due to the large number of observations. Comparisons
across models (see Figures 4.40 and 4.41) and across subsets (see Figures 4.42
to 4.47) show that models have significant differences and behave differently in the
later period (out-of-sample). Of the models evaluated, the 𝐶 ANNd model performs
best and shows the lowest inter-quartile range.
Unlike in the previous steps, the pair-wise comparison results are more mixed.
While the 𝐶 ANNd and 𝐶 ANNs models are better, their means are not significantly
different from that of the 𝐶 GARCH model in-sample (see Table 4.36) and the best
performing model 𝐶 ANNd is not significantly different from it in the out-of-sample
period (see Table 4.38) despite the lower forecasting errors (see Table 4.34).
Both outperform the remaining choices, however, and are significantly different.
This and the narrower range of errors (ignoring outliers), suggests 𝐶 ANNd is the
preferred model followed by 𝐶 ANNs .
Similar to the previous forecasting and surface models, the differences between
the in-sample and out-of-sample periods are quite large and significant. Tables 4.39
to 4.44 show very high levels of significance, i.e. very low p-values. This is also
evident in the corresponding box plots in Figures 4.42 to 4.47. Any performance
one would have expected based on the in-sample period would not have been
Source SS df MS F p
Groups 689.55 1 689.5450 1923.7900 0
Error 32 054.40 89 430 0.3584
Total 32 744 89 431
Source SS df MS F p
Groups 566.24 1 566.2370 1160.8900 8.1147 × 10−253
Error 43 619.80 89 429 0.4878
Total 44 186.10 89 430
Source SS df MS F p
Groups 91.90 1 91.9005 266.7760 6.9889 × 10−60
Error 30 807.30 89 430 0.3445
Total 30 899.20 89 431
Source SS df MS F p
Groups 131.70 1 131.6970 659.6120 6.1125 × 10−145
Error 17 846.40 89 385 0.1997
Total 17 978.10 89 386
Source SS df MS F p
Groups 477.27 1 477.2740 1942.0600 0
Error 21 976.50 89 424 0.2458
Total 22 453.80 89 425
Source SS df MS F p
Groups 18 441.20 1 18 441.20 1417.1700 6.1278 × 10−308
Error 1.28 × 106 98 475 13.01
Total 1.30 × 106 98 476
Conclusion
The evidence presented in the previous chapter offers some support for this
hypothesis. It is certainly possible for volatility forecasting to benefit from non-
parametric models. The size of the contribution will depend on the nature of the
time series. A judgement needs to be made by the user of the model as to whether
the benefits of fitting justify the loss in explanatory power of the model compared
to the closed-form solutions available through the models of the ARCH family.
It should be noted that this conclusion is reached in the context of the modelling
limitations discussed previously. In particular, additional explanatory variables
may help as may rolling-fit (see below for a more detailed discussion).
With regard to option pricing, the results are clearer. Option pricing does
benefit from the use of non-parametric models even in the case of Australian equity
length of the trading day and possibly of one week. The question remains whether
there are equally useful, if not equally clear, window sizes at higher frequencies.
The broader question therefore is whether and where market microstructure
effects end, what sampling frequencies are typically used in which areas of
research within finance, and how these can be accounted for and corrected. While
such research exists for individual asset classes and individual markets, a broader
theoretical framework would be desirable. The same applies to the required sam-
ple sizes when a continuous time series is not strictly needed. While it is generally
preferable to use more rather than less data, there are at least practical limits
to this. In addition, inference is difficult for extremely large data sets, where any
difference (random or not) appears significant at reasonable significance levels.
Despite extensive research in machine learning, there still appears to be no
consensus about the various choices and strategies used to select size and ar-
chitecture, parameters, learning and evaluation functions, and the meta-learning
strategy for ANNs. A decision support system, or a process that results in such a
system if these are domain-specific, requires additional research. This is particu-
larly true for the meta-learning strategy, i.e. how to select a starting architecture,
how to modify it and when to stop. Individual approaches exist, such as the one
employed in this thesis, but also some for self-configuring networks. It is possible
that a different network architecture and a different treatment of data results in
a pricing network outperforming the volatility forecasting network. A systematic
approach to modelling problems in this context is needed to address any issues
that arise but also to rule out the possibility of incorrect network architecture in
any particular case.
Additional research appears to be needed in this area despite these individual approaches, i.e. research
beyond an extensive review of the literature. Even if self-configuring networks are
an option for a particular problem, the question remains whether any of those
approaches can be generalised to option pricing and if so, how the most suitable
method is to be determined in the context of option pricing and the presence of
time-varying parameters.
Furthermore, it is not clear what the limits of machine learning are. Can all
structural and functional relationships inherently be modelled, and what are the
requirements for the process and model?
In addition to these questions, which are intended to broaden the perspective
of the researcher, extending it beyond the question being researched, there are
several problems and research questions that relate more narrowly to the re-
was equal. ARCH-style models with exogenous variables are certainly one way
to make such a comparison, in addition to simple regression models using historical
volatility and a number of predictors. Equally, an extension of past volatility
research on autoregressive neural networks into option pricing could lead to
interesting results.
Finally, it would be of considerable interest – though somewhat related to the
question about meta-learning strategies – whether other machine learning techniques
such as SVM regression outperform ANNs in this particular domain and whether
they are easier to fit to existing volatility and option data.
Bibliography
Ahn, J. J., D. H. Kim, K. J. Oh, and T. Y. Kim (2012). “Applying option Greeks
to directional forecasting of implied volatility in the options market: An in-
telligent approach”. In: Expert Systems With Applications 39.10, p. 9315.
Amilon, H. (July 2003). “A Neural Network Versus Black-Scholes: A Compari-
son of Pricing and Hedging Performances”. In: Journal of Forecasting 22.4,
pp. 317–335.
Amornwattana, S., D. Enke, and C. H. Dagli (Oct. 2007). “A hybrid option pricing
model using a neural network for estimating volatility”. In: International
Journal of General Systems 36.5, pp. 558–573.
Anders, U., O. Korn, and C. Schmitt (1998). “Improving the Pricing of Options:
A Neural Network Approach”. In: Journal of Forecasting 17.5-6, pp. 369–388.
Andersen, T. G., T. Bollerslev, and S. Lange (1999). “Forecasting Financial Mar-
ket Volatility: Sample Frequency vis-à-vis Forecast Horizon”. In: Journal of
Empirical Finance 6.5, pp. 457–477.
Andreou, P. C., C. Charalambous, and S. H. Martzoukos (2008). “Pricing and
trading European options by combining artificial neural networks and para-
metric models with implied parameters”. In: European Journal of Operational
Research 185.3, pp. 1415–1433.
Andreou, P. C., C. Charalambous, and S. H. Martzoukos (2009). “European Op-
tion Pricing by Using the Support Vector Regression Approach”. In: Artificial
Neural Networks – ICANN 2009: 19th International Conference, Limassol,
Cyprus, September 14–17, 2009, Proceedings. Ed. by C. Alippi, M. M. Poly-
carpou, and C. Panayiotou. Berlin: Springer-Verlag, pp. 874–883.
Andreou, P. C., C. Charalambous, and S. H. Martzoukos (2010). “Generalized
parameter functions for option pricing”. In: Journal of Banking and Finance
34.3, pp. 633–646.
Andreou, P., C. Charalambous, and S. Martzoukos (2002). “Critical assessment of
option pricing methods using artificial neural networks”. In: Artificial Neural
Chen, S., W. K. Härdle, and K. Jeong (2010). “Forecasting Volatility with Sup-
port Vector Machine-Based GARCH Model”. In: Journal of Forecasting 29,
pp. 406–433.
Choi, H., H. Lee, G. Han, and J. Lee (2004). “Efficient option pricing via a globally
regularized neural network”. In: Advances in Neural Networks - ISNN 2004,
Part 2. Ed. by F. Yin, J. Wang, and C. Guo. Vol. 3174. Lecture Notes in
Computer Science. Berlin: Springer-Verlag, pp. 988–993.
Corrado, C. J. and T. Su (1996). “Skewness and kurtosis in S&P 500 index returns
implied by option prices”. In: Journal of Financial Research 19.2, pp. 175–
192.
Cox, J. (1975). Notes on Option Pricing I: Constant Elasticity of Variance Diffu-
sions. Working Paper. Stanford University.
Cox, J. C., S. A. Ross, and M. Rubinstein (1979). “Option Pricing: A Simplified
Approach”. In: Journal of Financial Economics 7, pp. 229–263.
Cybenko, G. (1989). “Approximation by superpositions of a sigmoidal function”.
In: Mathematics of Control, Signals, and Systems 2, pp. 303–314.
Dash, G. H., C. R. Hanumara, and N. Kajiji (2003). “Neural network architec-
tures for efficient modeling of FX futures options volatility”. In: Operational
Research 3.1, pp. 3–23.
Derman, E. and I. Kani (1994). “Riding on a Smile”. In: Risk Magazine 7.2.
Dindar, Z. and T. Marwala (2004). “Option pricing using a committee of neural
networks and optimized networks”. In: 2004 IEEE International Conference
on Systems, Man and Cybernetics. Vol. 1, pp. 434–438.
Dindar, Z. A. (2004). “Artificial Neural Networks Applied to Option Pricing”.
MA thesis. Johannesburg: Faculty of Engineering and the Built Environment,
University of the Witwatersrand.
Donaldson, R. G. and M. Kamstra (1997). “An artificial neural network-GARCH
model for international stock return volatility”. In: Journal of Empirical Fi-
nance 4.1, pp. 17–46.
Drost, F. C. and T. E. Nijman (1993). “Temporal Aggregation of GARCH Pro-
cesses”. In: Econometrica 61.4, pp. 909–937.
Dugas, C., Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia (2001). “Incorporating
Second-Order Functional Knowledge for Better Option Pricing”. In: Advances
in Neural Information Processing Systems 13. Ed. by T. Leen, T. Dietterich,
and V. Tresp. MIT Press.
Haug, E. G. and J. Haug (1996). Implied Forward Volatility. Third Nordic Sym-
posium on Contingent Claims Analysis in Finance, Iceland.
Haug, E. G., J. Haug, and A. Lewis (Sept. 2003). “Back to Basics: A New Approach
to the Discrete Dividend Problem”. In: Wilmott Magazine.
Haug, E. G. (2007). The Complete Guide to Option Pricing Formulas. New York:
McGraw-Hill.
Healy, J. V., M. Dixon, B. J. Read, and F. F. Cai (2004). “Confidence limits for
data mining models of options prices”. In: Physica A: Statistical Mechanics
and its Applications 344.1, pp. 162–167.
Healy, J., M. Dixon, B. Read, and F. F. Cai (2002). “A data-centric approach
to understanding the pricing of financial options”. In: The European Physical
Journal B 27.2, pp. 219–227.
Healy, J. V., M. Dixon, B. J. Read, and F. F. Cai (2007). “Non-parametric extrac-
tion of implied asset price distributions”. In: Physica A: Statistical Mechanics
and its Applications 382.1, pp. 121–128.
Healy, J., M. Dixon, B. Read, and F. Cai (2003). “Confidence in data mining
model predictions: a financial engineering application”. In: Industrial Elec-
tronics Society, 2003. IECON’03. The 29th Annual Conference of the IEEE.
Vol. 2.
Herrmann, R. and A. Narr (1997). Neural Networks and the Valuation of
Derivatives – Some Insights into the Implied Pricing Mechanism of German
Stock Index Options. Discussion Paper. Germany: University of Karlsruhe.
Heston, S. (1993). “A closed-form solution for options with stochastic volatil-
ity, with application to bond and currency options”. In: Review of Financial
Studies 6, pp. 327–343.
Hornik, K., M. Stinchcombe, and H. White (1989). “Multilayer feedforward net-
works are universal approximators”. In: Neural Networks 2, pp. 359–366.
Hornik, K., M. Stinchcombe, and H. White (1990). “Universal approximation of
an unknown mapping and its derivatives using multilayer feedforward net-
works”. In: Neural Networks 3, pp. 551–560.
Hossain, A. and M. Nasser (2011). “Recurrent Support and Relevance Vector
Machines Based Model with Application to Forecasting Volatility of Financial
Returns”. In: Journal of Intelligent Learning Systems and Applications 3,
pp. 230–241.
Kelly, D. L. (1994). Valuing and Hedging American Put Options Using Neural
Networks. Working Paper. Carnegie Mellon University and University of Cali-
fornia, Santa Barbara, CA.
Kim, B., D. Lee, and J. Lee (2006). “Local Volatility Function Approximation
Using Reconstructed Radial Basis Function Networks”. In: Advances in Neu-
ral Networks: Third International Symposium on Neural Networks, Chengdu,
China, May 28 – June 1, 2006, Proceedings, Part III. Ed. by J. Wang, Z. Yi,
J. M. Zurada, B.-L. Lu, and H. Yin. Vol. 3973. Lecture Notes in Computer
Science. Springer, p. 524.
Ko, P.-C. (2009). “Option valuation based on the neural regression model”. In:
Expert Systems with Applications 36.1, pp. 464–471.
Kolmogorov, A. N. (1957). “On the Representation of Continuous Functions of
Several Variables by Superposition of Continuous Functions of one Variable
and Addition”. In: Doklady Akademii Nauk USSR 114, pp. 679–681.
Kuo, C. K. (1993). “The Valuation of Futures-Style Options”. In: The Review of
Futures Markets 10.3.
Lachtermacher, G. and L. A. Rodrigues Gaspar (1996). “Neural Networks in
Derivative Securities Pricing Forecasting in Brazilian Capital Markets”. In:
Neural Networks in Financial Engineering: Proceedings of the Third Inter-
national Conference on Neural Networks in the Capital Markets, London,
England, 11-13 October 1995. Singapore: World Scientific, pp. 92–97.
Lajbcygier, P. (2003a). “Comparing Conventional and Artificial Neural Network
Models for the Pricing of Options”. In: Neural Networks in Business: Tech-
niques and Applications. Ed. by K. Smith and J. Gupta. Idea Group Publish-
ing, pp. 220–235.
Lajbcygier, P. (2003b). “Option pricing with the product constrained hy-
brid neural network”. In: Artificial Neural Networks and Neural Informa-
tion Processing – ICANN/ICONIP 2003, Joint International Conference
ICANN/ICONIP 2003, Istanbul, Turkey, June 26–29, 2003, Proceedings.
Vol. 2714. Berlin: Springer-Verlag, pp. 615–621.
Lajbcygier, P. and A. Flitman (1996). “Comparison of Non-parametric Regres-
sion Techniques for the Pricing of Options Using an Implied Volatility”. In:
Decision Technologies for Financial Engineering: Proceedings of the Fourth
International Conference on Neural Networks in Capital Markets. Ed. by A. S.
Weigend, Y. Abu-Mostafa, and A.-P. N. Refenes. New York: World Scientific.
Qi, M. and G. Maddala (1996). “Option pricing using artificial neural networks:
The case of S&P 500 index call options”. In: Neural Networks in Financial
Engineering: Proceedings of the Third International Conference on Neural
Networks in the Capital Markets, London, England, 11-13 October 1995. Sin-
gapore: World Scientific, pp. 78–91.
Quek, C., M. Pasquier, and N. Kumar (2008). “A novel recurrent neural network-
based prediction system for option trading and hedging”. In: Applied Intelli-
gence 29.2, pp. 138–151.
Raberto, M., G. Cuniberti, E. Scalas, M. Riani, F. Mainardi, and G. Servizi
(2000). “Learning short-option valuation in the presence of rare events”. In:
International Journal of Theoretical and Applied Finance 3.3, pp. 563–564.
Rendleman, R. J. and B. J. Bartter (1979). “Two-State Option Pricing”. In:
Journal of Finance 34, pp. 1093–1110.
Roberts, H. (May 1967). Statistical versus Clinical Prediction of the Stock Market.
Manuscript. Center for Research in Security Prices, University of Chicago.
Rubinstein, M. (1994). “Implied Binomial Trees”. In: Journal of Finance 49,
pp. 771–818.
Saxena, A. (2008). Valuation of S&P CNX options: comparison of Black-Scholes
and hybrid ANN model. Paper 162–2008. Presentation at the SAS Global
Forum 2008.
Schittenkopf, C. and G. Dorffner (2001). “Risk-neutral Density Extraction from
Option Prices: Improved Pricing with Mixture Density Networks”. In: IEEE
Transactions on Neural Networks 12.4, pp. 716–725.
Securities Industry Research Centre of Asia-Pacific (SIRCA) (2010–2012).
Standard and Poor’s (July 2011). S&P/ASX Australian Indices Methodology.
Teddy, S., E. Lai, and C. Quek (2006). “A Brain-Inspired Cerebellar Associative
Memory Approach to Option Pricing and Arbitrage Trading”. In: Neural
Information Processing: 13th International Conference, ICONIP 2006, Hong
Kong, China, October 3-6, 2006. Proceedings, Part III. Ed. by I. King, J.
Wang, L.-W. Chan, and D. Wang. Vol. 4234. Lecture Notes in Computer
Science. Springer, p. 370.
Teddy, S., E. Lai, and C. Quek (2008). “A cerebellar associative memory ap-
proach to option pricing and arbitrage trading”. In: Neurocomputing 71.16-18,
pp. 3303–3315.
Thomaidis, N. S., V. S. Tzastoudis, and G. D. Dounias (2007). “A comparison
of neural network model selection strategies for the pricing of S&P 500 stock
[Appendix table: per-stock estimates with columns Value, SE, and t for the ASX stocks AMP, ANZ, BHP, BXB, CBA, CGJ, FGL, MQG, NAB, QBE, RIN, RIO, SGB, SUN, TLS, WBC, WDC, WES, WOW, and WPL; the numerical values did not survive extraction.]