IFC Working Papers: Irving Fisher Committee On Central Bank Statistics
February 2022
IFC Working Papers are written by the staff of member institutions of the Irving Fisher
Committee on Central Bank Statistics, and from time to time by, or in cooperation
with, economists and statisticians from other institutions. The views expressed in
them are those of their authors and not necessarily the views of the IFC, its member
institutions or the Bank for International Settlements.
© Bank for International Settlements 2022. All rights reserved. Brief excerpts may be
reproduced or translated provided the source is stated.
Contents
Abstract
1. Introduction
4. What are the main challenges in the use of big data?
6. Conclusion
References
1
Respectively Senior Financial Market Analyst (Giulio.Cornelli@bis.org); Economist
(Sebastian.Doerr@bis.org); Head of Innovation and Digital Economy
(Leonardo.Gambacorta@bis.org); Head of Statistics and Research Support, Bank for International
Settlements (BIS) and Head of the Secretariat of the Irving Fisher Committee on Central Bank Statistics
(IFC) (bruno.tissot@bis.org).
We would like to thank Jose Maria Serena and Fernando Perez Cruz for their advice and input. For
comments and suggestions, we also thank Redentor Paolo Alegre Jr, Gianni Amisano, Douglas Araujo,
Claudio Borio, Agustin Carstens, Stijn Claessens, Jon Frost, Michel Juillard, Julian Langer, Juri Marcucci,
Li Ming, Kuniko Moriya, Luiz Awazu Pereira, Rafael Schmidt, Hyun Song Shin and Helio Vale. The
views expressed are those of the authors and not necessarily those of the BIS or the IFC.
This paper reviews the use of big data in Asian central banks, leveraging on a survey
conducted among the members of the Irving Fisher Committee. The analysis reveals
four main insights. First, Asian central banks define big data in a more encompassing
way that includes unstructured non-traditional as well as structured data sets. Second,
interest in big data appears higher in Asia, including at the senior policy level; the
focus is in particular on projects developed to process natural language, conduct
nowcasting/monitoring exercises, and develop applications to extract economy-wide
insights as well as suptech/regtech solutions. Third, Asian central banks report dealing
with big data to support a wide range of tasks. Fourth, big data poses new challenges,
with specific attention paid in the region to cyber security and data strategy. As a
result, there is a growing need for international policy cooperation, especially among
public authorities in Asia to facilitate the use of payments data and promote
innovative technological solutions.
Keywords: Asian central banks, artificial intelligence, big data, data science,
international cooperation
JEL codes: G17, G18, G23, G32
Big data sources are developing fast, and applications for making use of this new
information are flourishing in parallel. This trend, which is particularly pronounced in
Asia, primarily reflects the impact of digitalisation, with the development of the
“internet of things” and the ever-increasing ability to digitally process “traditional”
information, such as text. It is also a consequence of the large databases that have
been created as a by-product of the complex operations taking place in modern
societies. Additionally, vast amounts of data have emerged in the administrative,
commercial and financial realms, an evolution spurred by the important data
collection strategies undertaken after the great financial crisis of 2007–09 to address
the information challenges posed by developments in the financial sector. We now
live in the “age of big data” (Forbes (2012)).
Central banks are no exception to this general picture (Buch (2019)). They have
shown an increasing interest in using big data in recent years, as already documented
extensively by the Irving Fisher Committee (IFC) on Central Bank Statistics (IFC (2017),
Tissot (2017), Nymand-Andersen (2016), Mehrhoff (2019)). Central bank big data-
related work covers a variety of areas, including monetary policy and financial stability
as well as research and the production of official statistics. However, in contrast to the
rapid pace of innovation seen in the private sector, big data applications supporting
central banks’ operational work were initially developed only slowly. This tended to
reflect a number of constraints, such as a lack of adequate resources as well as the
intrinsic challenges associated with using big data sources to support public policy.
Yet, in recent years, central banks’ use of big data has proliferated, especially among
Asian countries.
Will central banks catch up and transform the way they operate to further benefit
from the information revolution? Or will their use of big data sources and applications
progress only gradually due to the inherent specificities of their mandates and
processes? To shed light on these issues, this paper reviews the use of big data and
machine learning in the Asian central bank community, leveraging on a survey
conducted in 2020 among the members of the IFC (IFC (2021a)). To this end, this
paper analyses the responses from seven Asian central banks, with a specific focus on
their reported big data projects (cf Appendix).2
The approach dealt with the following key questions: What constitutes big data
for central banks, and how strong is central banks’ interest in it? Have central banks
been increasing their use of big data and, if so, what were the main applications
developed? And finally, which constraints are faced when using big data and how can
they be overcome? To address these issues, Asian central banks’ answers to the 2020
survey were compared with those of their peers in the rest of the world.
This analysis uncovered four main insights.
First, Asian central banks have a comprehensive view of big data, which can
comprise very different types of data sets. First and foremost, it includes large
2
The list of Asian central banks includes: Bangko Sentral ng Pilipinas (BSP), Bank Indonesia (BI), Bank
of Japan (BoJ), Bank of Thailand (BoT), Bank Negara Malaysia (BNM), Monetary Authority of Macao
(MAM), Reserve Bank of India (RBI). Almost two thirds of the 92 IFC institutional members at that
time answered the survey. More information on the survey is contained in IFC (2021a) and Doerr et
al (2021).
The definition of big data is not unique, as it pertains to the specific angle of its use.
In general, big data can be defined in terms of volume, velocity and variety (the so-
called 3Vs). The reason is that for data to be “big”, they must not only have high
volume and high velocity, but also come in multiple varieties. Yet there are also many
different views on what defines “big data”. 3
In practice, big data can include the information generated from a wide variety
of sources, such as social media, web-based activities, machine sensors, or financial,
administrative or business operations. This comprehensive view of big data is
confirmed by the survey results for Asian central banks. Certainly, no central bank
considers traditional data alone as big data. But as reported in Graph 1, only 14% of
the respondents define big data exclusively as large non-traditional or unstructured
data that require new techniques for the analysis (in contrast, almost 40% of their
counterparts in the rest of the world have such a narrow definition). The remaining
86% of Asian respondents also include traditional and structured data sets in their
definition of big data. These structured data sets comprise those collected for
administrative or regulatory/supervisory purposes, often labelled as “financial big
data” (Cœuré (2017), Draghi (2018)).
Based on the results from the survey, a comprehensive definition of big data
would therefore cover all types of data sets that require non-standard technologies
to be analysed. The reason for this is, in part, that traditional statistical techniques
face hurdles when applied to unstructured data. For instance, handwritten text must
first be turned into structured data before it can be analysed, eg with natural
language processing (NLP) algorithms.
Central bank definitions of big data and main sources Graph 1
As a percentage of respondents
The sample includes 7 Asian central banks and 43 non-Asian central banks. Respondents could select multiple options. Non-traditional data
include “unstructured data sets that require new tools to clean and prepare”, “data sets with a large number of observations in the time
series”, “data sets that have not been part of your traditional pool”, and “data sets with a large number of observations in the cross-section”.
3
Occasionally, veracity is also added, as big data is often collected from open sources; moreover, the
literature is quite diverse and can refer to a much larger number of “Vs” (Tissot (2019)).
According to the 2020 IFC survey, central banks and supervisory authorities are
rapidly adopting big data and machine learning: the share of central banks
currently using big data has risen to 80% globally, up from just 30% in 2015. This
share has risen from 33% to 86% when looking specifically into Asia. Moreover,
around 60% of central banks in the region reported that they discuss big data issues
extensively, a ratio that is significantly above the one (42%) observed in the rest of
the world. Furthermore, all Asian respondents indicated a high to very-high level of
interest at the senior policy level, compared to only 58% outside the region.
Big data is used in a variety of areas, including research as well as monetary
policy and financial stability. Asian central banks (represented by the red bars in Graph
2) appear to use big data more than their peers (blue bars) in most areas, except
for research purposes. In particular, they process non-traditional data (darker bars) to
a greater extent to support monetary and financial stability policies – including for
specific supervisory and regulatory purposes (suptech and regtech).
In per cent
1 Respondents could select multiple options. See footnote to Graph 1 for details on how institutions define big data and on the sample
composition. 2 Includes “monitoring crypto assets”, “cyber security” and “network analysis”.
The big data projects undertaken by Asian central banks involve four main types
of applications: NLP, nowcasting exercises, applications to extract economy-wide
insights from granular financial data and other non-traditional sources, and
suptech/regtech applications. A list of selected big data projects in Asian central
banks is provided in the Appendix.
A first type of application uses textual information through NLP. The goal is
generally to turn qualitative text-based intelligence into numerical format. One
example has been the computation of so-called economic policy uncertainty (EPU)
indices in India to assess the degree of uncertainty faced by economic agents
(Priyaranjan and Pratap (2020)). Such indices are basically constructed by setting up
dictionaries that allow for the definition of specific terms that refer to uncertainty, and
then searching them in the text considered (for instance in newspaper articles or on
internet sites). These selected terms are then counted and aggregated to provide a
synthetic index that reflects the degree of uncertainty displayed in the document of
interest. Sentiment indices can be computed in this way, eg in order to measure the
probability of the occurrence of financial instability episodes.
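The dictionary-based counting described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the method of any project cited here: the term list, the function names and the sample texts are hypothetical, and the index is simply the average share of dictionary terms per document in each period, expressed in per cent.

```python
import re

# Hypothetical dictionary of uncertainty-related terms (illustrative only,
# not the list used in any central bank project).
UNCERTAINTY_TERMS = frozenset({"uncertain", "uncertainty", "unclear", "volatile"})


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_share(text, terms=UNCERTAINTY_TERMS):
    """Share of the tokens in `text` that appear in the dictionary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)


def uncertainty_index(docs_by_period, terms=UNCERTAINTY_TERMS):
    """Average dictionary-term share per period, in per cent of words."""
    index = {}
    for period, docs in docs_by_period.items():
        shares = [term_share(doc, terms) for doc in docs]
        index[period] = 100.0 * sum(shares) / len(shares) if shares else 0.0
    return index


# Made-up texts standing in for newspaper articles.
sample = {
    "2020-01": ["Growth is stable.", "The outlook is uncertain and volatile."],
    "2020-02": ["Markets are calm."],
}
index = uncertainty_index(sample)
```

In practice the dictionary, the text sources and the aggregation rule (for instance counting articles rather than words) would be chosen to suit the question at hand.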
NLP is also helpful for policy evaluation. For instance, one can quantify the
monetary policy stance that is communicated to the public via the publication of
meeting minutes. Similarly, market expectations of interest rate decisions have been
assessed by analysing market commentaries ahead of policy meetings in Indonesia
(Andhika Zulen and Wibisono (2019)). Such exercises can be updated regularly, which
is a key advantage compared to more traditional surveys of market participants. The
information collected on economic agents’ expectations can be particularly useful
when futures markets are not well developed, lack liquidity or are subject to
unexpected shocks (Amstad et al (2020); Armas and Tuazon (2020)). By contrast,
reported use of text data to inform financial stability policies has been relatively scarce
so far, although it appears to be developing as well. Other applications using text
analysis in Asian central banks have helped to: i) evaluate monetary policy credibility;
ii) ensure consistency in central banks’ communication of supervisory issues to
For what specific purposes does your institution use big data? Graph 3
Panels: Nowcasting; Suptech. In per cent of respondents.
These nowcasting exercises are usually updated frequently as new data come in,
and various techniques – eg Lasso (Least Absolute Shrinkage and Selection Operator)
– are applied to select the combination of variables that maximises forecast accuracy
at a given point in time (Richardson et al (2019)). One advantage is that this approach
does not rely on specific relationships assumed ex ante (as is the case for bridge
4
Of course, economic agents adjust to new technologies. For example, Cao et al (2020) show that firms
are aware that their filings are parsed and processed for sentiment via machine learning.
Consequently, they avoid words that computational algorithms may perceive as negative. This could
bias any analysis based on these filings.
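The Lasso-based variable selection mentioned for nowcasting exercises can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: it minimises (1/(2n))·||y − Xβ||² + α·||β||₁ by coordinate descent, omits the intercept, and takes the penalty α as given rather than choosing it by cross-validation. The indicator names and the data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Lasso coefficients via cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return beta


def selected_variables(beta, names, tol=1e-8):
    """Names of the indicators whose coefficient was not shrunk to zero."""
    return [name for name, b in zip(names, beta) if abs(b) > tol]


# Hypothetical indicators (columns) and target series; the numbers are made up.
names = ["card_payments", "job_postings", "search_volume"]
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -1.95, -2.05]
beta = lasso_coordinate_descent(X, y, alpha=0.1)
kept = selected_variables(beta, names)
```

Because the penalty sets small coefficients exactly to zero, the retained set can change each time the model is re-estimated on new data, which is what makes the approach independent of relationships assumed ex ante.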
As noted above, central banks and supervisory authorities in Asia already make
extensive use of big data sources and analytics such as machine learning for research
purposes, namely to inform monetary policy decisions, facilitate their statistical
compilation work and support their regulatory and supervisory tasks. However, the
use of big data poses various challenges for them. Graph 4 shows that these issues
What is the focus of the discussions on big data within your institution? Graph 4
In per cent
1 Respondents could select multiple options. See footnote to Graph 1 for details on the sample composition. 2 Includes “data quality and
reliability”, “data interpretation” and “data governance”.
More specifically, the survey has highlighted five main challenges for Asian
central banks in the use of big data. The first one is setting up a reliable and high-
powered IT infrastructure (IFC (2020)). Providing adequate computing power and
software involves high up-front costs. Many central banks have undertaken important
initiatives to develop big data platforms to facilitate the storage and processing of
large and complex data sets. One possible approach is represented by so-called data
lakes, obtained from pooling different data sets that are curated for future use. A
reliable and safe IT infrastructure is a prerequisite not only for big data analysis, but
also to prevent cyberattacks.
Second, central banks need to build up human capital to exploit big data.
Setting up and maintaining big data platforms requires a specific type of skillset,
combining statistical, IT, and analytical/mathematical aspects. Yet the supply of “data
scientists” is scarce and they are in high demand (Cœuré (2020)), in both the public
and the private sector. One solution is for central banks to train existing staff, but
learning the new techniques that are needed can require significant time and effort.
In addition, experience shows that these skill adjustments should take place beyond
the operational level, eg the statisticians in charge of using advanced tools. Those
analysing the output of complex models must also have a good understanding of the
new techniques in order to ensure that big data predictions are not only accurate but
also representative and “interpretable” – so that specific explanatory causes or factors
can be identified and communicated for policy use. Another issue is attracting and
retaining talent, especially in the face of intense competition from the private sector,
5
IIF (2020) finds that there is no “one-size-fits-all” approach to machine learning governance, and
there are interesting regional differences, many of which can be attributed to existing
non-discrimination and data protection laws.
6
For instance, data on past loan applications could reflect any discriminatory decisions on the part of
loan officers vis-à-vis minorities or women (Angwin et al (2016), Ward-Foxton (2019)). Likewise,
unrepresentative data could lead an algorithm to wrongly infer attributes about underrepresented
segments of the population or perpetuate any previous biases.
Cooperation could foster central banks’ use of big data, in particular through
collecting and showcasing successful projects and facilitating the sharing of
experiences. For instance, developing technical discussions between institutions is
seen as a good way to build the necessary skillset among staff and develop relevant
IT tools and algorithms that are best suited to central banks’ needs.
Looking ahead, a promising area for collaboration among central banks in Asia
could be in global payments data. More than 85% of Asian central banks reported
an active use of high frequency payment data in their institutions, with a primary focus
on either the type of instruments, counterparties involved or both. This ratio is much
higher compared with other central banks in the rest of the world (about 65%; Graph
5). Moreover, all Asian central banks expressed interest in contributing to a pilot
study on payment data (Graph 6, left-hand panel), especially to develop surveillance
exercises with a focus on interconnectedness in the financial system. This stands in
contrast to their counterparts in the rest of the world, where interest in using payment
data is primarily limited to nowcasting purposes (right-hand panel).
Which types of payments data are useful for your institution? Graph 5
In per cent
See footnote to Graph 1 for details on the sample composition (note: only 29 IFC members responded to this specific question); multiple
answers possible in each jurisdiction.
International financial institutions can foster cooperation around big data. For
instance, they can help develop in-house big data knowledge, helping to reduce
central banks’ reliance on big data services providers, which can be expensive and
entail significant legal and operational risks. Indeed, the IFC has been actively
supporting such exchange of experience at the global level, and several
complementary initiatives are being developed in the Asian region, for instance
among EMEAP central banks.7
International bodies can also facilitate innovation by promoting technological
solutions and initiatives to enhance the global statistical infrastructure. In this
regard, the BIS Innovation Hub has identified as strategic priorities, among others,
effective supervision (including regtech/suptech) and open banking/finance that
could benefit from drawing on big data sources and tools. It is currently developing
its work programme in these fields, with a view to producing proofs of concept (PoC)
that can benefit the central banking community.
Initial projects in the field of particular relevance for Asian central banks
include Ellipse, led by the Singapore Centre of the Hub, and Genesis (Hong Kong
Centre). Ellipse is a PoC that aims to demonstrate the functionalities and feasibility of
an integrated regulatory data and analytics platform that can (i) reduce compliance
burdens placed on financial institutions by moving away from template-based
regulatory reporting requests; (ii) be nearer to "real-time" and relevant to current
events to support supervisory judgments and actions, both locally and globally; (iii)
support a move towards newer digitally enabled architectures to replace traditional
concepts and processes of data collection; and (iv) enable predictive insights and early
warning by integrating big data analytics. Turning to Genesis, this project explores
7
Executives’ Meeting of East Asia-Pacific Central Banks (EMEAP) is a co-operative forum of eleven
central banks and monetary authorities in the East Asia and Pacific region. Common projects are
developed in the areas of banking supervision and resolution, financial markets, payments and
market infrastructure, and information technology.
6. Conclusion
The world is changing and so is the way it is measured. This paper provides an
overview of the use of big data in the Asian central bank community. It leverages on
a survey conducted in 2020 among the members of the IFC. The specific responses
from seven Asian central banks were analysed and compared with those of other
central banks in the rest of the world. The overall picture suggests that, while central
banks in other regions see similar challenges and opportunities in the use of big data,
those located in Asia have very distinctive features.
First, Asian central banks define big data in an encompassing way that includes
not only unstructured, non-traditional data but also structured data sets to a larger
extent compared to other regions. Second, interest in big data appears higher in Asia,
including at the senior policy level. Third, a large majority of Asian central banks report
dealing with big data to support economic research, monetary and financial stability
policies as well as their statistical production tasks, a ratio that is slightly above the
situation reported in other regions. The related big data projects are developed
mainly in the areas of NLP, nowcasting, applications to extract economy-wide insights,
and suptech/regtech solutions. Fourth, the advent of big data poses new challenges,
such as the reliability of IT infrastructures, legal aspects around privacy, algorithmic
fairness, and data quality. Interestingly, there is a somewhat higher interest among
Asian central banks for analysing these issues, with topics such as cyber security and
the development of a formal strategy for the use of big data being particularly high
on their agendas.
Asian (and other) central banks are willing to join forces to reap the benefits of
big data, the IFC survey shows. International financial institutions can support these
cooperative approaches.8 They can facilitate innovation by promoting technological
solutions to harmonise data standards and processes among jurisdictions, and
important projects have already been launched in Asia.
8
Specific initiatives to foster closer collaboration and accelerate innovation efforts include the ASEAN
Open Data Dictionary, ASEANstats, Asia Open Data Partnership (Dataportal.Asia).
Amstad, M, G Cornelli, L Gambacorta and D Xia (2020): “Investors’ risk attitude in the
pandemic and the stock market: new evidence based on internet searches”, BIS
Bulletin, no 25.
Amstad, M, L Gambacorta, C He and D Xia (2021): “Trade sentiment and the stock
market: new evidence based on big data textual analysis of Chinese media”, BIS
Working Papers, no 917.
Andhika Zulen, A and O Wibisono (2019): “Measuring stakeholders’ expectation on
central bank’s policy rate”, IFC Bulletin, no 49.
Angwin, J, J Larson, S Mattu and L Kirchner (2016): “Machine bias”, ProPublica.
Armantier, O, S Doerr, J Frost, A Fuster and K Shue (2021): “Whom do consumers trust
with their data? US survey evidence”, BIS Bulletin, no 42.
Armas, J C A and P K A Tuazon (2020): “Revealing investors’ sentiment amid COVID-
19: the big data evidence based on internet searches”, BSP Working Paper Series, July.
Bailliu, J, X Han, M Kruger, Y Liu and S Thanabalasingam (2019): “Can media and text
analytics provide insights into labor market conditions in China?”, IFC Bulletin, no 49.
Bank of Japan (2020): “Impact of COVID-19 on private consumption”, Outlook for
Economic Activity and Prices, July, Box 3.
Boissay, F, T Ehlers, L Gambacorta and H S Shin (2020): “Big techs in finance: on the
new nexus between data privacy and competition”, in: Rau R, Wardrop R and Zingales
L (eds), The Handbook of Technological Finance, Palgrave Macmillan.
Broeders, D and J Prenio (2018): “Innovative technology in financial supervision
(suptech) – the experience of early users”, FSI Insights, no 9.
Buch, C (2019): “Building pathways for policy making with big data”, welcoming
remarks at the International seminar on big data, IFC Bulletin, no 50.
Cao, S, W Jiang, B Yang and A Zhang (2020): “How to talk when a machine is listening:
corporate disclosure in the age of AI”, NBER Working Paper, no 27950.
Chansang, P (2019): “Data management in the data evolution era at Bank of Thailand”,
IFC Bulletin, no 53.
Chanthaphong, S and T Tassanoonthornwong (2021): “Workers’ mobility and Covid
19 pandemic: An analysis using mobile big data”, Bank of Thailand articles.
Chantharat, S, A Lamsam, K Samphantharak and P Tangsawadirat (2017): “A new
perspective on Thai household debt through credit bureaus' big data”, PIER discussion
paper, October.
Chen, S, S Doerr, J Frost, L Gambacorta and H S Shin (2021): “The fintech gender gap”,
BIS Working Papers, no 931.
Coelho, R, M De Simoni and J Prenio (2019): “Suptech applications for anti-money
laundering”, FSI Insights, no 18.
Cœuré, B (2017): “Policy analysis with big data”, speech at the conference on
Economic and Financial Regulation in the Era of Big Data, organised by the Bank of
France, Paris.
| Institution | Project | Data source | Objective | Tools |
| Bank Indonesia | Indicator of job demand from online job vacancy portals | Online job vacancy portals | Produce proxy indicator/nowcasting employment | Hadoop, Hive, Spark, Impala |
| | Identification of main counterparties in forex market | RTGS | Identify main counterparties in forex market from payment system data | Hadoop, Hive, Spark, Impala |
| | Indicator of property prices from online property portals | Online property portals | Produce statistics for property prices in secondary market | Hadoop, Hive, Spark, Impala |
| | Indicator of automobile supply from online automobile portals | Online automobile portals | Produce proxy indicator/nowcasting automobile supply | Hadoop, Hive, Spark, Impala |
| | Interconnectedness of banks in payment system | RTGS | Identify core and periphery banks in payment system | Hadoop, Hive, Spark, Impala |
| | Interconnectedness of corporations in payment system | RTGS | Identify network structure of corporations | Hadoop, Hive, Spark, Impala |
| Bank of Japan | A Network Analysis of the Repo JGB Market | Collected from financial institutions located in Japan | Analyse the structure of the Japanese repo market | R |
| | Release of “Statistics on Securities Financing Transactions in Japan” | Collected from financial institutions located in Japan | Publish data on securities financing transactions in Japan | |
| | Corporate behavior and innovation | Japan's patent data provided by Panasonic system solutions | Analyse the effects of R&D investment on productivity growth | R |
| | Use text analytics to improve operation | Comptroller General's Department | Use text analytics to improve efficiency of statistics compilation | Python |
| Bank Negara Malaysia | Credit modelling for retail and non-retail borrowers | Internal credit registry database | Predict probability of default, loss given default etc. | R programming |
| | News monitoring and sentiment analysis dashboard | Public news sites | Enhance surveillance of topics of interest and understand public sentiment on these topics | Python, Django, ElasticSearch etc. |
| | Employee feedback text analysis | Internal talent management surveys | Analyse employees' key feedback from talent management surveys | Python, Django, HuggingFace |
| | Supervisory letter text analysis | Internal data | Ensure consistency in the communication of supervisory issues to financial institutions | Python, Django, HuggingFace, ElasticSearch |