Data As A Force For Public Good
Chapter 2
Main messages
Suppose a woman walks into a doctor’s office and is given a diagnosis without examination by the doctor: no measurement of her heart rate, no recording of her symptoms, and no review of her medical history. The doctor just prescribes a medication. Such an approach, and such a world in which crucial data are not gathered, analyzed, and acted on, would not be welcome, to say the least.1

Yet all too often governments make decisions affecting people’s well-being without understanding or even taking into account essential data. Designing policies without data is akin to a shot in the dark.2 This problem is particularly acute in the poorest countries, where gaps in both the availability and the use of data are severest.3

Just as data gathered by a doctor can help improve a patient’s diagnosis and ultimate well-being, data gathered by governments, international organizations, research institutions, and civil society can improve societal well-being by enhancing service delivery, prioritizing scarce resources, holding governments accountable, and empowering individuals. These data serve as the foundation for core functions of governments and their endeavors to reduce poverty. The data a doctor gathers often take the form of a conversation or some other means of communicating information between patient and doctor. In the same way, data gathered with the intent of informing public policy should enrich the policy dialogue and allow for systematic flows of information and communication among governments, their citizens, and commerce.

Such flows of information and communication require long-term investments in statistical capacity, infrastructure, data governance, data literacy, and data safeguards. These investments depend on one another. Failure in one area jeopardizes the value that data bring to development. Too often these investments are not made in the poorest parts of the world, contributing to data deprivations and poverty.

How should such deprivations be addressed? This chapter discusses the pathways through which data for public policy generate value for development, the obstacles to safe realization of value, and how those obstacles can be overcome.

Public intent data and development: Three pathways for adding value

Public intent data, meaning data collected with the intent of serving the public good by informing the design, [...] for many government functions. For that reason, government agencies are the primary producers of public intent data through censuses, surveys, and administrative data, among other things. Citizens, civil society organizations (CSOs), nongovernmental organizations (NGOs), academic institutions, and international organizations also contribute critically to the production of public intent data through surveys, crowdsourcing platforms, and other means. Data from firms can also be used for public policy, a topic that will be covered in chapter 4.4 This chapter distinguishes between six types of public intent data that all serve the public good (box 2.1).

The discussion that follows uses country examples to describe three important pathways through which public intent data can bring value to development by (1) improving service delivery, (2) prioritizing scarce resources, and (3) holding governments accountable and empowering individuals. But these are not the only pathways. Others include regulating the economy and markets, fostering public safety and security, and improving dispute or conflict resolution.

The country examples reveal several conditions that should be in place to maximize the value of public intent data. The data need to be (1) produced with adequate spatial and temporal coverage (complete, timely, and frequent); (2) high in quality (granular, accurate, and comparable); (3) easy to use (accessible, understandable, and interoperable); and (4) safe to use (impartial, confidential, and appropriate); see figure 2.1.5 With these features, development-related data have the potential to transform development outcomes. For this potential to be realized, the data must be used explicitly to generate public good, including through the three pathways summarized in the following sections.

Pathway 1: Improving service delivery

Increasing access to government services. One of the fundamental ways in which public intent data can improve livelihoods is by increasing access to government services. More access often requires data representative of all residents. Use of administrative data, particularly foundational identification (ID) systems such as national IDs and civil registries as well as digital identification, ensures that all persons are covered and access is equitable. In Thailand at the turn of the century, only 71 percent of the population was covered by a public health insurance scheme that was intended to be universal. Yet the country had a near-universal foundational ID and population
Box 2.1

Administrative data (such as birth, marriage, and death records and data from identification systems; population, health, education, and tax records; and trade flow data) are generated by a process of registration or record keeping, usually by national authorities. Administrative data also include data used by governments to run projects, programs, and services. The digital revolution has created new types of administrative data, for example, when education and health inspectors’ use of smartphone apps channels data to a central register.

Censuses aim to systematically enumerate and record information about an entire population of interest, whether individuals, businesses, farms, or others. Most prominently, population and housing censuses record every person present or residing in a country and provide essential information on the entire population and their key socioeconomic conditions.

Sample surveys draw on a smaller, representative sample of the entire population, typically from censuses, to collect detailed information more frequently. These surveys cover many domains such as household surveys, farm surveys, enterprise surveys, labor force surveys, and demographic and health surveys. Key official statistics, such as unemployment and national accounts, rely on survey data, often in combination with administrative data and census data.a

Citizen-generated data are produced by individuals, often to fill gaps in public and private sector data or when the accuracy of existing data is in question. These data, which can have an important monitoring and accountability function, contribute to solving problems that citizens face.b Examples include HarassMap, an Egyptian tool that maps cases of sexual harassment based on citizen reports, and ForestWatchers, a platform through which citizens monitor the deforestation of the Amazon.

By contrast, machine-generated data are automatically generated by a sensor, application, or computer process without human interactions. An example is the sensors that monitor air pollution. These data emerge when devices are embedded with sensors and other technologies, allowing them to exchange data with each other, a system known as the Internet of Things.

Geospatial data relate multiple layers of information based on their geographic locale. Public intent geospatial data include satellite imagery of the Earth such as that provided by the US National Aeronautics and Space Administration’s Landsat program and the European Space Agency’s Copernicus program; weather data; and cadastral (property and land record) data.c

These data types are neither exhaustive nor mutually exclusive. For example, all data sources can be georeferenced and thus can be used in geospatial applications, and some administrative data and geospatial data can be machine-generated. Data sources are interoperable when they can be linked across and within these types through common numeric identifiers for persons, facilities, or firms; geospatial coordinates; time stamps; and common classification standards.

a. Sample surveys also include the surveys that are implemented by social media companies and target a sample of users who are active on their platforms. Examples include the Future of Business and Gender Equality at Home surveys conducted on the Facebook platform.
b. Meijer and Potjer (2018).
c. Such data sources are discussed in greater detail in chapter 4.
Figure 2.1 Certain data features can maximize the value of public intent data
[Figure: four panels: ensuring the data have adequate coverage; ensuring the data are of high quality; ensuring the data are easy to use; ensuring the data are safe to use]
The World Bank’s Statistical Performance Indicators (SPI) measure statistical performance across 174 countries.a The indicators are grouped into five pillars: (1) data use, which captures the demand side of the statistical system; (2) data services, which looks at the interaction between data supply and demand such as the openness of data and quality of data releases; (3) data products, which reviews whether countries report on important indicators; (4) data sources, which assesses whether censuses, surveys, and other data sources are created; and (5) data infrastructure, which captures whether foundations such as financing, skills, and governance needed for a strong statistical system are in place. Within each pillar is a set of dimensions, and under each dimension is a set of indicators to measure performance. The indicators provide a time series extending at least from 2016 to 2019 in all cases, with some indicators going back to 2004. The data for the indicators are from a variety of sources, including databases produced by the World Bank, International Monetary Fund (IMF), United Nations (UN), Partnership in Statistics for Development in the 21st Century (PARIS21), and Open Data Watch, and in some cases, directly from national statistical office websites. The indicators are also summarized as an index, with scores ranging from a low of 0 to a high of 100.

a. World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi; Dang et al. (2021a, 2021b).
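
The nesting described above (pillars, dimensions, indicators, and a summary index from 0 to 100) can be illustrated with a short sketch. The text does not state how the SPI index is aggregated, so the equal-weight averaging used here is an assumption made only for illustration. The pillar names follow the text; the dimension names, indicator names, and all scores are invented.

```python
from statistics import mean

# Hypothetical indicator scores on a 0-1 scale, nested as
# pillar -> dimension -> indicator. Pillar names follow the text;
# dimension names, indicator names, and values are invented.
scores = {
    "data use": {"dim_a": {"ind_1": 0.5, "ind_2": 1.0}},
    "data services": {"dim_a": {"ind_1": 1.0}, "dim_b": {"ind_1": 0.5}},
    "data products": {"dim_a": {"ind_1": 0.25, "ind_2": 0.75}},
    "data sources": {"dim_a": {"ind_1": 1.0}},
    "data infrastructure": {
        "dim_a": {"ind_1": 0.0, "ind_2": 0.5},
        "dim_b": {"ind_1": 0.75},
    },
}


def pillar_score(dimensions):
    """Average indicators within each dimension, then average the dimensions."""
    return mean(mean(indicators.values()) for indicators in dimensions.values())


# Assumed aggregation: equal weights at every level, rescaled to 0-100.
pillar_scores = {name: pillar_score(dims) for name, dims in scores.items()}
index = 100 * mean(pillar_scores.values())
```

Under these invented scores the five pillar averages are 0.75, 0.75, 0.5, 1.0, and 0.5, so the assumed index comes out at about 70. A different weighting scheme would give a different result.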
the census, they can leave out some of the poorest and most vulnerable. Many vulnerable groups are hard to count in the first place, especially when census enumeration focuses on residence and the concept of the household. These groups include the displaced, the homeless, slum inhabitants, nomads, migrants, young children, and the disabled.43 The extent of undercounting is difficult to measure systematically, but in 2013 it was estimated that globally between 170 million and 320 million people were missing from population census frames, with the poorest more likely to be missed.44 As noted, in many countries the census determines the allocation of resources and political representation. Thus these omissions have real consequences and can disenfranchise vulnerable populations.45 They also affect the representativeness of household surveys that use census-based sampling frames.46

Lower-income countries also are susceptible to coverage gaps in geospatial data, especially in some

Figure 2.3 Gaps in geospatial datasets are especially large in lower-income countries
[Figure: y-axis: Share of countries with dataset gaps (%)]
The COVID-19 pandemic was not gender-blind; it affected men and women differently and may have exacerbated gender inequalities.a Yet knowledge of the gender impacts of COVID-19 is incomplete because of data gaps across all dimensions of well-being. At the most basic level, data are lacking on COVID-19 infections and deaths among men and women. In March 2020, only 61 [...] impacts extends well beyond case and mortality data. The data systems in place prior to the pandemic had notable gender data gaps that hampered the ability to track impacts and inform policy. For example, monitoring impacts on jobs requires regular and timely data on informal employment where women predominate. However, only 41 percent of low-income countries (LICs) and lower-middle-income countries (LMICs) report data on informal jobs disaggregated by sex. And in seven of the 10 countries where the recent economic contraction is severest, less than 38 percent of Sustainable Development Goal economic opportunity indicators are available by sex.b Furthermore, preexisting biases in face-to-face household survey design and implementation bled into phone surveys implemented during the pandemic, limiting measurement of the gender-related impacts of the crisis. These biases include designing phone surveys aimed at household heads and lack of survey content on time use.

There are also notable gaps in the gender data needed to inform policy design and effectiveness. Although the expansion of social protection programs is arguably the largest policy response to offset the economic impacts of the crisis, comparable sex-disaggregated measures of social protection coverage are largely unavailable. Data on personal identification cards and mobile phone ownership should inform program design decisions, especially as countries scale up digital platforms. Yet data on gender differences in ownership of personal identity cards are missing for more than a third of countries. Less than a quarter of LICs and LMICs report data on mobile phone ownership by women.c

Even though the pandemic created new demands for statistics, it also interrupted the supply. More than half of LICs and LMICs reported that the COVID-19 pandemic affected national statistical offices’ ability to produce socioeconomic statistics.d This problem requires immediate attention, but building effective, gender-aware data systems will require sustained financial and human capital investments.

Figure B2.3.1 Proportion of COVID-19 cases reported with sex-disaggregated data for 190 countries
[Figure: y-axis: Percent; x-axis: March to November 2020; legend: Sex-disaggregated? No, Yes; Proportion of countries reporting sex-disaggregated data]
Sources: Global Health 50/50, University College London, COVID-19 Sex-Disaggregated Data Tracker (database), November 30, 2020, data release, https://globalhealth5050.org/the-sex-gender-and-covid-19-project/; Global Change Data Lab, University of Oxford, Our World in Data, Coronavirus Pandemic (COVID-19) (database), https://ourworldindata.org/coronavirus; calculations of Open Data Watch, Washington, DC. Data at http://bit.do/WDR2021-Fig-B2_3_1.

Sources: Mayra Buvinic (Center for Global Development), Lorenz Noe (Data2x), and Eric Swanson (Open Data Watch), with inputs from the WDR 2021 team.
a. UN Women (2020).
b. Buvinic, Noe, and Swanson (2020).
c. Buvinic, Noe, and Swanson (2020).
d. UNSTATS and World Bank (2020).
[Figure: bar chart; y-axis: Share of countries (%); categories: country income groups, FCS and non-FCS status, and world regions]
unavailable to the Ministry of Health. In Ethiopia, a study of the health sector found 228 different digital health information applications, of which only 39 percent sent data to the Ministry of Health.61 Administrative data, in particular, are too often siloed in different systems, prohibiting their effective use for monitoring and policy design. Although data coordination within agencies is often limited, the challenge of siloed systems is even greater across government agencies.62

Lack of understandability prevents even those data that are accessible from generating value. To be understandable, data must be well disseminated, backed up with sufficient metadata, responsive to user needs, and, for certain purposes, summarized and visualized for the user. A majority of countries have data portals and provide metadata for their published data, practices that facilitate wider data use.63 Low-income countries perform comparatively well in the data portal and metadata categories, but even here they lag. A larger gap remains in terms of advance release calendars, which commit government units to release data on a predetermined timetable. Only
Figure 2.6 A positive feedback loop can connect enablers and features of public
intent data with greater development value
[Figure 2.7: bar chart; categories: country income groups, FCS and non-FCS status, and world regions]
Source: WDR 2021 team calculations, based on indicators collected by the Partnership in Statistics for Development in the 21st Century (PARIS21) that are also
used as Statistical Performance Indicators (World Bank, http://www.worldbank.org/spi). Data at http://bit.do/WDR2021-Fig-2_7.
Note: Having a fully funded national statistical plan under implementation is Sustainable Development Goal Indicator 17.18.3. FCS = fragile and conflict-
affected situations.
Figure 2.8 The older a country’s statistical laws, the lower is its statistical performance and the
less open are its data
[Figure: two scatter plots. Panel a, Statistical performance: y-axis: Statistical Performance Index; x-axis: Age of statistical laws (years). Panel b, Openness of data: y-axis: ODIN overall score; x-axis: Age of statistical laws (years). Legend: High-income, Upper-middle-income, Lower-middle-income, Low-income]
Sources: WDR 2021 team, based on UNSTATS (Statistics Division, Department of Economic and Social Affairs, United Nations), UNSTATS (database), https://unstats.un.org/unsd/dnss/cp/searchcp.aspx; Partnership in Statistics for Development in the 21st Century (PARIS21), https://paris21.org/knowledge-database?keyword=&type%5B%5D=Statistical-Legislation-Country-Documents&date-from=&date-to=&page=; World Bank, World Development Indicators (database), https://databank.worldbank.org/source/world-development-indicators. Data at http://bit.do/WDR2021-Fig-2_8.
Note: In panel a, the regression coefficient on age, controlling for GDP per capita, is –0.48, p < .01; in panel b, –0.39, p < .01. For the Statistical Performance Indicators, see World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi. For the Open Data Inventory (ODIN), see Open Data Watch, https://odin.opendatawatch.com/.
[Figure 2.9: two scatter plots. Panel a: x-axis: NSO independence score. Panel b: x-axis: RSF World Press Freedom Index]
Sources: NSO independence score: Mo Ibrahim Foundation, Ibrahim Index of African Governance (database), http://mo.ibrahim.foundation/iiag/; World Press Freedom Index: Reporters
Without Borders, 2020 World Press Freedom Index (database), https://rsf.org/en/ranking_table. Data at http://bit.do/WDR2021-Fig-2_9.
Note: The x’s represent countries. Panel a shows only African countries, and panel b shows all countries with data available. The NSO independence score ranges from 0 to 100. The
World Press Freedom Index ranges from 100 to 0—lower values imply greater press freedom. For the Statistical Performance Index, see World Bank, Statistical Performance Indicators
(database), http://www.worldbank.org/spi. NSO = national statistical office; RSF = Reporters Without Borders.
independence of producers of public intent data also reinforces the credibility of and trust in the data and their producers, which encourages data use in both government and civil society.100

An indicator capturing the independence of NSOs in all African nations is included in the Ibrahim Index of African Governance.101 The indicator measures the institutional autonomy and financial independence of an NSO. A perfect score indicates that an NSO is able to publish data without clearance from another government branch and has sufficient funding to do so. A higher score on the NSO independence indicator is strongly correlated with higher statistical performance as captured by the World Bank’s SPI (figure 2.9, panel a). In 2019 the average score on NSO independence was 34 out of 100, with low-income African countries scoring below average. These findings illustrate that NSO independence is precarious, particularly in lower-income countries. Anecdotes of attacks on NSO independence around the world suggest that fragile NSO independence is not limited to the African context.102 For example, in 2007 the Argentine government began interfering with the independence of Argentina’s NSO, the National Institute of Statistics and Censuses (INDEC). The effort initially focused on the consumer price index and later expanded to other official statistics, casting doubt especially on reported inflation statistics. Recognizing the harmful effects of these measures, by 2015 a new government had undertaken efforts to rebuild the institute, and INDEC resumed the delivery of trustworthy statistics with transparency and complete adherence to international principles.103

A government’s interest in having an independent national statistical system can be affected by several competing factors. On the one hand, a government may have a vested interest in curtailing statistical independence and the production and dissemination of reliable data, fearing these could expose poor policy decisions and performance, dilute power, and increase public scrutiny and pressure.104 In this case, the lack of independence and of reliable data would make it harder to hold governments accountable.105 On the other hand, an independent statistical system producing reliable data in a transparent fashion best informs government decision-making and increases citizens’ trust in government data and public institutions in general.106 Such transparency can also facilitate favorable capital market and investment conditions and foster GDP growth.107 Finally, international cooperation can boost statistical independence and data transparency when adherence to standards of data quality and the independence of their producers is required for accession to international organizations or agreements. An example is Colombia’s successful bid to join the Organisation for Economic Co-operation and Development (OECD).108

Civil society performs a vital function in demanding transparency and holding government accountable. Citizen-generated data can be used to challenge official statistics when their accuracy or impartiality is in question. A free and empowered press is a
Political commitment
Create a broad-based political and societal agreement on the value of high-quality public intent data

NSOs
• Create a target fraction of government spending or a line item in the national budget dedicated to the NSO.
• Engage recurrently with the Ministry of Finance to understand and support its data needs.
• Ensure more competitive pay scales.
• Devote more time and resources to building capacity among staff.
• Ensure that NSO independence is anchored in laws and institutional setup.
• Prevent statistical laws from becoming outdated.
• Build trust in integrity of official statistics via public release calendars and best practices in dissemination.
• Engage proactively with nongovernmental entities.

Other government agencies
• Designate a budget line for data in each ministry and agency.
• Create technical units in charge of data production and use.
• Assign clear roles, mandates, and responsibilities along the [...]
• Designate knowledge brokers in government agencies to champion the [...]

Civil society and academia
• Allocate resources to citizen-generated data [...]
• Promote data literacy in primary and secondary [...]
• Ensure that laws and regulations facilitate the [...]
• Enable citizens to engage more easily with data [...]