CHAPTER 1

Data Management

1. Introduction
Many organizations recognize that their data is a vital enterprise
asset. Data and information can give them insight about their
customers, products, and services. It can help them innovate and
reach strategic goals. Despite that recognition, few organizations
actively manage data as an asset from which they can derive
ongoing value (Evans and Price). Deriving value from data
does not happen in a vacuum or by accident. It requires
intention, planning, coordination, and commitment. It requires
management and leadership.
Data Management is the development, execution, and supervision
of plans, policies, programs, and practices that deliver, control,
protect, and enhance the value of data and information assets
throughout their lifecycles.
A Data Management Professional is any person who works in any
facet of data management, from technical management of data
throughout its lifecycle to ensuring that data is properly utilized
and leveraged to meet strategic organizational goals. Data
management professionals fill numerous roles, from the highly
technical (e.g., database administrators, network administrators,
programmers) to strategic business (e.g., Data Stewards, Data
Strategists, Chief Data Officers).
Data management activities are wide-ranging. They include
everything from the ability to make consistent decisions about
how to get strategic value from data to the technical deployment
and performance of databases. Thus, data management requires
both technical and non-technical (i.e., business) skills.
Responsibility for managing data must be shared between
business and information technology roles, and people in both
areas must be able to collaborate to ensure an organization has
high quality data that meets its strategic needs.
Data and information are not just assets in the sense that
organizations invest in them in order to derive future value.
Data and information are also vital to the day-to-day operations
of most organizations. They have been called the "currency," the
"life blood," and even the "new oil" of the information economy.
Whether or not an organization gets value from its analytics, it
cannot even transact business without data.
To support the data management professionals who carry out
the work, DAMA International (The Data Management
Association) has produced this book, the second edition of The
DAMA Guide to the Data Management Body of Knowledge
(DMBOK). This edition builds on the first one, which provided
foundational knowledge on which to build as the profession
advanced and matured.
This chapter outlines a set of principles for data management. It
discusses challenges related to following those principles and
suggests approaches for meeting these challenges. The chapter
also describes the DAMA Data Management Framework, which
provides the context for the work carried out by data
management professionals within various Data Management
Knowledge Areas.

1.1 Business Drivers


Information and knowledge hold the key to competitive
advantage. Organizations that have reliable, high quality data
about their customers, products, services, and operations can
make better decisions than those without data or with unreliable
data. Failure to manage data is similar to failure to manage
capital. It results in waste and lost opportunity. The primary
driver for data management is to enable organizations to get
value from their data assets, just as effective management of
financial and physical assets enables organizations to get value
from those assets.

1.2 Goals

Within an organization, data management goals include:

- Understanding and supporting the information needs
  of the enterprise and its stakeholders, including
  customers, employees, and business partners
- Capturing, storing, protecting, and ensuring the
  integrity of data assets
- Ensuring the quality of data and information
- Ensuring the privacy and confidentiality of stakeholder
  data
- Preventing unauthorized or inappropriate access,
  manipulation, or use of data and information
- Ensuring data can be used effectively to add value to
  the enterprise

2. Essential Concepts

2.1 Data
Long-standing definitions of data emphasize its role in
representing facts about the world. In relation to information
technology, data is also understood as information that has been
stored in digital form (though data is not limited to information
that has been digitized, and data management principles apply
to data captured on paper as well as in databases). Still, because
today we can capture so much information electronically, we
call many things data that would not have been called data in
earlier times: things like names, addresses, birthdates, what one
ate for dinner on Saturday, the most recent book one purchased.
Such facts about individual people can be aggregated, analyzed,
and used to make a profit, improve health, or influence public
policy. Moreover, our technological capacity to measure a wide
range of events and activities (from the repercussions of the Big
Bang to our own heartbeats) and to collect, store, and analyze
electronic versions of things that were not previously thought of
as data (videos, pictures, sound recordings, documents) is close
to surpassing our ability to synthesize these data into usable
information. To take advantage of the variety of data without
being overwhelmed by its volume and velocity requires reliable,
extensible data management practices.
Most people assume that, because data represents facts, it is a
form of truth about the world and that the facts will fit together.
But facts are not always simple or straightforward. Data is a
means of representation. It stands for things other than itself
(Chisholm). Data is both an interpretation of the objects it
represents and an object that must be interpreted
(Sebastian-Coleman). This is another way of saying that we need
context for data to be meaningful. Context can be thought of as
data's representational system; such a system includes a
common vocabulary and a set of relationships between
components. If we know the conventions of such a system, then
we can interpret the data within it. These conventions are often
documented in a specific kind of data referred to as Metadata.
However, because people often make different choices about
how to represent concepts, they create different ways of
representing the same concepts. From these choices, data takes
on different shapes. Think of the range of ways we have to
represent calendar dates, a concept about which there is an
agreed-to definition. Now consider more complex concepts
such as "customer" or "product," where the granularity and level
of detail of what needs to be represented is not always self-
evident, and the process of representation grows more complex,
as does the process of managing that information over time. See
Chapter .
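The point about calendar dates can be made concrete with a short
sketch. The Python below is illustrative only: the list of formats,
the sample strings, and the function name parse_date are
assumptions chosen for this example and do not come from the
DMBOK. It shows four representations of one date that can be
interpreted consistently only because the conventions are written
down.

```python
from datetime import date, datetime

# Hypothetical conventions; in practice these would be documented as Metadata.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%Y%m%d"]


def parse_date(text: str) -> date:
    """Try each documented representation until one matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date representation: {text!r}")


# Four different shapes for the same concept (month names assume an
# English locale).
representations = ["2024-03-05", "03/05/2024", "5 March 2024", "20240305"]
parsed = {parse_date(r) for r in representations}
```

Note that "03/05/2024" is read here as month/day/year. Under a
day/month convention the same string would mean 3 May, so only the
documented convention resolves the ambiguity.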
Even within a single organization, there are often multiple ways
of representing the same idea. Hence the need for Data
Architecture, modeling, governance, and stewardship, and
Metadata and Data Quality management, all of which help
people understand and use data. Across organizations, the
problem of multiplicity multiplies. Hence the need for industry-
level data standards that can bring more consistency to data.
Organizations have always needed to manage their data, but
changes in technology have expanded the scope of this
management need as they have changed people's understanding
of what data is. These changes have enabled organizations to use
data in new ways to create products, share information, create
knowledge, and improve organizational success. But the rapid
growth of technology (and with it human capacity to produce,
capture, and mine data for meaning) has intensified the need to
manage data effectively.

2.2 Data and Information


Much ink has been spilled over the relationship between data
and information. Data has been called the "raw material of
information" and information has been called "data in context."
Often a layered pyramid is used to describe the relationship
between data (at the base), information, knowledge, and
wisdom (at the very top). While the pyramid can be helpful in
describing why data needs to be well-managed, this
representation presents several challenges for data management.

- It is based on the assumption that data simply exists.
  But data does not simply exist. Data has to be created.
- By describing a linear sequence from data through
  wisdom, it fails to recognize that it takes knowledge to
  create data in the first place.
- It implies that data and information are separate
  things, when in reality, the two concepts are
  intertwined with and dependent on each other. Data is
  a form of information and information is a form of
  data.

Within an organization, it may be helpful to draw a line between
information and data for purposes of clear communication about
the requirements and expectations of different uses by different
stakeholders. ("Here is a sales report for the last quarter
[information]. It is based on data from our data warehouse
[data]. Next quarter these results [data] will be used to generate
our quarter-over-quarter performance measures [information].")
Recognizing data and information need to be prepared for
different purposes drives home a central tenet of data
management: Both data and information need to be managed.
Both will be of higher quality if they are managed together with
uses and customer requirements in mind. Throughout the
DMBOK, the terms will be used interchangeably.

2.3 Data as an Organizational Asset


An asset is an economic resource that can be owned or
controlled, and that holds or produces value. Assets can be
converted to money. Data is widely recognized as an enterprise
asset, though understanding of what it means to manage data as
an asset is still evolving. In the early s, some organizations
found it questionable whether the value of goodwill should be
given a monetary value. Now, the value of goodwill commonly
shows up as an item on the Profit and Loss Statement (P&L).
Similarly, while not universally adopted, monetization of data is
becoming increasingly common. It will not be too long before we
see this as a feature of P&Ls. See Chapter .
Today's organizations rely on their data assets to make more
effective decisions and to operate more efficiently. Businesses
use data to understand their customers, create new products and
services, and improve operational efficiency by cutting costs and
controlling risks. Government agencies, educational institutions,
and not-for-profit organizations also need high quality data to
guide their operational, tactical, and strategic activities. As
organizations increasingly depend on data, the value of data
assets can be more clearly established.
Many organizations identify themselves as "data-driven."
Businesses aiming to stay competitive must stop making
decisions based on gut feelings or instincts, and instead use
event triggers and apply analytics to gain actionable insight.
Being data-driven includes the recognition that data must be
managed efficiently and with professional discipline, through a
partnership of business leadership and technical expertise.
Furthermore, the pace of business today means that change is no
longer optional; digital disruption is the norm. To react to this,
business must co-create information solutions with technical
data professionals working alongside line-of-business
counterparts. They must plan for how to obtain and manage
data that they know they need to support business strategy.
They must also position themselves to take advantage of
opportunities to leverage data in new ways.

2.4 Data Management Principles


Data management shares characteristics with other forms of
asset management, as seen in Figure 1. It involves knowing what
data an organization has and what might be accomplished with
it, then determining how best to use data assets to reach
organizational goals.
Like other management processes, it must balance strategic and
operational needs. This balance can best be struck by following a
set of principles that recognize salient features of data
management and guide data management practice.

- Data is an asset with unique properties: Data is an
  asset, but it differs from other assets in important ways
  that influence how it is managed. The most obvious of
  these properties is that data is not consumed when it is
  used, as are financial and physical assets.
- The value of data can and should be expressed in
  economic terms: Calling data an asset implies that it
  has value. While there are techniques for measuring
  data's qualitative and quantitative value, there are not
  yet standards for doing so. Organizations that want to
  make better decisions about their data should develop
  consistent ways to quantify that value. They should
  also measure both the costs of low quality data and the
  benefits of high quality data.
- Managing data means managing the quality of data:
  Ensuring that data is fit for purpose is a primary goal
  of data management. To manage quality, organizations
  must ensure they understand stakeholders'
  requirements for quality and measure data against
  these requirements.
- It takes Metadata to manage data: Managing any asset
  requires having data about that asset (number of
  employees, accounting codes, etc.). The data used to
  manage and use data is called Metadata. Because data
  cannot be held or touched, to understand what it is and
  how to use it requires definition and knowledge in the
  form of Metadata. Metadata originates from a range of
  processes related to data creation, processing, and use,
  including architecture, modeling, stewardship,
  governance, Data Quality management, systems
  development, IT and business operations, and
  analytics.

Figure 1 Data Management Principles

- It takes planning to manage data: Even small
  organizations can have complex technical and business
  process landscapes. Data is created in many places and
  is moved between places for use. To coordinate work
  and keep the end results aligned requires planning
  from an architectural and process perspective.
- Data management is cross-functional; it requires a
  range of skills and expertise: A single team cannot
  manage all of an organization's data. Data
  management requires both technical and non-technical
  skills and the ability to collaborate.
- Data management requires an enterprise perspective:
  Data management has local applications, but it must be
  applied across the enterprise to be as effective as
  possible. This is one reason why data management and
  data governance are intertwined.
- Data management must account for a range of
  perspectives: Data is fluid. Data management must
  constantly evolve to keep up with the ways data is
  created and used and the data consumers who use it.
- Data management is lifecycle management: Data has
  a lifecycle and managing data requires managing its
  lifecycle. Because data begets more data, the data
  lifecycle itself can be very complex. Data management
  practices need to account for the data lifecycle.
- Different types of data have different lifecycle
  characteristics: And for this reason, they have different
  management requirements. Data management
  practices have to recognize these differences and be
  flexible enough to meet different kinds of data lifecycle
  requirements.
- Managing data includes managing the risks
  associated with data: In addition to being an asset,
  data also represents risk to an organization. Data can
  be lost, stolen, or misused. Organizations must
  consider the ethical implications of their uses of data.
  Data-related risks must be managed as part of the data
  lifecycle.
- Data management requirements must drive
  Information Technology decisions: Data and data
  management are deeply intertwined with information
  technology and information technology management.
  Managing data requires an approach that ensures
  technology serves, rather than drives, an organization's
  strategic data needs.
- Effective data management requires leadership
  commitment: Data management involves a complex
  set of processes that, to be effective, require
  coordination, collaboration, and commitment. Getting
  there requires not only management skills, but also the
  vision and purpose that come from committed
  leadership.

2.5 Data Management Challenges


Because data management has distinct characteristics derived
from the properties of data itself, it also presents challenges in
following these principles. Details of these challenges are
discussed in the subsections that follow. Many of these
challenges refer to more than one principle.

2.5.1 Data Differs from Other Assets


Physical assets can be pointed to, touched, and moved around.
They can be in only one place at a time. Financial assets must be
accounted for on a balance sheet. However, data is different.
Data is not tangible. Yet it is durable; it does not wear out,
though the value of data often changes as it ages. Data is easy to
copy and transport. But it is not easy to reproduce if it is lost or
destroyed. Because it is not consumed when used, it can even be
stolen without being gone. Data is dynamic and can be used for
multiple purposes. The same data can even be used by multiple
people at the same time, something that is impossible with
physical or financial assets. Many uses of data beget more data.
Most organizations must manage increasing volumes of data
and the relation between data sets.
These differences make it challenging to put a monetary value
on data. Without this monetary value, it is difficult to measure
how data contributes to organizational success. These
differences also raise other issues that affect data management,
such as defining data ownership, inventorying how much data
an organization has, protecting against the misuse of data,
managing risk associated with data redundancy, and defining
and enforcing standards for Data Quality.
Despite the challenges with measuring the value of data, most
people recognize that data, indeed, has value. An organization's
data is unique to itself. Were organizationally unique data (such
as customer lists, product inventories, or claim history) to be lost
or destroyed, replacing it would be impossible or extremely
costly. Data is also the means by which an organization knows
itself; it is a meta-asset that describes other assets. As such, it
provides the foundation for organizational insight.
Within and between organizations, data and information are
essential to conducting business. Most operational business
transactions involve the exchange of information. Most
information is exchanged electronically, creating a data trail.
This data trail can serve purposes in addition to marking the
exchanges that have taken place. It can provide information
about how an organization functions.
Because of the important role that data plays in any
organization, it needs to be managed with care.

2.5.2 Data Valuation


Value is the difference between the cost of a thing and the benefit
derived from that thing. For some assets, like stock, calculating
value is easy. It is the difference between what the stock cost
when it was purchased and what it was sold for. But for data,
these calculations are more complicated, because neither the
costs nor the benefits of data are standardized.
Since each organization's data is unique to itself, an approach to
data valuation needs to begin by articulating general cost and
benefit categories that can be applied consistently within an
organization. Sample categories include:

- Cost of obtaining and storing data
- Cost of replacing data if it were lost
- Impact to the organization if data were missing
- Cost of risk mitigation and potential cost of risks
  associated with data
- Cost of improving data
- Benefits of higher quality data
- What competitors would pay for data
- What the data could be sold for
- Expected revenue from innovative uses of data
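
As a rough illustration of applying such categories consistently,
the sketch below treats value as benefit minus cost, following the
definition at the start of this section. It is a minimal example
under stated assumptions: the class name DataValuation, the
selection of categories, and every figure are hypothetical, and no
standard valuation method is implied.

```python
from dataclasses import dataclass


@dataclass
class DataValuation:
    """Hypothetical cost and benefit categories, all in one currency."""
    acquisition_and_storage_cost: float
    improvement_cost: float
    risk_mitigation_cost: float
    expected_benefit: float

    def total_cost(self) -> float:
        return (
            self.acquisition_and_storage_cost
            + self.improvement_cost
            + self.risk_mitigation_cost
        )

    def net_value(self) -> float:
        # Value as the difference between benefit and cost.
        return self.expected_benefit - self.total_cost()


# Invented figures for a customer data set, for illustration only.
customer_data = DataValuation(
    acquisition_and_storage_cost=120_000.0,
    improvement_cost=30_000.0,
    risk_mitigation_cost=15_000.0,
    expected_benefit=250_000.0,
)
```

The useful part is less the arithmetic than the fixed set of
categories: using the same fields for every data set is what allows
valuations to be compared within an organization.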

A primary challenge to data asset valuation is that the value of
data is contextual (what is of value to one organization may not
be of value to another) and often temporal (what was valuable
yesterday may not be valuable today). That said, within an
organization, certain types of data are likely to be consistently
valuable over time. Take reliable customer information, for
example. Customer information may even grow more valuable
over time, as more data accumulates related to customer
activity.
In relation to data management, establishing ways to associate
financial value with data is critical, since organizations need to
understand assets in financial terms in order to make consistent
decisions. Putting value on data becomes the basis of putting
value on data management activities. The process of data
valuation can also be used as a means of change management.
Asking data management professionals and the stakeholders
they support to understand the financial meaning of their work
can help an organization transform its understanding of its own
data and, through that, its approach to data management.

2.5.3 Data Quality


Ensuring that data is of high quality is central to data
management. Organizations manage their data because they
want to use it. If they cannot rely on it to meet business needs,
then the effort to collect, store, secure, and enable access to it is
wasted. To ensure data meets business needs, they must work
with data consumers to define these needs, including
characteristics that make data of high quality.
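One way to make "define these needs" operational is to record each
requirement as a measurable threshold and then measure the data
against it. The sketch below does this for a single quality
characteristic, completeness. The field names, thresholds, and
sample records are assumptions invented for the example.

```python
def completeness(records, field_name):
    """Fraction of records with a non-empty value for field_name."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
    return filled / len(records)


# Hypothetical requirements agreed with data consumers:
# minimum acceptable completeness per field.
REQUIREMENTS = {"email": 0.9, "postal_code": 0.75}

# Invented sample records; the second one is missing an email address.
customers = [
    {"email": "a@example.com", "postal_code": "10001"},
    {"email": "", "postal_code": "94105"},
    {"email": "c@example.com", "postal_code": "60601"},
    {"email": "d@example.com", "postal_code": "73301"},
]

# True where the measured completeness meets the stated requirement.
results = {
    name: completeness(customers, name) >= minimum
    for name, minimum in REQUIREMENTS.items()
}
```

Here the email field falls short of its requirement while the postal
code field meets it, which gives data consumers and producers a
shared, checkable statement of what "high quality" means for this
data set.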
Largely because data has been associated so closely with
information technology, managing Data Quality has historically
been treated as an afterthought. IT teams are often dismissive of
the data that the systems they create are supposed to store. It
was probably a programmer who first observed "garbage in,
garbage out" and who no doubt wanted to let it go at that. But
the people who want to use the data cannot afford to be
dismissive of quality. They generally assume data is reliable and
trustworthy, until they have a reason to doubt these things.
Once they lose trust, it is difficult to regain it.
Most uses of data involve learning from it in order to apply that
learning and create value. Examples include understanding
customer habits in order to improve a product or service and
assessing organizational performance or market trends in order
to develop a better business strategy, etc. Poor quality data will
have a negative impact on these decisions.
As importantly, poor quality data is simply costly to any
organization. Estimates differ, but experts think organizations
spend between - % of revenue handling data quality issues.
IBM estimated the cost of poor quality data in the US in
was $ . Trillion. Many of the costs of poor quality data are
hidden, indirect, and therefore hard to measure. Others, like
fines, are direct and easy to calculate. Costs come from:

- Scrap and rework
- Work-arounds and hidden correction processes
- Organizational inefficiencies or low productivity
- Organizational conflict
- Low job satisfaction
- Customer dissatisfaction
- Opportunity costs, including inability to innovate
- Compliance costs or fines
- Reputational costs

The corresponding benefits of high quality data include:

- Improved customer experience
- Higher productivity
- Reduced risk
- Ability to act on opportunities
- Increased revenue
- Competitive advantage gained from insights on
  customers, products, processes, and opportunities

As these costs and benefits imply, managing Data Quality is not
a one-time job. Producing high quality data requires planning,
commitment, and a mindset that builds quality into processes
and systems. All data management functions can influence Data
Quality, for good or bad, so all of them must account for it as
they execute their work. See Chapter .

2.5.4 Planning for Better Data


As stated in the chapter introduction, deriving value from data
does not happen by accident. It requires planning in many
forms. It starts with the recognition that organizations can
control how they obtain and create data. If they view data as a
product that they create, they will make better decisions about it
throughout its lifecycle. These decisions require systems
thinking because they involve:

- The ways data connects business processes that might
  otherwise be seen as separate
- The relationship between business processes and the
  technology that supports them
- The design and architecture of systems and the data
  they produce and store
- The ways data might be used to advance
  organizational strategy

Planning for better data requires a strategic approach to
architecture, modeling, and other design functions. It also
depends on strategic collaboration between business and IT
leadership. And, of course, it depends on the ability to execute
effectively on individual projects.
The challenge is that there are usually organizational pressures,
as well as the perennial pressures of time and money, that get in
the way of better planning. Organizations must balance long-
and short-term goals as they execute their strategies. Having
clarity about the trade-offs leads to better decisions.

2.5.5 Metadata and Data Management

Organizations require reliable Metadata to manage data as an
asset. Metadata in this sense should be understood
comprehensively. It includes not only the business, technical,
and operational Metadata described in Chapter , but also the
Metadata embedded in Data Architecture, data models, data
security requirements, data integration standards, and data
operational processes. See Chapters .
Metadata describes what data an organization has, what it
represents, how it is classified, where it came from, how it
moves within the organization, how it evolves through use, who
can and cannot use it, and whether it is of high quality. Data is
abstract. Definitions and other descriptions of context enable it
to be understood. They make data, the data lifecycle, and the
complex systems that contain data comprehensible.
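The questions Metadata answers can be pictured as fields on a
record kept for each data set. The following is a minimal sketch
rather than a Metadata standard: the class DatasetMetadata, its
field names, and the sample values are all hypothetical and were
chosen to mirror the list in the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical descriptive record for one data set."""
    name: str                # what data the organization has
    description: str         # what it represents
    classification: str      # how it is classified
    source: str              # where it came from
    allowed_roles: set[str] = field(default_factory=set)  # who can use it
    quality_checked: bool = False  # whether its quality has been assessed

    def can_use(self, role: str) -> bool:
        """Return True if the given role is permitted to use the data."""
        return role in self.allowed_roles


# Invented example entry.
customer_master = DatasetMetadata(
    name="customer_master",
    description="One row per active customer",
    classification="confidential",
    source="CRM export",
    allowed_roles={"analyst", "steward"},
    quality_checked=True,
)
```

Because such records are themselves data, they need owners, quality
checks, and upkeep like any other data set.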
The challenge is that Metadata is a form of data and needs to be
managed as such. Organizations that do not manage their data
well generally do not manage their Metadata at all. Metadata
management often provides a starting point for improvements
in data management overall.

2.5.6 Data Management is Cross-functional


Data management is a complex process. Data is managed in
different places within an organization by teams that have
responsibility for different phases of the data lifecycle. Data
management requires design skills to plan for systems, highly
technical skills to administer hardware and build software, data
analysis skills to understand issues and problems, analytic skills
to interpret data, language skills to bring consensus to
definitions and models, as well as strategic thinking to see
opportunities to serve customers and meet goals.
The challenge is getting people with this range of skills and
perspectives to recognize how the pieces fit together so that they
collaborate well as they work toward common goals.

2.5.7 Establishing an Enterprise Perspective

Managing data requires understanding the scope and range of
data within an organization. Data is one of the "horizontals" of
an organization. It moves across verticals, such as sales,
marketing, and operations… Or at least it should. Data is not
only unique to an organization; sometimes it is unique to a
department or other sub-part of an organization. Because data is
often viewed simply as a by-product of operational processes
(for example, sales transaction records are the by-product of the
selling process), it is not always planned for beyond the
immediate need.
Even within an organization, data can be disparate. Data
originates in multiple places within an organization. Different
departments may have different ways of representing the same
concept (e.g., customer, product, vendor). As anyone involved in
a data integration or Master Data Management project can
testify, subtle or blatant differences in representational choices
present challenges in managing data across an organization. At
the same time, stakeholders assume that an organization's data
should be coherent, and a goal of managing data is to make it fit
together in common sense ways so that it is usable by a wide
range of data consumers.
One reason data governance has become increasingly important
is to help organizations make decisions about data across
verticals. See Chapter .

2.5.8 Accounting for Other Perspectives


Today's organizations use data that they create internally, as
well as data that they acquire from external sources. They have
to account for different legal and compliance requirements
across national and industry lines. People who create data often
forget that someone else will use that data later. Knowledge of
the potential uses of data enables better planning for the data
lifecycle and, with that, for better quality data. Data can also be
misused. Accounting for this risk reduces the likelihood of
misuse.

2.5.9 The Data Lifecycle


Like other assets, data has a lifecycle. To effectively manage data
assets, organizations need to understand and plan for the data
lifecycle. Well-managed data is managed strategically, with a
vision of how the organization will use its data. A strategic
organization will define not only its data content requirements,
but also its data management requirements. These include
policies and expectations for use, quality, controls, and security;
an enterprise approach to architecture and design; and a
sustainable approach to both infrastructure and software
development.
The data lifecycle is based on the product lifecycle. It should not
be confused with the systems development lifecycle.
Conceptually, the data lifecycle is easy to describe (see Figure 2).
It includes processes that create or obtain data, those that move,
transform, and store it and enable it to be maintained and
shared, and those that use or apply it, as well as those that
dispose of it. Throughout its lifecycle, data may be cleansed,
transformed, merged, enhanced, or aggregated. As data is used
or enhanced, new data is often created, so the lifecycle has
internal iterations that are not shown on the diagram. Data is
rarely static. Managing data involves a set of interconnected
processes aligned with the data lifecycle.
The specifics of the data lifecycle within a given organization
can be quite complicated, because data not only has a lifecycle, it
also has lineage (i.e., a pathway along which it moves from its
point of origin to its point of usage, sometimes called the data
chain). Understanding the data lineage requires documenting the
origin of data sets, as well as their movement and
transformation through systems where they are accessed and
used. Lifecycle and lineage intersect and can be understood in
relation to each other. The better an organization understands
the lifecycle and lineage of its data, the better able it will be to
manage its data.
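Documented lineage can be represented as a simple mapping from
each data set to the data sets it was derived from. The sketch
below walks such a mapping back to the points of origin. The data
set names and the helper trace_origins are hypothetical, and the
example assumes the lineage contains no cycles.

```python
# Each data set maps to the data sets it was derived from.
# All names are invented for illustration.
LINEAGE = {
    "sales_report": ["warehouse_sales"],
    "warehouse_sales": ["pos_transactions", "web_orders"],
    "pos_transactions": [],
    "web_orders": [],
}


def trace_origins(dataset, lineage):
    """Return the data sets with no upstream source that feed `dataset`.

    Assumes the lineage mapping is acyclic.
    """
    upstream = lineage.get(dataset, [])
    if not upstream:
        return {dataset}
    origins = set()
    for parent in upstream:
        origins |= trace_origins(parent, lineage)
    return origins


report_origins = trace_origins("sales_report", LINEAGE)
```

In this example the sales report traces back to two points of
origin, the point-of-sale transactions and the web orders, which is
the kind of question lineage documentation exists to answer.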
The focus of data management on the data lifecycle has several
important implications:

- Creation and usage are the most critical points in the
  data lifecycle: Data management must be executed
  with an understanding of how data is produced, or
  obtained, as well as how data is used. It costs money to
  produce data. Data is valuable only when it is
  consumed or applied. See Chapters , , , , and .

Figure 2 Data Lifecycle Key Activities
