MGI The Age of Analytics Full Report PDF
COMPETING IN A
DATA-DRIVEN WORLD
DECEMBER 2016
IN COLLABORATION WITH
MCKINSEY ANALYTICS
MGI research combines the disciplines of economics and management, employing the
analytical tools of economics with the insights of business leaders. Our micro-to-macro
methodology examines microeconomic industry trends to better understand the broad
macroeconomic forces affecting business strategy and public policy. MGI's in-depth reports
have covered more than 20 countries and 30 industries. Current research focuses on six
themes: productivity and growth, natural resources, labor markets, the evolution of global
financial markets, the economic impact of technology and innovation, and urbanization.
Recent reports have assessed the economic benefits of tackling gender inequality,
a new era of global competition, Chinese innovation, and digital globalization. MGI is
led by four McKinsey & Company senior partners: Jacques Bughin, James Manyika,
Jonathan Woetzel, and Frank Mattern, MGI's chairman. Michael Chui, Susan Lund,
Anu Madgavkar, and Jaana Remes serve as MGI partners. Project teams are led by
the MGI partners and a group of senior fellows, and include consultants from McKinsey
offices around the world. These teams draw on McKinsey's global network of partners
and industry and management experts. Input is provided by the MGI Council, which
co-leads projects and provides guidance; members are Andres Cadena, Richard Dobbs,
Katy George, Rajat Gupta, Eric Hazan, Eric Labaye, Acha Leke, Scott Nyquist, Gary Pinkus,
Shirish Sankhe, Oliver Tonby, and Eckart Windhagen. In addition, leading economists,
including Nobel laureates, act as research advisers.
The partners of McKinsey fund MGI's research; it is not commissioned by any business,
government, or other institution. For further information about MGI and to download reports,
please visit www.mckinsey.com/mgi.
MCKINSEY ANALYTICS
McKinsey Analytics helps clients achieve better performance through data, working
together with them to build analytics-driven organizations and providing end-to-end support
covering strategy, operations, data science, implementation, and change management.
Engagements range from use-case specific applications to full-scale analytics
transformations. Teams of McKinsey consultants, data scientists, and engineers work with
clients to identify opportunities, assess available data, define solutions, establish optimal
hosting environments, ingest data, develop cutting-edge algorithms, visualize outputs, and
assess impact while building capabilities to sustain and expand it.
Five years ago, the McKinsey Global Institute (MGI) released Big data: The next frontier for
innovation, competition, and productivity. In the years since, data science has continued to
make rapid advances, particularly on the frontiers of machine learning and deep learning.
Organizations now have troves of raw data combined with powerful and sophisticated
analytics tools to gain insights that can improve operational performance and create new
market opportunities. Most profoundly, their decisions no longer have to be made in the
dark or based on gut instinct; they can be based on evidence, experiments, and more
accurate forecasts.
As we take stock of the progress that has been made over the past five years, we see
that companies are placing big bets on data and analytics. But adapting to an era of
more data-driven decision making has not always proven to be a simple proposition for
people or organizations. Many are struggling to develop talent, business processes, and
organizational muscle to capture real value from analytics. This is becoming a matter of
urgency, since analytics prowess is increasingly the basis of industry competition, and
the leaders are staking out large advantages. Meanwhile, the technology itself is taking
major leaps forward, and the next generation of technologies promises to be even more
disruptive. Machine learning and deep learning capabilities have an enormous variety of
applications that stretch deep into sectors of the economy that have largely stayed on the
sidelines thus far.
This research is a collaboration between MGI and McKinsey Analytics, building on more
than five years of research on data and analytics as well as knowledge developed in work
with clients across industries. This research also draws on a large body of MGI research on
digital technology and its effects on productivity, growth, and competition. It aims to help
organizational leaders understand the potential impact of data and analytics, providing
greater clarity on what the technology can do and the opportunities at stake.
The research was led by Nicolaus Henke, global leader of McKinsey Analytics, based
in London; Jacques Bughin, an MGI director based in Brussels; Michael Chui, an MGI
partner based in San Francisco; James Manyika, an MGI director based in San Francisco;
Tamim Saleh, a senior partner of McKinsey based in London; and Bill Wiseman, a
senior partner of McKinsey based in Taipei. The project team, led by Guru Sethupathy
and Andrey Mironenko, included Ville-Pekka Backlund, Rachel Forman, Pete Mulligan,
Delwin Olivan, Dennis Schwedhelm, and Cory Turner. Lisa Renaud served as senior editor.
Sincere thanks go to our colleagues in operations, production, and external relations,
including Tim Beacom, Marisa Carder, Matt Cooke, Deadra Henderson, Richard Johnson,
Julie Philpot, Laura Proudlock, Rebeca Robboy, Stacey Schulte, Margo Shimasaki, and
Patrick White.
We are grateful to the McKinsey Analytics leaders who provided guidance across
the research, including Dilip Bhattacharjee, Alejandro Diaz, Mikael Hagstroem, and
Chris Wigley. In addition, this project benefited immensely from the many McKinsey
colleagues who shared their expertise and insights. Thanks go to Ali Arat, Matt Ariker,
Steven Aronowitz, Bill Aull, Sven Beiker, Michele Bertoncello, James Biggin-Lamming,
Yves Boussemart, Chad Bright, Chiara Brocchi, Bede Broome, Alex Brotschi, David Bueno,
Eric Buesing, Rune Bundgaard, Sarah Calkins, Ben Cheatham, Joy Chen, Sastry Chilukuri,
Brian Crandall, Zak Cutler, Seth Dalton, Severin Dennhardt, Alexander DiLeonardo,
Nicholas Donoghoe, Jonathan Dunn, Leeland Ekstrom, Mehdi El Ouali, Philipp Espel,
Matthias Evers, Robert Feldmann, David Frankel, Luke Gerdes, Greg Gilbert,
Taras Gorishnyy, Josh Gottlieb, Davide Grande, Daina Graybosch, Ferry Grijpink,
Wolfgang Günthner, Vineet Gupta, Markus Hammer, Ludwig Hausmann, Andras Havas,
Malte Hippe, Minha Hwang, Alain Imbert, Mirjana Jozic, Hussein Kalaoui, Matthias Kässer,
Joshua Katz, Sunil Kishore, Bjorn Kortner, Adi Kumar, Tom Latkovic, Daniel Läubli,
Jordan Levine, Nimal Manuel, J. R. Maxwell, Tim McGuire, Doug McElhaney,
Fareed Melhem, Phillipe Menu, Brian Milch, Channie Mize, Timo Müller, Stefan Nagel,
Deepali Narula, Derek Neilson, Florian Neuhaus, Dimitri Obolenski, Ivan Ostojic,
Miklos Radnai, Santiago Restrepo, Farhad Riahi, Stefan Rickert, Emir Roach,
Matthias Roggendorf, Marcus Roth, Tom Ruby, Alexandru Rus, Pasha Sarraf,
Whitney Schumacher, Jeongmin Seong, Sha Sha, Abdul Wahab Shaikh, Tatiana Sivaeva,
Michael Steinmann, Kunal Tanwar, Mike Thompson, Rob Turtle, Jonathan Usuka,
Vijay Vaidya, Sri Velamoor, Richard Ward, Khilony Westphely, Dan Williams, Simon Williams,
Eckart Windhagen, Martin Wrulich, Ziv Yaar, and Gordon Yu.
Our academic adviser was Martin Baily, Senior Fellow and Bernard L. Schwartz Chair in
Economic Policy Development at the Brookings Institution, who challenged our thinking and
provided valuable feedback and guidance. We also thank Steve Langdon and the Google
TensorFlow group for their helpful feedback on machine learning.
This report contributes to MGI's mission to help business and policy leaders understand
the forces transforming the global economy and prepare for the next wave of growth.
As with all MGI research, this work is independent, reflects our own views, and has not
been commissioned by any business, government, or other institution. We welcome your
comments on the research at MGI@mckinsey.com.
Jacques Bughin
Director, McKinsey Global Institute
Senior Partner, McKinsey & Company
Brussels
James Manyika
Director, McKinsey Global Institute
Senior Partner, McKinsey & Company
San Francisco
Jonathan Woetzel
Director, McKinsey Global Institute
Senior Partner, McKinsey & Company
Shanghai
December 2016
CONTENTS
HIGHLIGHTS: 38, 66, 87 (including "The demand for talent")
In Brief (Page vi)
Executive summary (Page 1)
1. The data and analytics revolution gains momentum (Page 21)
2. Opportunities still uncaptured (Page 29)
4. Models of disruption fueled by data and analytics (Page 55)
Technical appendix (Page 95)
Bibliography (Page 121)
IN BRIEF
THE AGE OF ANALYTICS:
COMPETING IN A DATA-DRIVEN WORLD
Data and analytics capabilities have made a leap forward in recent years. The volume of available data has
grown exponentially, more sophisticated algorithms have been developed, and computational power and
storage have steadily improved. The convergence of these trends is fueling rapid technology advances
and business disruptions.
Most companies are capturing only a fraction of the potential value from data and analytics. Our 2011
report estimated this potential in five domains; revisiting them today shows a great deal of value still on
the table. The greatest progress has occurred in location-based services and in retail, both areas with
digital native competitors. In contrast, manufacturing, the public sector, and health care have captured
less than 30 percent of the potential value we highlighted five years ago. Further, new opportunities
have arisen since 2011, making the gap between the leaders and laggards even bigger.
The biggest barriers companies face in extracting value from data and analytics are organizational;
many struggle to incorporate data-driven insights into day-to-day business processes. Another
challenge is attracting and retaining the right talent: not only data scientists but business translators
who combine data savvy with industry and functional expertise.
Data and analytics are changing the basis of competition. Leading companies are using their
capabilities not only to improve their core operations but to launch entirely new business models. The
network effects of digital platforms are creating a winner-take-most dynamic in some markets.
Data is now a critical corporate asset. It comes from the web, billions of phones, sensors, payment
systems, cameras, and a huge array of other sources, and its value is tied to its ultimate use. While
data itself will become increasingly commoditized, value is likely to accrue to the owners of scarce
data, to players that aggregate data in unique ways, and especially to providers of valuable analytics.
Data and analytics underpin several disruptive models. Introducing new types of data sets
(orthogonal data) can disrupt industries, and massive data integration capabilities can break through
organizational and technological silos, enabling new insights and models. Hyperscale digital platforms
can match buyers and sellers in real time, transforming inefficient markets. Granular data can be used
to personalize products and services, and, most intriguingly, health care. New analytical techniques
can fuel discovery and innovation. Above all, data and analytics can enable faster and more evidence-
based decision making.
Recent advances in machine learning can be used to solve a tremendous variety of problemsand
deep learning is pushing the boundaries even further. Systems enabled by machine learning can
provide customer service, manage logistics, analyze medical records, or even write news stories. The
value potential is everywhere, even in industries that have been slow to digitize. These technologies
could generate productivity gains and an improved quality of life, along with job losses and other
disruptions. Previous MGI research found that 45 percent of work activities could potentially be
automated by currently demonstrated technologies; machine learning can be an enabling technology
for the automation of 80 percent of those activities. Breakthroughs in natural language processing
could expand that impact even further.
Data and analytics are already shaking up multiple industries, and the effects will only become more
pronounced as adoption reaches critical mass. An even bigger wave of change is looming on the horizon
as deep learning reaches maturity, giving machines unprecedented capabilities to think, problem-solve,
and understand language. Organizations that are able to harness these capabilities effectively will be able
to create significant value and differentiate themselves, while others will find themselves increasingly at
a disadvantage.
The age of analytics: Competing in a data-driven world
- Only a fraction of the value we envisioned in 2011 has been captured to date (European Union public sector, United States health care, manufacturing, United States retail, location-based data).
- Data and analytics fuel six disruptive models that change the nature of competition: data-driven discovery and innovation; massive data integration; radical personalization; hyperscale, real-time matching; orthogonal data sets; and enhanced decision making.
- As data ecosystems evolve (generate, aggregate, analyze), value will accrue to providers of analytics, but some data generators and aggregators will have unique value.
Back in 2011, the McKinsey Global Institute published a report highlighting the
transformational potential of big data.1 Five years later, we remain convinced that this
potential has not been overhyped. In fact, we now believe that our 2011 analyses gave only a
partial view. The range of applications and opportunities has grown even larger today.
The companies at the forefront of these trends are using their capabilities to tackle business
problems with a whole new mindset. In some cases, they have introduced data-driven
business models that have taken entire industries by surprise. Digital natives have an
enormous advantage, and to keep up with them, incumbents need to apply data and
analytics to the fundamentals of their existing business while simultaneously shifting the
basis of competition. In an environment of increasing volatility, legacy organizations need
to have one eye on high-risk, high-reward moves of their own, whether that means entering
new markets or changing their business models. At the same time, they have to apply
analytics to improve their core operations. This may involve identifying new opportunities
on the revenue side, using analytics insights to streamline internal processes, and building
mechanisms for experimentation to enable continuous learning and feedback.
Organizations that pursue this two-part strategy will be ready to take advantage of
opportunities and thwart potential disruptors, and they have to assume that those
disruptors are right around the corner. Data and analytics have altered the dynamics in many
industries, and change will only accelerate as machine learning and deep learning develop
capabilities to think, problem-solve, and understand language. The potential uses of these
technologies are remarkably broad, even for sectors that have been slow to digitize. As we
enter a world of self-driving cars, personalized medicine, and intelligent robots, there will be
enormous new opportunities as well as significant risks, not only for individual companies
but for society as a whole.
1
Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, June 2011.
Exhibit E1
There has been uneven progress in capturing value from data and analytics

Manufacturing2
Potential impact (2011 research): up to 50% lower product development cost; up to 25% lower operating cost; up to 30% gross margin increase
Value captured: 20-30%
Major barriers: siloed data in legacy IT systems; leadership skeptical of impact

US health care
Potential impact (2011 research): $300 billion value per year; ~0.7% annual productivity growth
Value captured: 10-20%
Major barriers: need to demonstrate clinical utility to gain acceptance; interoperability and data sharing
US retail: Retailers can mine a trove of transaction-based and behavioral data from
their customers. Thin margins (especially in the grocery sector) and pressure from
industry-leading early adopters such as Amazon and Walmart have created strong
incentives to put that data to work in everything from cross-selling additional products to
reducing costs throughout the entire value chain. The US retail sector has realized 30 to
40 percent of the potential margin improvements and productivity growth we envisioned
in 2011, but again, a great deal of value has gone to consumers.
Manufacturing: Manufacturing industries have achieved only about 20 to 30 percent
of the potential value we estimated in 2011, and most has gone to a handful of industry
leaders. Within research and design, design-to-value applications have seen the
greatest uptick in adoption, particularly among carmakers. Some industry leaders have
developed digital models of the entire production process (digital factories). More
companies have integrated sensor data-driven operations analytics, often reducing

10-20% of the potential value has been captured in the public sector and health care

The EU public sector: Our 2011 report analyzed how the European Union's public
sector could use data and analytics to make government services more efficient,
reduce fraud and errors in transfer payments, and improve tax collection, potentially
achieving some €250 billion worth of annual savings. But only about 10 to 20 percent of
this has materialized. Some agencies have moved more interactions online, and many
(particularly tax agencies) have introduced pre-filled forms. But across Europe and other
advanced economies, adoption and capabilities vary greatly. The complexity of existing
systems and the difficulty of attracting scarce analytics talent with public-sector salaries
have slowed progress. Despite this, we see even wider potential today for societies to
use analytics to make more evidence-based decisions in many aspects of government.
2
Digital America: A tale of the haves and have-mores, McKinsey Global Institute, December 2015; and Digital Europe: Pushing the frontier, capturing the benefits, McKinsey Global Institute, June 2016.
Exhibit E2

USE CASES/SOURCES OF VALUE: Clearly articulating the business need and projected impact; outlining a clear vision of how the business would use the solution.
DATA ECOSYSTEM: Gathering data from internal systems and external sources; appending key external data; creating an analytic sandbox across the organization; enhancing data (deriving new predictor variables).
MODELING INSIGHTS (internal "black box" data modeling; external "smart box" heuristic insights): Applying linear and nonlinear modeling to derive new insights; codifying and testing heuristics (informing predictor variables).
WORKFLOW INTEGRATION (process redesign; tech enablement): Redesigning processes; developing an intuitive user interface that is integrated into day-to-day workflow; automating workflows.
ADOPTION (capability building; change management): Building frontline and management capabilities; proactively managing change and tracking adoption with performance indicators.
Data scientists, in particular, are in high demand. Our 2011 report hypothesized that
demand for data scientists would outstrip supply. This is in fact what we see in the labor
market today, despite the fact that universities are adding data and analytics programs and
that other types of training programs are proliferating. Average wages for data scientists
in the United States rose by approximately 16 percent a year from 2012 to 2014.4 This far
3
The need to lead in data and analytics, McKinsey & Company survey, McKinsey.com, April 2016, available at
http://www.mckinsey.com/business-functions/business-technology/our-insights/the-need-to-lead-in-data-
and-analytics. The online survey, conducted in September 2015, garnered responses from more than 500
executives across a variety of regions, industries, and company sizes.
4
Beyond the talent shortage: How tech candidates search for jobs, Indeed.com, September 2015.
This trend is likely to continue in the near term. While we estimate that the number of
graduates from data science programs could increase by a robust 7 percent per year, our
high-case scenario projects even greater (12 percent) annual growth in demand, which
would lead to a shortfall of some 250,000 data scientists. But a countervailing force could
ease this imbalance in the medium term: data preparation, which accounts for more than
50 percent of data science work, could be automated. Whether that dampens the demand
for data scientists or simply enables data scientists to shift their work toward analysis and
other activities remains to be seen.
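The shape of that projection is simple compound-growth arithmetic: a small, persistent gap between supply and demand growth rates opens a large absolute shortfall within a decade. The sketch below is illustrative only; the starting pool of 500,000 roles is a hypothetical placeholder, not an input of the report's labor-market model.

```python
def project_gap(supply, demand, supply_growth, demand_growth, years):
    """Project the gap between talent demand and supply when each
    compounds at a constant annual rate."""
    for _ in range(years):
        supply *= 1 + supply_growth
        demand *= 1 + demand_growth
    return demand - supply

# Hypothetical balanced baseline of 500,000 roles, with supply growing
# 7% per year and demand growing 12% per year (the report's high case).
gap = project_gap(500_000, 500_000, supply_growth=0.07, demand_growth=0.12, years=10)
print(f"Illustrative shortfall after 10 years: {gap:,.0f}")
```

Even from a balanced start, a five-point spread in growth rates compounds into a six-figure shortfall within ten years, which is the dynamic behind the estimate of some 250,000 missing data scientists.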
Many organizations focus on the need for data scientists, assuming their presence alone
will enable an analytics transformation. But another equally vital role is that of the business
translator who serves as the link between analytical talent and practical applications to
business questions. In addition to being data savvy, business translators need to have deep
organizational knowledge and industry or functional expertise. This enables them to ask
the data science team the right questions and to derive the right insights from their findings.
It may be possible to outsource analytics activities, but business translator roles require
proprietary knowledge and should be more deeply embedded into the organization. Many
organizations are building these capabilities from within.
2M-4M: projected US demand for business translators over the next decade

We estimate there could be demand for approximately two million to four million business
translators in the United States alone over the next decade. Given the roughly 9.5 million
US graduates in business and in the STEM fields of science, technology, engineering, and
mathematics expected over the same period, nearly 20 to 40 percent of these graduates
would need to go into business translator roles to meet demand.5 Today that figure is
only about 10 percent. To reduce this mismatch, wages may have to increase, or more
companies will need to implement their own training programs.6
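The share arithmetic behind those figures is direct: two million to four million translator roles set against roughly 9.5 million business and STEM graduates.

```python
# Share of projected US business and STEM graduates who would need to
# become business translators, using the figures cited above.
demand_low, demand_high = 2_000_000, 4_000_000
graduates = 9_500_000

share_low = demand_low / graduates    # about 21%
share_high = demand_high / graduates  # about 42%
print(f"{share_low:.0%} to {share_high:.0%} of graduates")
```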
As data grows more complex, distilling it and bringing it to life through visualization is
becoming critical to help make the results of data analyses digestible for decision makers.
We estimate that demand for visualization grew roughly 50 percent annually from 2010
to 2015.7 In many instances today, organizations are seeking data scientist or business
translator candidates who can also execute visualizations. However, we expect that
medium-size and large organizations, as well as analytics service providers, will increasingly
create specialized positions for candidates who combine a strong understanding of data
with user interface, user experience, and graphic design skills.
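Footnote 7 describes how this growth figure is derived: count job postings that mention visualization skills, normalize by total postings, and compute the compound annual growth rate of that share. A minimal sketch of the same calculation, with hypothetical posting counts standing in for the Burning Glass data:

```python
# Hypothetical posting counts; the actual analysis used the Burning Glass
# database (postings citing data visualization, Tableau, Qlikview, Spotfire).
viz_postings = {2010: 4_000, 2015: 45_000}
total_postings = {2010: 1_000_000, 2015: 1_500_000}

# Normalize to the share of all postings, then annualize the growth.
share = {year: viz_postings[year] / total_postings[year] for year in viz_postings}
years = 2015 - 2010
cagr = (share[2015] / share[2010]) ** (1 / years) - 1
print(f"Visualization-skill share grew about {cagr:.0%} per year")
```

Normalizing by total postings matters: it separates rising demand for the skill from overall growth in the job market.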
5
Non-STEM graduates with quantitative skills can also fill business translator roles.
6
Sam Ransbotham, David Kiron, and Pamela Kirk Prentice, The talent dividend: Analytics talent is driving
competitive advantage at data-oriented companies, MIT Sloan Management Review, April 25, 2015.
7
Based on using the Burning Glass job postings database to search for postings including any of the following
skills: data visualization, Tableau, Qlikview, and Spotfire. Normalized with the total number of job postings.
8
Michael Chui and James Manyika, Competition at the digital edge: Hyperscale businesses, McKinsey
Quarterly, March 2015.
The relative value of various assets has shifted. Where previous titans of industry poured
billions into factories and equipment, the new leaders invest heavily in digital platforms, data,
and analytical talent. New digital native players can circumvent traditional barriers to entry,
such as the need to build traditional fixed assets, which enables them to enter markets with
surprising speed. Amazon challenged the rest of the retail sector without building stores
(though it does have a highly digitized physical distribution network), fintechs are providing
financial services without physical bank branches, Netflix is changing the media landscape
without connecting cables to customers' homes, and Airbnb has introduced a radical new
model in the hospitality sector without building hotels. But some digital natives are now
erecting new barriers to entry themselves; platforms may have such strong network effects
that they give operators a formidable advantage within a given market.
The leading firms have a remarkable depth of analytical talent deployed on a variety of
problemsand they are actively looking for ways to enter other industries. These companies
can take advantage of their scale and data insights to add new business lines, and those
expansions are increasingly blurring traditional sector boundaries.9 Apple and Alibaba,
for instance, have introduced financial products and services, while Google is developing
autonomous cars. The importance of data has also upended the traditional relationship
between organizations and their customers since every interaction generates information.
Sometimes the data itself is so prized that companies offer free services in order to obtain
it; this is the case with Facebook, LinkedIn, Pinterest, Twitter, Tencent, and many others. An
underlying barter system is at work, particularly in the consumer space, as individuals gain
access to digital services in return for data about their behaviors and transactions.
9
Playing to win: The new global competition for corporate profits, McKinsey Global Institute, September 2015.
Data generation and collection: The source and platform where data are
initially captured.
Data aggregation: Processes and platforms for combining data from multiple sources.
Data analysis: The gleaning of insights from data that can be acted upon.
The biggest opportunities are unlikely to lie in directly monetizing data. As data
become easier to collect and as storage costs go down, most data are becoming more
commoditized. Proxies now exist for data that were once scarce; Google Trends, for
instance, offers a free proxy for public sentiment data that previously would have been
collected through phone surveys.
However, there are important exceptions to the commoditization trend. When access is
limited by physical barriers or collection is expensive, data will hold its value. An important
case in which value can accrue to data generation and collection involves market-making
or social media platforms with strong network effects. In certain arenas, a small number
of players establish such critical mass that they are in a position to collect and own the
vast majority of user behavior data generated in these ecosystems. But in the absence of
these types of exceptional supply constraints, simply selling raw data is likely to generate
diminishing returns over time.
Another role in the data ecosystem involves aggregating information from different sources.
In general, this capability is becoming more accessible and less expensive, but this role can
be valuable when certain conditions apply. Data aggregation adds value when combining,
processing, and aggregating data is technically difficult or organizationally challenging
(for example, when aggregating involves coordinating access across diverse sources).
Some companies have built business models around serving as third-party aggregators
for competitors within a given industry, and this model has the potential to create network
effects as well.
The third part of the data ecosystem, analytics, is where we expect to see the biggest
opportunities in the future. The provider of analytics understands the value being generated
by those insights and is thus best positioned to capture a portion of that value. Data
analytics tools, like other software, already command large margins. Combining analytical
tools with business insights for decision makers is likely to multiply the value even further.
Increasingly complex data and analytics will require sophisticated translation, and use
cases will be very firm-specific. Bad analysis can destroy the potential value of high-quality
data, while great analysis can squeeze insights from even mediocre data. In addition, the
scarcity of analytics talent is driving up the cost of these services. Given the size of the
opportunities, firms in other parts of the ecosystem are scrambling to stake out a niche in
the analytics market. Data aggregators are offering to integrate clients' data and perform
analysis as a service. One-stop shops offering integrated technology stacks are adding
analytics capabilities, such as IBM Watson, as are other professional services and business
intelligence firms.
Exhibit E3
Data and analytics underpin six disruptive models, and certain characteristics make individual domains susceptible
kind of standardized data to make decisions, bringing in fresh types of data sets to
supplement those already in use can change the basis of competition. New entrants with
privileged access to these orthogonal data sets can pose a uniquely powerful challenge
to incumbents. We see this playing out in property and casualty insurance, where new
companies have entered the marketplace with telematics data that provides insight into
driving behavior. This is orthogonal to the demographic data that had previously been used
for underwriting. Other domains could be fertile ground for bringing in orthogonal data from
the internet of things (IoT). Connected light fixtures, which sense the presence of people
in a room and have been sold with the promise of reducing energy usage, generate data
exhaust that property managers can use to optimize physical space planning. Even in
human resources, some organizations have secured employee buy-in to wear devices that
capture data and yield insights into the real social networks that exist in the workplace,
enabling these organizations to optimize collaboration through changes in work spaces.
Orthogonal data will rarely replace the data that are already in use in a domain; it is more
likely that an organization will integrate orthogonal data with existing data. Within the other
Up to $2.5T: potential economic impact from continued adoption of mobility services by 2025

By 2030, mobility services, such as ride sharing and car sharing, could account for more
than 15 to 20 percent of total passenger vehicle miles globally. This growth, and the
resulting hit to the taxi industry, may be only a hint of what is to come. Automakers are
the biggest question mark. While sales will likely continue to grow in absolute numbers, we
estimate that the shift toward mobility services could halve the growth rate of global vehicle
sales by 2030. Consumers could save on car purchases, fuel, and parking. If mobility
services attain 10 to 30 percent adoption among low-mileage urban vehicle users, the
ensuing economic impact could reach $845 billion to some $2.5 trillion globally by 2025.
Some of this value will surely go to consumer surplus, while some will go to the providers of
these platforms and mobility services.
This capability could have profound implications for the way health care is delivered if the
sector can incorporate the behavioral, genetic, and molecular data connected with many
individual patients. The declining costs of genome sequencing, the advent of proteomics,
and the growth of real-time monitoring technologies make it possible to generate this kind
of new, ultra-granular data. These data can reshape health care in two profound ways.
First, they can help address information asymmetries and incentive problems in the health-
care system. Now that a more complete view of the patient is available, incentives could
be changed for hospitals and other providers to shift their focus from disease treatment
to wellness and prevention, saving huge sums on medical expenditures and improving
the quality of life. Second, having more granular and complete data on individual patients
can make treatments more precise. Pharmaceutical and medical device companies
have enormous possibilities in R&D for accelerating drug discovery, although they will be
challenged to create new business models to deliver treatments tailored to smaller, targeted
patient populations. Treatments, dosages, and care settings can be personalized to
individuals, leading to more effective outcomes with fewer side effects and reduced costs.
Personalized medicine could reduce health-care costs while allowing people to enjoy
longer, healthier, and more productive lives. The total impact could range from $2 trillion
Up to $260B: potential global impact of massive data integration in retail banking

Retail banking, for instance, is an industry rich with data on customers' transactions, financial status, and demographics. But few institutions have made the most of the data due to internal barriers and the variable quality of the information itself. Surmounting these barriers is critical now that social media, call center discussions, video footage from branches, and data acquired from external sources and partners can be used to form a more complete picture of customers. Massive data integration has significant potential for retail banks. It can enable better cross-selling, the development of personalized products, dynamic pricing, better risk assessment, and more effective marketing, and it can help firms achieve more competitive cost structures than many incumbent institutions. All told, we estimate a potential economic impact of $110 billion to $170 billion in the retail banking industry in developed markets and approximately $60 billion to $90 billion in emerging markets.
Additionally, companies in other sectors can become part of the financial services
ecosystem if they bring in orthogonal data, such as non-financial data that provides a more
comprehensive and detailed view of the customer. These players may have large customer
bases and advanced analytics capabilities created for their core businesses, and they can
use these advantages to make rapid moves across sector boundaries. Alibaba's creation of
Alipay and Apple's unveiling of Apple Pay are prime examples of this trend.
In the realm of process innovation, data and analytics are helping organizations determine
how to structure teams, resources, and workflows. High-performing teams can be many
times more productive than low-performing teams, so understanding this variance and how
to build more effective collaboration is a huge opportunity for organizations. This involves
looking at issues such as the complementarity of skills, optimal team sizes, whether teams
need to work together in person, what past experience or training is important, and even
how their personalities may mesh. Data and analytics can test hypotheses and find new
patterns that may not have even occurred to managers. Vast amounts of email, calendar,
locational, and other data are available to understand how people work together and
communicate, all of which can lead to new insights about improving performance.
In product innovation, data and analytics can transform research and development in areas
such as materials science, synthetic biology, and life sciences. Leading pharmaceutical
companies are using data and analytics to aid with drug discovery. Data from a variety of
sources could help determine the chemical compounds that would serve as effective
drug treatments for a variety of diseases. AstraZeneca and Human Longevity are partnering
There are many examples of how this can play out in industries and domains across the
economy. Smart cities, for example, are one of the most promising settings for applying
the ability of machines and algorithms to process huge quantities of information in a
fraction of the time it takes humans. Using sensors to improve traffic flows and the internet
of things to enable utilities to reduce waste and keep infrastructure systems working at
top efficiency are just two of the myriad possible municipal applications. One of the most
promising applications of data and analytics is in the prevention of medical errors. Advanced
analytical support tools can flag potential allergies or dangerous drug interactions for
doctors and pharmacists alike, ensuring that their decisions are consistent and reliable.
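In code, such a decision-support check amounts to a lookup against a table of known risky combinations. The sketch below is purely illustrative: the drug pairs and risk descriptions are hypothetical placeholders, not clinical data.

```python
# Illustrative sketch of a drug-interaction safety check.
# The interaction pairs below are hypothetical examples, not clinical guidance.
KNOWN_INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
    frozenset({"ssri", "maoi"}): "risk of serotonin syndrome",
}

def flag_interactions(prescribed, new_drug):
    """Return warnings for any known interaction between a new drug
    and the patient's current prescriptions."""
    warnings = []
    for drug in prescribed:
        risk = KNOWN_INTERACTIONS.get(frozenset({drug, new_drug}))
        if risk:
            warnings.append(f"{new_drug} + {drug}: {risk}")
    return warnings
```

A production system would draw on a curated pharmacology database plus patient-specific factors such as allergies and dosage; the point here is only that the check is deterministic and can run at the moment of prescribing.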
And finally, perhaps no area of human decision making is quite as opaque and clouded by
asymmetric information as hiring. Data and analytics have the potential to create a more
transparent labor market by giving employers and job seekers access to data on the supply
and demand for particular skills, the wages associated with various jobs, and the value of
different degree programs.
Some machine learning techniques, such as regressions, support vector machines, and
k-means clustering, have been in use for decades. Others, while developed previously,
have become viable only now that vast quantities of data and unprecedented processing
power are available. Deep learning, a frontier area of research within machine learning,
uses neural networks with many layers (hence the label deep) to push the boundaries of
machine capabilities. Data scientists have recently made breakthroughs using deep learning
to recognize objects and faces and to understand and generate language. Reinforcement
learning is used to identify the best actions to take now in order to reach some future goal.
These types of problems are common in games but can also be useful for solving dynamic
optimization and control theory problems, exactly the type of issues that come up
in modeling complex systems in fields such as engineering and economics. Transfer
learning focuses on storing knowledge gained while solving one problem and applying it
to a different problem. Machine learning, combined with other techniques, could have an
enormous range of uses (see Exhibit E4 and Box E1, "The impact of machine learning").
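To ground the oldest of these techniques, here is a minimal k-means clustering sketch in pure Python; the two-dimensional points are invented for illustration, and a real analysis would use a library implementation such as scikit-learn's.

```python
def kmeans(points, k, iterations=10):
    """Minimal k-means sketch: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points.
    Initialization here is naive (evenly spaced picks from the input);
    real libraries use smarter seeding such as k-means++."""
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Two visibly separate groups; centroids should land near (1, 1) and (9, 9).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (9.0, 9.0), (8.8, 9.2), (9.1, 8.9)]
centers = sorted(kmeans(data, 2))
```

The same assign-then-update loop, scaled up and combined with the prediction and generation techniques above, underlies much of the segmentation work described in this report.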
Exhibit E4
Machine learning can be combined with other types of analytics, such as clustering (e.g., k-means), regression (e.g., logistic), and classification (e.g., support vector machines), to solve a large swath of business problems, from resource allocation to sorting and predictive maintenance.

Machine learning can help solve classification, prediction, and generation problems:

Classification
- Classify/label visual objects: identify objects and faces in images and video
- Classify/label writing and text: identify letters, symbols, and words in a writing sample
- Cluster or group other data: segment objects (e.g., customers, product features) into categories or clusters
- Discover associations: identify that people who watch certain TV shows also read certain books

Prediction
- Predict probability of outcomes: predict the probability that a customer will choose another provider

Generation
- Generate visual objects: trained on a set of artists' paintings, generate a new painting in the same style
- Generate writing and text: trained on a historical text, fill in missing parts of a single page
- Generate other data: trained on certain countries' weather data, fill in missing data points for countries with low data quality
We plotted the top 120 use cases in Exhibit E6. The y-axis shows the volume of available
data (encompassing its breadth and frequency), while the x-axis shows the potential impact,
based on surveys of more than 600 industry experts. The size of the bubble reflects the
diversity of the available data sources.
Exhibit E6
Machine learning has broad potential across industries and use cases. (Bubble chart: the x-axis shows impact score, the y-axis shows volume of available data, and bubble size reflects the diversity of data sources.)
The use cases in the top right quadrant fall into four main categories. First is the radical
personalization of products and services for customers in sectors such as consumer
packaged goods, finance and insurance, health care, and media, an opportunity that
most companies have yet to fully exploit. The second is predictive analytics. This includes
examples such as triaging customer service calls; segmenting customers based on
risk, churn, and purchasing patterns; identifying fraud and anomalies in banking and
cybersecurity; and diagnosing diseases from scans, biopsies, and other data. The third
category is strategic optimization, which includes uses such as merchandising and shelf
optimization in retail, scheduling and assigning frontline workers, and optimizing teams
and other resources across geographies and accounts. The fourth category is optimizing
operations and logistics in real time, which includes automating plants and machinery to
reduce errors and improve efficiency, and optimizing supply chain management.
$3T: wages potentially affected if machine learning gains better capabilities in natural language understanding

MGI's previous research on automation found that 45 percent of all work activities,
associated with $14.6 trillion of wages globally, have the potential to be automated
by adapting currently demonstrated technology. Some 80 percent of that could be
implemented by using existing machine learning capabilities. But deep learning is in its early
stages. Improvements in its capabilities, particularly in natural language understanding,
suggest the potential for an even greater degree of automation. In 16 percent of work
activities that require the use of language, for example, increasing the performance of
machine learning in natural language understanding is the only barrier to automation.10
Improving natural language capabilities alone could lead to an additional $3 trillion in
potential global wage impact.

10 These detailed work activities are defined by O*NET, a data collection program sponsored by the US Department of Labor. See Michael Chui, James Manyika, and Mehdi Miremadi, "Four fundamentals of workplace automation," McKinsey Quarterly, November 2015.
Improvements in natural language understanding and generation as well as social sensing would have the biggest impact on expanding the number of work activities that deep learning could technically automate. (Chart: the capability categories assessed are natural language understanding, sensory perception, generating novel patterns/categories, social and emotional sensing, recognizing known patterns/categories, optimization and planning, and natural language generation.)
While machine learning in general and deep learning in particular have exciting and wide-
ranging potential, there are real concerns associated with their development and potential
deployment. Some of these, such as privacy, data security, and data ownership, were
present even before the big data age. But today new questions have emerged.
Improvements in deep learning (DL) could affect billions of dollars in wages in ten occupations globally. For each occupation, the figures below show the percent of time spent on activities that could be automated if DL improves, the most frequently performed group of detailed work activities (DWAs) that could be automated, global employment, average hourly wage, and the global wages that DL could automate:

- Secretaries and administrative assistants, except legal, medical, and executive: 28%; interacting with computers to enter data, process information, etc.; 48.2 million employed; $3.90/hour; $109.8 billion
- Managers, all other: 27%; monitoring processes, materials, or surroundings; 8.3 million employed; $18.25/hour; $86.7 billion
- First-line supervisors of office and administrative support workers: 35%; interpreting the meaning of information for others; 12.8 million employed; $8.75/hour; $81.5 billion
- Cashiers: 18%; performing administrative activities; 68.1 million employed; $3.18/hour; $81.5 billion
- First-line supervisors of helpers, laborers, and material movers: 24%; organizing, planning, and prioritizing work; 8.5 million employed; $12.73/hour; $54.2 billion

SOURCE: National labor and statistical sources; McKinsey Global Institute analysis
Second, there are ethical questions surrounding machine intelligence. One set of ethical
concerns relates to real-world biases that might be embedded into training data. Another
question involves deciding whose ethical guidelines will be encoded in machine decision
making and who is responsible for the algorithm's conclusions. Leading artificial
intelligence experts, through OpenAI, the Foundation for Responsible Robotics, and other
efforts, have begun tackling these questions.
Third, the potential risks of labor disruption from the use of deep learning to automate
activities are generating anxiety. There is historical precedent for major shifts among sectors
and changes in the nature of jobs in previous waves of automation. In the United States,
the share of farm employment fell from 40 percent in 1900 to 2 percent in 2000; similarly,
the share of manufacturing employment fell from roughly 25 percent in 1950 to less than
10 percent in 2010. In both circumstances, while some jobs disappeared, new ones were
created, although what those new jobs would be could not be ascertained at the time. But
history does not necessarily provide assurance that sufficient numbers of new, quality jobs
will be created at the right pace. At the same time, many countries have or will soon have
labor forces that are declining in size, requiring an acceleration of productivity to maintain
anticipated rates of economic growth. But automation technologies will not be widely
adopted overnight; in fact, a forthcoming MGI research report will explore the potential
pace of automation of different activities in different economies. Certainly dealing with job
displacement, retraining, and unemployment will require a complex interplay of government,
private sector, and educational and training institutions, and it will be a significant debate
and an ongoing challenge across society.
Data and analytics have even greater potential to create value today than they did when
companies first began using them. Organizations that are able to harness these capabilities
effectively will be able to create significant value and differentiate themselves, while others
will find themselves increasingly at a disadvantage.
The pace of change is accelerating. The volume of data continues to double every three
years as information pours in from digital platforms, wireless sensors, virtual reality
applications, and billions of mobile phones. Data storage capacity has increased,
while its cost has plummeted. Data scientists now have unprecedented computing
power at their disposal, and they are devising ever more sophisticated algorithms. The
convergence of these trends is setting off industry disruptions and posing new challenges
for organizations.
Many companies have already harnessed these capabilities to improve their core operations
or to launch entirely new business models. UPS feeds data into its ORION platform to
determine the most efficient routes for its drivers dynamically. In the United States alone,
the company estimates that the system will reduce the number of miles its vehicles travel
each year by 100 million, saving more than $300 million annually.12 Google is running a
vast number of experiments to make search queries faster, since a few milliseconds can
translate into additional millions of dollars in revenue.13 By merging information gleaned
from social media with its own transaction data from customer relationship management
and billing systems, T-Mobile US is reported to have cut customer defections in half in a
single quarter.14 Netflix has refined its recommendation engine and rolled it out to global
customers, and a study released by the company estimates that it brings in $1 billion in
annual revenue.15
But most companies remain in the starting gate. Some have invested in data and analytics
technology but have yet to realize the payoff, while others are still wrestling with how to take
the initial steps.
Digital native companies, on the other hand, were built for data- and analytics-based
disruption from their inception. It is easier to design new IT systems and business processes
from scratch than to modify or overhaul legacy systems, and the top analytics talent tends
to flock to organizations that speak their language. All of these advantages underscore the
fact that incumbents need to stay vigilant about competitive threats and take a big-picture
view of which parts of their business model are most vulnerable. In sectors as varied as
transportation, hospitality, and retail, data and analytics assets have helped digital natives
circumvent traditional barriers to entry (such as physical capital investments) and erect new
ones (digital platforms with powerful network effects).
11 Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, June 2011.
12 "UPS ORION to be deployed to 70 percent of US routes in 2015; delivers significant sustainability benefits," company press release, March 2, 2015; Steven Rosenbush and Laura Stevens, "At UPS, the algorithm is the driver," The Wall Street Journal, February 16, 2015.
13 Jacques Bughin, "Big data, big bang?" Journal of Big Data, volume 3, number 2, 2016.
14 Paul Taylor, "Crunch time for big data," The Financial Times, June 19, 2012.
15 Carlos A. Gomez-Uribe and Neil Hunt, "The Netflix recommender system: Algorithms, business value, and innovation," ACM Transactions on Management Information Systems, volume 6, number 4, January 2016.
We have reached a unique moment in time. Technology has begun to set off waves of
accelerating change that will become the norm. Innovations in machine learning and deep
learning have expanded the possibilities associated with big data well beyond what we
foresaw in 2011.
Each of these trends (more data, more tools for analyzing the data, and the firepower
needed to do so) constitutes a breakthrough in its own right. Together these developments
reinforce each other, and the broader field of big data and advanced analytics is making
rapid advances as they converge. Vast increases in data require greater computational
power and infrastructure to analyze and access them. Both data and computational
power enable next-generation machine learning methods such as deep learning, which
employs a deep graph of multiple processing layers that needs to be fed large quantities
of data to produce meaningful insights. The confluence of data, storage, algorithms, and
computational power today has set the stage for a wave of creative destruction.
2X: growth in the volume of data expected every 3 years

As our previous report noted, just three exabytes of data existed in 1986, but by 2011,
that figure was up to more than 300 exabytes. The trend has not only continued but has
accelerated since then. One analysis estimates that the United States alone has more than
two zettabytes (2,000 exabytes) of data, and that volume is projected to double every three
years.16

Billions of people worldwide have gradually shifted more of their lives online, using their
smartphones as ever-present personal command centers. A recent US survey found that
42 percent of Americans use the internet several times a day, and 21 percent report being
online almost constantly.17 All of this activity leaves a rich data trail at every turn; in fact,
this is one of the developments that gave rise to the term big data. Internet companies and
websites capture online behavior, including every web search, where people click, how long
they stay on each site, and every transaction they make.
Internet users have gone from being passive consumers to active creators of content
through social media and other forms, and digital native companies have capitalized by
capturing these online data. More traditional businesses that rely on physical assets have
16 John Gantz and David Reinsel, "The digital universe in 2020," IDC, February 2013.
17 Andrew Perrin, "One-fifth of Americans report going online almost constantly," Pew Research Center.
22 McKinsey Global Institute 1. The data and analytics revolution gains momentum
been slower to realize gains from data and analytics, but as they continue to digitize their
customer-facing and internal operations, they, too, now have the building blocks for more
sophisticated use of data and analytics.
Data has not only increased in volume; it has also gained tremendous richness and diversity.
We have entered a new era in which the physical world has become increasingly connected
to the digital world. Data is generated by everything from cameras and traffic sensors to
heart rate monitors, enabling richer insights into human behavior. Retailers, for instance, are
trying to build complete customer profiles across all their touch points, while health-care
providers can now monitor patients remotely. In addition, companies in traditional industries
such as natural resources and manufacturing are using sensors to monitor their physical
assets and can use those data to predict and prevent bottlenecks or to maximize utilization.
Much of this newly available data is in the form of clicks, images, text, or signals of various
sorts, which is very different from the structured data that can be cleanly placed in rows
and columns. New storage technologies, including non-relational databases, have
allowed businesses to collect this rich content. A great deal of the potential it holds is still
unrealized.18 But new tools and applications of data and analytics could eventually transform
even traditional industries that once seemed far removed from the digital world.
A standard software program is hard-coded with strict rules for the tasks it needs to
execute. But it cannot adapt to new variables or requirements unless a programmer
updates it with specific new rules. While this works well in some contexts, it is easy to see
why this approach is not scalable to handle all the complexities of the real world. Machine
learning, meanwhile, uses an inductive approach to form a representation of the world
based on the data it sees. It is able to tweak and improve its representation as new data
arrive. In that sense, the algorithm learns from new data inputs and gets better over time.
The key requirement for machine learning is vast quantities of data, which are necessary to
train algorithms. Vastly larger quantities of rich data have enabled remarkable improvements
in machine learning algorithms, including deep learning.
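The inductive loop described above can be made concrete with one of the simplest learning algorithms, a perceptron, which nudges its internal weights each time a labeled example arrives. The toy data and rule below are invented for illustration.

```python
def train_perceptron(samples, epochs=10, lr=0.1):
    """Fit a linear threshold unit by nudging weights toward
    misclassified examples; the model's representation of the
    data improves as more labeled examples arrive."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, label in samples:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = label - pred  # zero when correct, so no update
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def predict(model, x):
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Toy rule to learn: label is 1 when the two inputs sum to more than 1.
data = [((0.1, 0.2), 0), ((0.9, 0.8), 1), ((0.3, 0.3), 0), ((0.7, 0.9), 1)]
model = train_perceptron(data)
```

Nothing in the code states the rule explicitly; the weights are induced from the labeled examples, which is the sense in which the algorithm "learns" and improves as new data inputs arrive.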
Among the most important advances in machine learning techniques over the past few
years are the following:
Deep learning. This branch of machine learning uses deep neural networks with many
hidden layers. Two of the most common types of deep neural networks are convolutional
and recurrent. Convolutional neural networks are often used for recognizing images
by processing a hierarchy of features: for instance, making the connection between a
nose, a face, and eventually a full cat. This image recognition capability has important
applications in the development of autonomous vehicles, which need to recognize
their surroundings instantly. In contrast, recurrent neural networks are used when
the overall sequence and context are important, as in speech recognition or natural
language processing.
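The feature-detection idea at the heart of convolutional networks can be sketched without any framework: a small filter slides across an image and responds strongly wherever its pattern occurs. The 4x4 image and hand-written edge filter below are illustrative; real networks learn their filter values from data and stack many such layers.

```python
# Minimal sketch of the convolution operation inside a convolutional layer:
# a small filter slides over an image and responds where its pattern appears.
def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
    return out

# A 4x4 "image" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_filter = [[-1, 1], [-1, 1]]  # responds to dark-to-bright transitions
response = convolve2d(image, edge_filter)
```

The response map peaks exactly where the edge sits, which is the low-level step in the hierarchy of features (edges, then shapes, then whole objects) described above.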
Deep learning systems are the clearest example of the confluence of abundant data,
processing power, and increasingly sophisticated algorithms. Neural networks were
developed decades ago, but they lacked the massive quantities of data and processing
power needed to reach their full capabilities. Now that those barriers have been
18 The internet of things: Mapping the value beyond the hype, McKinsey Global Institute, June 2015.
Ensemble learning. This set of techniques uses multiple machine learning methods
to obtain better predictions than any one method could achieve on its own. Ensemble
methods are particularly useful when there is a wide range of possible hypotheses,
as they help to zero in on the most appropriate path. CareSkore, for example, is
employing ensemble learning on Google's TensorFlow platform to analyze a range of
sociodemographic and behavioral data with the goal of improving preventive health care.
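The core mechanism can be illustrated in a few lines: several imperfect models vote, and the majority label wins. The three "models" below are hard-coded label tables, each wrong on a different sample, which is precisely the situation where ensembling helps most; the setup is a toy sketch and does not describe CareSkore's actual pipeline.

```python
from collections import Counter

def majority_vote(models, sample):
    """Ensemble prediction: each model votes and the most common label wins."""
    votes = [m[sample] for m in models]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical labels from three weak models, each wrong on a different sample.
truth = {"a": 1, "b": 0, "c": 1, "d": 0}
m1 = {"a": 1, "b": 0, "c": 0, "d": 0}  # misses c
m2 = {"a": 1, "b": 1, "c": 1, "d": 0}  # misses b
m3 = {"a": 1, "b": 0, "c": 1, "d": 1}  # misses d

ensemble = {s: majority_vote([m1, m2, m3], s) for s in truth}

def accuracy(pred):
    return sum(pred[s] == truth[s] for s in truth) / len(truth)
```

Each individual model is 75 percent accurate, but because their errors fall on different samples, the majority vote recovers every label, which is why ensembles outperform any single method when the errors of the constituent models are uncorrelated.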
These new techniques are made possible by new tools. Deep learning libraries and
platforms such as TensorFlow, Caffe, and Theano allow practitioners to integrate deep
learning algorithms into their analysis quickly and easily. Spark offers a big data platform
that provides advanced real-time and predictive analytics applications on widely used
Apache Hadoop distributed storage systems. New machine learning application program
interfaces (APIs), such as Microsoft's ML API, enable users to implement machine learning
in new areas.
Greater computational power and storage capacity enable greater use of data
and analytics
40X: increase in processing power between the fastest supercomputer of 2010 and the fastest today

Over the past five years, computational power has continued to grow, reflecting the
continuing explanatory power of Moore's Law. In 1993, computers could handle 4x10^9
instructions per second, and by the time our 2011 report was published, that figure had
grown by more than three orders of magnitude, to 6x10^12 instructions per second.20
Meanwhile, the amount of computing power each dollar can buy has increased by a factor
of ten roughly every four years in the past quarter century, making cheap computing more
available than ever before.21 Continued investment is pushing the boundaries of these
capabilities. In 2016, China unveiled the world's fastest supercomputer, which is more than
40 times as powerful as the fastest computer of 2010.22
Even as Moore's Law is nearing its physical limitations, other innovations are fueling
continued progress. Computational power has gotten a boost from an unlikely source: video
game consoles. The need to power ever more sophisticated video games has spurred the
development of graphics processing units. GPUs have enabled image processing in neural
networks that is ten to 30 times as fast as what conventional CPUs can do. These
processors have been put to use in many non-gaming contexts, ranging from Cortexica's
image recognition programs to Salesforce.com's Twitter analysis.23
19 Lior Kuyer et al., "Multiagent reinforcement learning for urban traffic control using coordination graphs," Machine learning and knowledge discovery in databases, volume 5211 of Lecture Notes in Computer Science series, Springer Berlin Heidelberg, 2008.
20 Martin Hilbert and Priscila López, "The world's technology capacity to store, communicate, and compute information," Science, volume 332, issue 6025, April 2011.
21 Luke Muehlhauser and Lila Rieber, "Exponential and non-exponential trends in information technology," Machine Intelligence Research Institute blog, May 12, 2014.
22 Top500 list of world's supercomputers, June 2016, available at https://www.top500.org/list/2016/06/.
23 "Shazam, Salesforce.com, Cortexica use GPUs for affordable, scalable growth in consumer and commercial applications," Nvidia corporate press release, March 2013.
The existence of so much computational power in many hands has made new forms of
crowdsourcing possible through distributed computing, which takes advantage of a large
network of idle computers. Extremely time-consuming simulations of protein molecules
for medical research, for example, have been performed across a large number of volunteer
computers through Folding@home, a distributed computing project based out of Stanford
University's Pande Lab.24
The growth of cloud-based platforms has given virtually any company the tools and storage
capacity to conduct advanced analytics. Cloud-based storage and analytics services
enable even small firms to store their data and process it on distributed servers. Companies
can purchase as much space as they need, greatly simplifying their data architecture and IT
requirements and lowering capital investment. As computation capacity and data storage
alike have largely been outsourced, many tools have become accessible, and data can now
be more easily combined across sources. NoSQL databases offer alternatives to relational
databases, allowing for the collection and storage of various types of unstructured data,
such as images, text, audio, and other rich media.
Data storage costs have fallen dramatically over the years. But now the increasing firehose
of data streams simply exceeds the amount of storage that exists in the world, and
projections indicate that the production of new storage capacity will continue to fall well
short of the demand created by explosive growth in data generation.25 Previous MGI
research found that more than 40 percent of all data generated by sensors on the typical oil
rig were never even stored. This flood of data has to be captured, retained, and made easily
accessible for analysis before it can yield value.
Data has become the new corporate asset class, and the best way for companies to
generate and access it is to digitize everything they do. Digitizing customer interactions
provides a wealth of information for marketing, sales, and product development,
while internal digitization generates data that can be used to optimize operations and
improve productivity.
Looking at a wide variety of indicators that measure digitization, we see a striking gap,
particularly in digital usage and labor, between leading firms and the laggards. Digitization
tends to be correlated with data and analytics capabilities since it is a crucial enabler of data
generation and collection; it also requires related skills and mindsets as well as process and
infrastructure changes. Of the domains we reviewed in our 2011 research, information and
communications technology (the sector that has made the greatest advances in location-
based data), retail leaders, and advanced manufacturing rank among the leaders in MGI's
Industry Digitization Index. Meanwhile, government, basic manufacturing, the bulk of the
24 Vijay Pande, "FAH's achievements in 2015, with a glimpse into 2016," Folding@home blog, December 6, 2015.
25 See, for example, Drew Robb, "Are we running out of data storage space?" World Economic Forum blog, November 10, 2015, as well as Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, June 2011.
Data permeates everything that the leading organizations do. Digitizing customer
interactions provides a wealth of information that can feed into strategy, marketing, sales,
and product development. Increasingly granular data allows companies to micro-target new
customers to acquire and to develop products and services that are more personalized.
Internal digitization generates data that can be used to make operations more efficient,
including better sourcing, supply-chain and logistics management, and predictive
maintenance on equipment.
The value of data and analytics has upended the traditional relationship between consumers
and producers. In the past, companies sold products to their customers in return for money
and negligible data. Today, transactions, and indeed every interaction with a consumer,
produce valuable information. Sometimes the data itself is so valuable that companies such
as Facebook, LinkedIn, Pinterest, Twitter, and many others are willing to offer free services
in order to obtain it. In some cases, the customer is actually a user who barters his or her
data in exchange for free use of the product or service. In others, the actual customers may
be marketers that pay for targeted advertising based on user-generated data. To maintain
an edge in consumer data, user acquisition and user interaction are both critical. Venture
capitalists have long understood the importance of building a customer base. Many internet
startups from Quora to Jet have focused as much attention on capturing users who can
provide valuable data as on capturing paying customers.29
These technologies are allowing new players to challenge incumbents with surprising
speed since they circumvent the need to build traditional fixed assets. Amazon, Netflix,
Uber, Airbnb, and a host of new fintech financial firms have moved into industries where
incumbents were heavily invested in certain types of physical assets. These disrupters used
their digital, data, and analytics assets to create value without owning physical shops, cable
connections to viewers' homes, car fleets, hotels, or bank branches, respectively. But even
as they bypassed traditional barriers to entry, they have erected new ones. The network
effects of digital marketplaces, social networks, and other digital platforms can create a
winner-take-most phenomenon. The leading platforms capture a disproportionate share of
the data created in a given space, making it difficult for new entrants to challenge them.
26 Digital America: A tale of the haves and have-mores, McKinsey Global Institute, December 2015; and Digital Europe: Pushing the frontier, capturing the benefits, McKinsey Global Institute, June 2016.
27 Michael Chui and James Manyika, "Competition at the digital edge: 'Hyperscale' businesses," McKinsey Quarterly, March 2015.
28 Jacques Bughin, "Big data: Getting a better read on performance," McKinsey Quarterly, February 2016.
29 Maya Kosoff, "These 15 startups didn't exist five years ago; now they're worth billions," Business Insider, December 10, 2015.
26 McKinsey Global Institute 1. The data and analytics revolution gains momentum
The leading firms have a remarkable depth of analytical talent deployed on a variety
of problemsand they are actively looking for ways to make radical moves into other
industries. These companies can take advantage of their scale and data insights to add
new business lines, and those expansions are increasingly blurring traditional sector
boundaries.30 Alphabet, which used its algorithmic advantage to push Google ahead of
older search engine competitors, is now using its formidable talent base to expand into
new business lines such as the development of autonomous vehicles. Apple used its
unique data, infrastructure edge, and product platform to push into the world of finance
with Apple Pay. Similarly, Chinese e-commerce giants Alibaba, Tencent, and JD.com have
leveraged their data volumes to offer microloans to the merchants that operate on their
platforms. By using real-time data on the merchants' transactions to build its own credit
scoring system, Alibaba's finance arm was able to achieve better non-performing loan ratios
than traditional banks.31 Furthermore, banks and telecom companies are sharing data to
drive new products and to improve core operations such as credit underwriting, customer
segmentation, and risk and fraud management.
To keep up with the pace of change, incumbents need to look through two lenses at the
same time. In an environment of constant churn, organizations should have one eye on
high-risk, high-reward moves of their own. They have to keep thinking ahead, whether that
means entering new markets or changing their business models. At the same time, they
have to maintain a focus on using data and analytics to improve their core business. This
may involve identifying specific use cases and applications on the revenue and the cost
side, using analytics insights to streamline internal processes, and building mechanisms for
constant learning and feedback to improve. Pursuing this two-part strategy should position
any organization to take advantage of opportunities and thwart potential disruptors. See
Chapter 2 for further discussion of the steps involved in transforming a traditional company
into a more data-driven enterprise.
With data and analytics technology rapidly advancing, the next question is how companies
will integrate these capabilities into their operations and strategies, and how they will
position themselves in a world where analytics capabilities are rapidly reshaping industry
competition. But adapting to an era of more data-driven decision making is not always a
simple proposition for people or organizations. Even some companies that have invested
in data and analytics capabilities are still struggling with how to capture value from the data
they are gathering. The next chapter revisits our 2011 research. It estimates how much
progress has been made in the five domains where we previously highlighted tremendous
potential, and points to even wider opportunities that exist today to create value
through analytics.
30 Playing to win: The new global competition for corporate profits, McKinsey Global Institute, September 2015.
31 China's digital transformation: The internet's impact on productivity and growth, McKinsey Global Institute, July 2014.
2. OPPORTUNITIES STILL UNCAPTURED
This chapter revisits the five domains we highlighted in our 2011 report to evaluate their
progress toward capturing the potential value that data and analytics can deliver. In each
of these areas, our previous research outlined dozens of avenues for boosting productivity,
expanding into new markets, and improving decision making. Below we examine how much
of that value is actually being captured, combining quantitative data with input from industry
experts. The numerical estimates of progress in this chapter provide an indication of which
areas have the greatest momentum and where barriers have proven to be more formidable,
although we acknowledge they are directional rather than precise.
We see the greatest progress in location-based services and in retail, where competition
from digital native firms has pushed other players toward adoption (Exhibit 1). The thin
margins facing retailers (especially in the grocery sector) and pressure from competitors
such as Amazon and leading big-box retailers such as Walmart create a strong incentive
to evolve. In contrast, manufacturing, the public sector, and health care have captured less
than a third of the value opportunities that data and analytics presented five years ago.
Overall, many of the opportunities described in our 2011 report are still on the table. In the
meantime, the potential for value creation has grown even bigger.
Exhibit 1
There has been uneven progress in capturing value from data and analytics
[Table: potential impact estimated in our 2011 research, value captured to date (%), and major barriers, by domain. Manufacturing: up to 50% lower product development cost, up to 25% lower operating cost, and up to 30% gross margin increase; 20-30% of value captured; major barriers are siloed data in legacy IT systems and leadership skeptical of impact. US health care: $300 billion value per year and ~0.7% annual productivity growth; 10-20% of value captured; major barriers are the need to demonstrate clinical utility to gain acceptance and interoperability and data sharing.]
Most business leaders recognize the size of the opportunities and feel the pressure to
evolve. Recent research has found that investing in data and analytics capabilities has high
returns, on average: firms can use these capabilities to achieve productivity gains of 6 to
8 percent, which translates into returns roughly doubling their investment within a decade.
This is a higher rate of return than other recent technologies have yielded, surpassing even
the computer investment cycle in the 1980s.32
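The doubling claim can be sanity-checked with simple compounding: a 6 to 8 percent annual gain sustained over ten years yields a cumulative factor of roughly 1.8x to 2.2x. A minimal sketch, assuming the gain compounds annually (the compounding interpretation is our illustration, not a stated method of the underlying research):

```python
# Sanity check: a 6-8 percent annual productivity gain, compounded over a
# decade, roughly doubles the cumulative return (assumes annual compounding).
def cumulative_factor(annual_rate: float, years: int = 10) -> float:
    """Growth factor from compounding an annual gain over the given horizon."""
    return (1 + annual_rate) ** years

for rate in (0.06, 0.07, 0.08):
    print(f"{rate:.0%}/year for 10 years -> {cumulative_factor(rate):.2f}x")
```

At the 7 percent midpoint the ten-year factor is about 1.97, consistent with "roughly doubling."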
However, these high returns are largely driven by only a few successful organizations.
Early adopters are posting faster growth in operating profits, which in turn enables them
to continue investing in data assets and analytics capabilities, solidifying their advantages.
Facebook, in particular, has created a platform capable of gathering remarkably detailed
data on billions of individual users. But not all of the leaders are digital natives. Walmart,
GE, Ferrari F1, and Union Pacific are examples of companies in traditional industries whose
investments in data and analytics have paid significant dividends on both the revenue and
cost sides.
Many other companies are lagging behind in multiple dimensions of data and analytics
transformationand the barriers are mostly organizational issues. The first challenge is
incorporating data and analytics into a core strategic vision. The next step is developing
the right business processes and building capabilities (including both data infrastructure
and talent); it is not enough to simply layer powerful technology systems on top of existing
business operations. All these aspects of transformation need to come together to realize
the full potential of data and analyticsand the challenges incumbents face in pulling this off
are precisely why much of the value we highlighted in 2011 is still unclaimed.
We estimate that some 50 to 60 percent of the potential value anticipated in our 2011
research from location-based services has already been captured. We looked separately at
the revenue generated by service providers and the value created for consumers. Our 2011
report estimated that service providers had roughly $96 billion to $106 billion in revenue
at stake from three major sources: GPS navigation devices and services, mobile phone
location-based service applications, and geo-targeted mobile advertising services. Today's
market has already reached 60 percent of that revenue estimate, with particularly strong
growth in the use of GPS navigation. Industries and consumers alike have embraced real-
time navigation, which is now embedded in a host of services that monetize this technology
in new ways. Uber and Lyft use location data for their car dispatching algorithms, and online
real estate platforms such as Redfin embed street views and neighborhood navigation in
their mobile apps to aid home buyers.
Our 2011 analysis estimated that end consumers would capture the equivalent of more than
$600 billion in value, which is the lion's share of the benefits these services create. The world
is at the tipping point at which smartphones account for most mobile phone subscriptions
(although most people in the world still do not own smartphones).33 The share is much
higher in advanced economies and is rising rapidly worldwide. This trend puts mapping
32 Jacques Bughin, "Ten lessons learned from big data analytics," Journal of Applied Marketing Analytics, forthcoming.
33 451 Research data. See also Ericsson mobility report: On the pulse of the networked society, Ericsson, June 2016.
Yet the opportunities have grown beyond what we envisioned in our 2011 report. Today
there are new and growing opportunities for businesses in any industry to use geospatial
data to track assets, teams, and customers in dispersed locations in order to generate new
insights and improve efficiency. These opportunities are significant and, while still in the very
early stages, could turn out to be even larger than the ones discussed above.
30-40% potential value captured in the retail sector

While our 2011 analysis focused on the US retail sector alone, these opportunities clearly exist
in other high-income countries as well, especially in Europe. Moreover, the incentives for
adoption are there; major retailers worldwide have been early adopters of analytics as
they seek to respond to the competitive pressures created by e-commerce. But despite
the improvements made possible by analytics, overall margins have remained thin (the
earnings before interest, taxes, and amortization, or EBITA, margins held steady around
7 to 8 percent from 2011 to 2016).34 This is because a great deal of the value has gone to
consumers, who have been the major beneficiaries of intense competition in the retail world.
Capabilities are also uneven across the industry. Big-box retailers such as Target, Best
Buy, and Costco have invested in creating an end-to-end view of their entire value chain,
from suppliers to warehouses to stores to customers. Real-time information from its stores
allowed Globus, a department store chain in Switzerland, to update its product mix and
respond quickly to customer demand.35 In addition, certain subsectors have made faster
progress. The grocery sector has led the way, while smaller retailers specializing in clothing,
furnishings, and accessories have lagged behind. Organizational hurdles, including
the difficulty of finding data scientists and breaking down information silos across large
companies, have kept many companies from realizing the full potential payoff.
Our 2011 report focused on integrating analytics into five key functions: marketing,
merchandising, operations, supply-chain management, and new business models. We have
seen fairly even progress across all of these, with most large retailers adopting at least basic
analytics to optimize their operations and supply chains.
In marketing and sales, the biggest emphasis has been on improved cross-selling, including
"next product to buy" recommendations. While Amazon pioneered this use of technology,
many other retailers (including Nordstrom, Macy's, and Gilt) now make recommendations
based on user data. In addition, retailers have tested everything from location-based ads
to social media analysis; Target, for instance, is piloting the use of beacons that transmit
targeted in-store ads depending on a shopper's precise position.36 Within merchandising,
retailers have made strides in optimizing their pricing (especially online) and assortment,
but they have not brought as much technology to bear on placement. Amazon is mining
34 Based on a sample of a dozen large global retail companies, from their annual reports.
35 Real-time enterprise stories, case studies from Bloomberg Businessweek Research Services and Forbes Insights, SAP, October 2014.
36 Sarah Perez, "Target launches beacon test in 50 stores, will expand nationwide later this year," TechCrunch, August 5, 2015.
20-30% potential value captured in the manufacturing sector

Manufacturing industries have captured only about 20 to 30 percent of the potential value
we estimated in our 2011 research, and most of that has gone to a handful of industry-
leading companies. Those that made decisive investments in analytics capabilities have
often generated impact in line with our estimates. The sector's main barrier seems to
be the perception of many companies that the complexity and cost of analytics could
outweigh the potential gains, particularly if the companies have difficulty identifying the right
technology and talent. There is no single integrated system that is a clear choice for every
company. Many will not solve the full problem of data being cordoned off in silos across an
organization, and installing replacement systems is a difficult undertaking.
Our 2011 report highlighted opportunities for the global manufacturing sector to realize value
from data and analytics within R&D, supply-chain management, production, and after-
sales support. Within R&D, design-to-value (the use of customer and supplier data to refine
existing designs and feed into new product development) has had the greatest uptick in
adoption, particularly among carmakers. While adoption of advanced demand forecasting
and supply planning has been limited, there are some individual success stories. One
stamping parts producer was able to save approximately 15 percent on product costs by
using these types of insights to optimize its production footprint.
Within the actual production process, the greatest advances have been in developing
digital models of the entire production process. Industry leaders such as Siemens, GE,
and Schneider Electric have used these digital factories to optimize operations and shop
floor layout, though this technique often focuses on designing new facilities. Furthermore,
throughout ongoing production processes, many early adopters are using sensor data
to reduce operating costs by some 5 to 15 percent. Data-driven feedback in after-sales
has been most heavily applied within servicing offers, especially in aerospace or large
installations in business-to-business transactions. After-sales servicing offers once relied
on simple monitoring, but now they are beginning to be based on real-time surveillance and
predictive maintenance.
10-20% potential value captured by the EU public sector

Our 2011 report analyzed the public sector in the European Union (EU), where we outlined
some €250 billion worth of annual savings that could be achieved by making government
services more efficient, reducing fraud and errors in transfer payments, and improving tax
collection. But little of this has materialized, as EU agencies have captured only about 10 to
20 percent of this potential.
In terms of operations, some government entities have moved more interactions online,
and many (particularly tax agencies) have adopted more pre-filled forms. There is also a
movement to improve data sharing across agencies, exemplified in the EU initiative called
Tell It Once. On a country-specific level, the Netherlands has moved most tax and social
welfare functions online, while France saved the equivalent of 0.4 percent of GDP by
reducing requests from agencies to citizens for the same type of information from 12 to one.
Adoption of algorithms to detect fraud and errors in transfer payments has been limited.
Analytics have been used to improve the rate of tax collection, mainly by targeting tax audits
more effectively and running algorithms on submitted tax forms. France has automated
While our 2011 report focused on the EU public sector, these observations regarding
government adoption are applicable across all high-income economies. Adoption and
capabilities generally vary greatly from country to country, and even among agencies (with
tax agencies typically being the most advanced within a given country). The main barriers
holding back progress have been organizational issues, the deeply entrenched nature and
complexity of existing agency systems, and the difficulty of attracting scarce analytics talent
with public-sector salaries.
10-20% potential value captured by the US health-care sector

Our 2011 report outlined $300 billion worth of value that big data analytics could unlock in
the US health-care sector. To date, only 10 to 20 percent of this value has been realized.
Making a major shift in how data is used is no easy task in a sector that is not only highly
regulated but often lacks strong incentives to promote increased usage of analytics. A range
of barriers, including a lack of process and organizational change, a shortage of technical
talent, data-sharing challenges, and regulations, have combined to limit the impact of data
and analytics throughout the sector and constrain many of the changes we envisioned.
The opportunities we highlighted were split among five categories: clinical operations,
accounting and pricing, R&D, new business models, and public health. Within clinical
operations, the major success has been the rapid expansion of electronic medical records
(EMRs), which accounted for 15.6 percent of all records in 2010 but more than 75 percent
by 2014, aided by the incentives for providers in the Affordable Care Act.37 This has
enabled basic analytics but little has been done to unlock and fully utilize the vast stores of
data actually contained within EMRs. A few providers have pushed this further, including
Sutter Health, whose new EMR system processes reports 40 times faster and achieves
an 87 percent increase in predicting readmissions compared with its previous system, by
centralizing the data and analytics and pushing toward prospective analyses.38
Payers have also been slow to capitalize on big data for accounting and pricing, but a few
encouraging trends have emerged. Transparency in health-care pricing has improved
thanks to steps taken by the Centers for Medicare and Medicaid Services at the national
level, while more than 30 states have established all-payer claims databases to serve
as large-scale repositories of pricing information. A few insurers have made gains.
Optum within UnitedHealth saves employers money by combing claim records for over-
prescriptions.
Greater progress has been made in the pharmaceutical industry, where many companies
have adopted analytics to assist their R&D, although they are still in the early stages of
putting the full capabilities to work. Most pharma companies now use predictive modeling to
optimize dosing as they move from animal testing to phase I clinical trials, but analytics have
not yet been used as widely in later trials to determine questions such as the proper efficacy
window and patient exclusion criteria. Data are being used in R&D to identify the right target
population for drug development, which can reduce the time and cost of clinical trials by
10 to 15 percent. Contract research organizations, which are used more widely today than
even five years ago, generally use statistical tools to improve the administration of clinical
37 Dustin Charles, Meghan Gabriel, and Talisha Searcy, "Adoption of electronic health record systems among US non-federal acute care hospitals: 2008-2014," Office of the National Coordinator for Health Information Technology, data brief number 23, April 2015.
38 "Healthcare providers unlock value of on-demand patient data with SAP HANA," SAP press release, May 4, 2016.
Some of the new models we highlighted in 2011 are in fact taking root. There is now a
large and growing industry that aggregates and synthesizes clinical records. Explorys, for
example, a data aggregator with some 40 million EMRs, was recently acquired by IBM to
support the development of Watson. Online platforms and communities (such as
PatientsLikeMe) had strong initial success as key data sources, but other sources have also
appeared. The use of analytics in public health surveillance has assumed new importance,
given recent outbreaks of the Ebola and Zika viruses.
The health-care sector may have a long way to go toward integrating data and analytics.
But in the meantime, the possibilities have grown much bigger than what we envisioned
just five short years ago. Cutting-edge technology will take time to diffuse throughout
the health-care system, but the use of machine learning to assist in diagnosis and clinical
decision making has the potential to reshape patient care. Advances in deep learning in the
near future, especially in natural language and vision, could help to automate many activities
in the medical field, leading to significant labor cost savings. With labor constituting 60 to
70 percent of hospital total costs, this presents significant opportunities in the future.
1/10th the cost of sequencing a genome today as a share of the cost in 2011

The biggest frontier for data analytics in health care is the potential to launch a new era
of truly personalized medicine (see Chapter 4 for a deeper discussion of what this could
entail). New technologies have continued to push down the costs of genome sequencing,
from $10,000 in 2011 to approximately $1,000 today.39 Combining this with the advent of
proteomics (the study of proteins) has created a huge amount of new biological data. To
date, the focus has been largely within oncology, as genomics has enabled characterization
of the microsegments of each type of cancer.
A NUMBER OF BARRIERS STILL NEED TO BE OVERCOME
What explains the relatively slow pace of adoption and value capture in these domains and
many others? Below we look at some of the internal and external barriers organizations face
as they try to shift to a more data-driven way of doing business.
An effective transformation strategy can be broken down into several elements (Exhibit 2).
The first is stepping back to ask some fundamental questions that can shape the strategic
vision: What will data and analytics be used for? How will the insights drive value? How
will the value be measured? The second component is building out the underlying data
architecture as well as data collection or generation capabilities. Many incumbents struggle
with switching from legacy data systems to a more nimble and flexible architecture to
store and harness big data; they may also need to complete the process of fully digitizing
transactions and processes in order to collect all the data that could be useful. The
third element is acquiring the analytics capabilities needed to derive insights from data;
39 Data from the NHGRI Genome Sequencing Program, National Human Genome Research Center, available at https://www.genome.gov/sequencingcostsdata/.
Exhibit 2
[Flowchart: the five elements of an effective data and analytics transformation.
- Use cases/sources of value: clearly articulating the business need and projected impact; outlining a clear vision of how the business would use the solution.
- Data ecosystem (internal and external): gathering data from internal systems and external sources; appending key external data; creating an analytic sandbox across the organization; enhancing data (deriving new predictor variables).
- Modeling insights (data modeling "black box" and heuristic insights "smart box"): applying linear and nonlinear modeling to derive new insights; codifying and testing heuristics (informing predictor variables).
- Workflow integration (process redesign and tech enablement): redesigning processes; developing an intuitive user interface that is integrated into day-to-day workflow; automating workflows.
- Adoption (capability building and change management): building frontline and management capabilities; proactively managing change and tracking adoption with performance indicators.]
Failing to execute these steps well can limit the potential value. Digital native companies have
a huge natural advantage in these areas. It is harder for traditional companies to overhaul or
change existing systems, but hesitating to get started can leave them vulnerable to being
disrupted. And while it may be a difficult transition, some long-established companies,
including GE, Mercedes-Benz, Ferrari F1, and Union Pacific, have managed to pull it off.
(See Box 1, "Identifying the most critical internal barriers for organizations.")
Exhibit 3
Survey respondents report that strategic, leadership, and organizational hurdles often determine the degree to
which they can use data and analytics effectively

Which of these have been among the TOP 3 most significant challenges to your organization's pursuit of its data
and analytics objectives? (% of respondents)

Strategy, leadership, and talent
- Constructing a strategy: 30
- Ensuring senior management involvement: 42
- Securing internal leadership for data and analytics projects: 33
- Attracting and/or retaining appropriate talent (both functional and technical): 21

Organizational structure and processes
- Tracking the business impact of data and analytics activities: 23
- Designing an appropriate organizational structure to support data and analytics activities: 45
- Creating flexibility in existing processes to take advantage of data-driven insights: 13

IT infrastructure
- Providing business functions with access to support: 14
- Investing at scale: 17
- Designing effective data architecture and technology infrastructure: 36
SOURCE: McKinsey Global Institute analysis
1 "The need to lead in data and analytics," McKinsey & Company survey, McKinsey.com, April 2016, available at http://www.mckinsey.com/business-functions. The online survey, conducted in September 2015, garnered responses from more than 500 executives across a variety of regions, industries, and company sizes.
The talent needed to execute the leadership vision is in high demand. In fact,
approximately half of executives across geographies and industries reported
greater difficulty recruiting analytical talent than any other kind of talent.
Forty percent say retention is also an issue. Business translators who can
bridge between analytics and other functions were reported to be the most
difficult to find, followed by data scientists and engineers. Talent scarcity is a
major concern that we discuss in greater detail later in this chapter.
Respondents who said their companies had made ineffective use of analytics
noted that their biggest challenge was designing the right organizational
structure to support it. This needs to include tracking the business impact
and making existing processes flexible enough to respond to new data-driven
insights. Firms may be comfortable with using analytics in certain areas, but
for many, those changes have not filtered through the entire organization.
Our 2011 report hypothesized that the demand for data scientists in the United States alone
could far exceed the availability of workers with these valuable skills.40 Since then, the labor
market has borne out this hypothesis. As a result of soaring demand for data scientists,
their average wages rose by approximately 16 percent per year from 2012 to 2014.41 This
far outstrips the less than 2 percent increase in the nominal average salary across all
occupations in Bureau of Labor Statistics data. Top performers with a very scarce skill set,
such as deep learning, can command very high salaries. Glassdoor.com lists data scientist
as the best job in 2016 based on number of job openings, salary, and career opportunities.42
LinkedIn reports that the ability to do statistical analysis and data mining is one of the most
sought-after skills of 2016.43
Roles for data scientists are becoming more specialized. On one end of the spectrum are
data scientists who research and advance the most cutting-edge algorithms themselves;
this elite group likely numbers fewer than 1,000 people globally. At the other end are
data scientists working closer to business uses and developing more practical firm-specific
insights and applications.
The scarcity of elite data scientists has even become a factor in some acquisitions.
Google, for example, acquired DeepMind Technologies in 2014, at an estimated price of
$500 million. With approximately 75 DeepMind employees at the time of the deal, the price
tag was nearly $7 million per employee.44 This is in line with other estimates by experts,
who say that acqui-hires of cutting-edge AI startups cost around $5 million to $10 million
per employee. In this case, the DeepMind acquisition resulted in the development of
AlphaGo, which became the first AI program to defeat a human professional player in the
game of Go.45 It also reportedly enabled Google to reduce the cooling costs for its vast data
centers by 40 percent, saving several hundred million dollars per year.46 The DeepMind
acquisition could pay off for Google from just this one application alone.
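The per-employee figure is straightforward division over the deal numbers cited above:

```python
# Per-employee price of the DeepMind deal, using the figures cited above:
# an estimated $500 million acquisition price and roughly 75 employees.
deal_price_usd = 500_000_000
employees = 75

price_per_employee = deal_price_usd / employees
print(f"~${price_per_employee / 1e6:.1f} million per employee")  # ~$6.7 million
```

This matches the "nearly $7 million per employee" figure and sits within the $5 million to $10 million range quoted by experts for AI acqui-hires.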
The supply side has been responding to the growing demand for analytics talent. In the
United States, students are flocking to programs emphasizing data and analytics. The
number of graduates with degrees of all levels in these fields grew by 7.5 percent from 2010
to 2015, compared with 2.4 percent growth in all other areas of study. Universities are also
40 In our 2011 analysis, this role was referred to as "deep analytical talent."
41 Beyond the talent shortage: How tech candidates search for jobs, Indeed.com, September 2015.
42 "25 best jobs in America," Glassdoor.com blog, available at https://www.glassdoor.com/List/Best-Jobs-in-America-LST_KQ0,20.htm.
43 "The 25 skills that can get you hired in 2016," LinkedIn official blog, January 2016, available at https://blog.linkedin.com/2016/01/12/the-25-skills-that-can-get-you-hired-in-2016.
44 Catherine Shu, "Google acquires artificial intelligence startup DeepMind for more than $500 million," TechCrunch, January 26, 2014.
45 See, for example, Christof Koch, "How the computer beat the Go master," Scientific American, March 19, 2016. This achievement was widely regarded as a seminal moment in advancing artificial intelligence.
46 See DeepMind corporate blog at https://deepmind.com/applied/deepmind-for-google/.
In the short run, however, even this robust growth in supply is likely to leave some
companies scrambling. It would be insufficient to meet the 12 percent annual growth in
demand that could result in the most aggressive case that we modeled (Exhibit 4). This
scenario would produce a shortfall of roughly 250,000 data scientists. As a result, we expect
to see salaries for data scientists continue to grow. However, one trend could mitigate
demand in the medium term: the possibility that some part of the activities performed
by data scientists may become automated. More than 50 percent of the average data
scientist's work is data preparation, including cleaning and structuring data. As data tools
improve, they could perform a significant portion of these activities, potentially helping to
ease the demand for data scientists within ten years.
Exhibit 4
The expected number of trained data scientists would not be sufficient to meet demand in a high-case scenario
[Bar chart comparing projected supply of data scientists with low- and high-case demand; the individual figures are not legible in this extraction]
1 The calculation is across all Standard Occupational Classifications except the ones that are clear false-positive hits.
2 2014 fraction per occupation times the 2014 BLS EP employment per occupation, assuming the job market is in equilibrium and supply equals demand. Including 355 occupations.
3 Graduates from US universities during ten years who are estimated to have the skill set required to be data scientists. Includes removing the retirement of 2.1% from current supply.
4 For each industry we calculate the share individually for the top five US companies based on their market capitalization. Professional services is an exception; there we use consulting companies irrespective of their home country.
NOTE: Numbers may not sum due to rounding.
SOURCE: US Bureau of Labor Statistics; Burning Glass; McKinsey Global Institute analysis
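The supply-demand arithmetic behind a scenario like this one can be sketched in a few lines. The 12 percent growth rate comes from the text's high case; the baseline demand and supply figures below are hypothetical placeholders, not MGI's model inputs.

```python
# Back-of-envelope sketch of a talent supply-demand gap. The 12 percent
# demand growth rate is the text's high-case scenario; the base demand
# and supply figures are hypothetical, not MGI's estimates.

def project(base, annual_growth, years):
    """Compound a base quantity forward at a constant annual growth rate."""
    return base * (1 + annual_growth) ** years

base_demand = 364_000        # hypothetical baseline demand for data scientists
high_case_demand = project(base_demand, 0.12, years=6)

projected_supply = 483_000   # hypothetical cumulative trained supply

shortfall = high_case_demand - projected_supply
print(f"High-case demand: {high_case_demand:,.0f}")
print(f"Shortfall:        {shortfall:,.0f}")
```

The point of the compounding is that even modest-sounding annual growth in demand outpaces a fixed supply pipeline within a few years.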
On a broader scale, multiple initiatives at the state, national, and international levels aim to
develop analytical talent. Examples include the Open Data Institute and the Alan Turing
Institute in the United Kingdom; the latter also functions as an incubator for data-driven
startups. The European Commission launched a big data strategy in 2014. The United
Many organizations focus on the need for data scientists, assuming their presence alone
constitutes an analytics transformation. But another equally vital role is that of the business
translator who can serve as the link between analytical talent and practical applications to
business questions. In some ways, this role determines where the investment ultimately
pays off since it is focused on converting analytics into insights and actionable steps.
In addition to being data savvy, business translators need to have deep organizational
knowledge and industry or functional expertise. This enables them to ask the data
science team the right questions and to derive the right insights from their findings. It may
be possible to outsource analytics activities, but business translator roles need to be
deeply embedded into the organization since they require proprietary knowledge. Many
organizations are building these capabilities from within.
The ratio of business translators to data scientists needed in a given company depends
heavily on how the organization is set up and the number and complexity of the uses the
company envisions. But averaging across various contexts, we estimate there will be
demand for approximately two million to four million business translators over the next
decade. Given that about 9.5 million STEM and business graduates are expected in the
United States over this period, approximately 20 to 40 percent of these graduates would
need to go into business translator roles to meet demand (though people from other fields
can also become business translators). That seems quite aspirational given that today, only
some 10 percent of STEM/business graduates go into business translator roles. Two trends
could bring supply in line with potential future demand: wages for business translators may
have to increase, or more companies will need to implement their own training programs.
Some are already doing so, since this role requires a combination of skill sets that is
extremely difficult for most companies to find in external hires.47
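The 20 to 40 percent range follows directly from dividing the projected demand for translators by the expected graduate pool:

```python
# Checking the share of graduates implied by the estimates in the text:
# 2 million to 4 million business translators needed over the decade,
# against roughly 9.5 million expected US STEM and business graduates.

graduates = 9.5e6
low_demand, high_demand = 2e6, 4e6

low_share = low_demand / graduates
high_share = high_demand / graduates
print(f"{low_share:.0%} to {high_share:.0%} of graduates")  # ~21% to ~42%
```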
Visualization is an important step in turning data into insight and value, and we estimate
that demand for this skill has grown roughly 50 percent annually from 2010 to 2015.48
Since this is a fairly new development, demand does not always manifest in a specific role.
In many instances today, organizations are seeking data scientist or business translator
candidates who can also execute data visualization. But we expect that medium-size
and large organizations, as well as analytics service providers, will increasingly create
specialized positions.
Three trends are driving demand for data visualization skills. First, as data becomes
increasingly complex, distilling it is all the more critical to help make the results of data
analyses digestible for decision makers. Second, real-time and near-real-time data are
becoming more prevalent, and organizations and teams need dynamic dashboards rather
than reports. Third, data is increasingly required for decision making through all parts of an
organization, and good visualization supports that goal, bringing the information to life in a
way that can be understood by those who are new to analytics. New software enables users
to make clear and intuitive visualizations from simpler data. But more complex dashboards
and data-driven products can require specialized designers. Those who combine a strong
understanding of data with user interface/user experience and graphic design skills can play
a valuable role in most organizations.
47 Sam Ransbotham, David Kiron, and Pamela Kirk Prentice, "The talent dividend: Analytics talent is driving competitive advantage at data-oriented companies," MIT Sloan Management Review, April 25, 2015.
48 Based on using the Burning Glass job postings database to search for postings including any of the following skills: data visualization, Tableau, Qlikview, and Spotfire. Normalized with the total number of job postings.
In some cases, there are still disincentives to share data. In health care, for example,
providers and pharmaceutical companies could stand to lose from greater data sharing
with payers. Perhaps the largest hurdle in data access is the need for new data sources to
rapidly demonstrate their profit-making potential. Many new data sets are being created in
personal health, such as those captured by wearable sensors, but these data sets have yet
to demonstrate clinical utility. Given industry dynamics and reimbursement policies, they
may experience slow uptake.
Three major concerns continue to challenge the private sector as well as policy makers:
privacy, cybersecurity, and liability. All of these can discourage the use of analytics. Privacy
issues have been front and center in the European Union, where "right to be forgotten"
legislation has required internet companies to take extra steps to clean their records, and
citizens have a constitutional right to access data about themselves, even when held by
private companies. Meanwhile, privacy concerns have been heightened by repeated
cybersecurity breaches. Widely publicized breaches have had major ramifications for
companies' relationships with their customers. Many people remain wary about "big
brother"-style surveillance by both companies and governments. Customers have reacted
negatively to retailers tracking their movements in stores, for example.51 And lastly, liability
frameworks surrounding the use of data and analytics still need to be clarified. In health care,
for example, clinical validation can be a lengthy process, and deviating from established
guidelines can put physicians or companies at risk of a lawsuit. These concerns will only
grow as complicated algorithms play a larger role in decision making, from autonomous
driving to deciding where law enforcement resources should be deployed.
Beyond the impact within individual sectors, the soaring demand for data and analytics
services has created complex ecosystems. Data may take a long and complex journey
from their initial collection to their ultimate business use, and many players are finding
ways to monetize data and add value at points along the way. The next chapter examines
these ecosystems in greater detail to identify some of these opportunities in this rapidly
evolving landscape.
49 Open data: Unlocking innovation and performance with liquid information, McKinsey Global Institute, October 2013.
50 The internet of things: Mapping the value beyond the hype, McKinsey Global Institute, June 2015.
51 Stephanie Clifford and Quentin Hardy, "Attention, shoppers: Store is tracking your cell," The New York Times, July 14, 2013.
But how much is all this data worth? Data's value comes down to how unique it is and how
it will be used, and by whom. Understanding the value in all these small bits of information
that need to be gathered, sifted, and analyzed is a tricky proposition, particularly since
organizations cannot nail down the value of data until they are able to clearly specify its
uses, either immediate or potential. The data may yield nothing, or it may yield the key to
launching a new product line or making a scientific breakthrough. It might affect only a small
percentage of a companys revenue today, but it could be a key driver of growth in the future.
Many organizations see this potential and are hungry to use data to grow and improve
performance, and multiple players are seizing the market opportunities created by this
explosion in demand. There are many steps between raw data and actual application of
data-derived insights, and there are openings to monetize and add value at many points
along the way. As a result, complex data ecosystems have been rapidly evolving.
The biggest opportunities within these ecosystems are unlikely to be in data generation
alone, since raw or slightly processed data are usually many steps away from their ultimate,
practical use. Furthermore, as data become easier to collect and as storage costs go down,
many types of data will become increasingly commoditized, except in certain contexts
where supply is constrained or the data are uniquely suited to high-value uses.
Aggregating information from different sources is critical when a large volume of data is
needed or when combining complementary data can lead to new insights. But new tools
are allowing end-users to perform this function themselves. Over the longer term, we believe
aggregation services will become more valuable only in cases where there are significant
barriers to combining data from multiple sources: for example, if a truly massive volume of
data is required, if aggregation poses major technical challenges, or if an independent third
party is required for coordination.
The most lucrative niches for the future appear to be based in analytics. Previous MGI
research has noted that profits are shifting to idea- and knowledge-intensive arenas,
including analytical software and algorithms.52 While companies are often uncertain about
what to do with raw data, they are willing to pay for insights that are more readily applicable
to strategy, sales, or operations. As organizations become more sophisticated, they are
likely to continue devising new ways to collect dataand the demand for insights and
analytics will only increase. Since analytics is most effective when combined with deep
industry and functional expertise, we expect to see a growing number of specialized players.
Furthermore, as the volume of data continues to grow, the ability to separate real insights
from the noise will be a source of value. With so much demand, and a scarcity of talent likely
to persist in the medium term, firms from throughout the ecosystem are scrambling to take a
piece of the analysis market.
52 Playing to win: The new global competition for corporate profits, McKinsey Global Institute, September 2015.
NOT ALL DATA ARE CREATED EQUAL, AND THEIR VALUE DEPENDS ON THEIR
UNIQUENESS AND END USES
Data have clearly become an important corporate asset, and business leaders want to know
how to measure and value the information they hold. Many are asking how much they could
earn by selling data to others. But the value of data depends on how they will be used and
who will use them. The same piece of data could have different value for different users,
depending on their respective economics.
Data have several characteristics that make them unique as an asset. The first is their
non-rivalrous nature: the same piece of data can be used by more than one party
simultaneously. This makes data similar to other intangible assets such as knowledge and
intellectual property. But few organizations list data or information assets on their balance
sheets.53 Most data are monetized indirectly (for example, through selling analytics as a
service based on data) rather than through direct sale, which makes their value difficult to
disaggregate from other elements in an offering. In some cases, data can be used for barter,
which also makes the underlying price tricky to calculate. For example, when a customer
signs up for a loyalty card to get discounts or when an individual uses Facebook or Google
for free, the customer, wittingly or not, trades personal data for services.
Another important characteristic of data is their sheer diversity. Some of the broad
categories include behavioral data (capturing actions in digital and physical environments),
transactional data (records of business dealings), ambient or environmental data (conditions
in the physical world monitored and captured by sensors), reference material or knowledge
(news stories, textbooks, reference works, literature, and the like), and public records. Some
data are structured, while images, audio, and video are unstructured. Data can come from
a diversity of sources, such as the web, social media, industrial sensors, payment systems,
cameras, wearable devices, and human entry. Each of these features relates to how the
data are generated and therefore affects their supply.
On the demand side, the value of data depends on how the data will (or could) ultimately be
used. Sometimes the value of a piece of data is known because it is directed to a specific,
clear purpose, such as when an advertiser purchases information on TV ratings, or when
a political campaign purchases an email list from an advocacy group to reach potential
voters. An online retailer can measure the difference in conversion from a generic Facebook
ad versus one that is based on the customer's browsing history on the retailer's website. In
other cases, organizations are unsure about how and where data could be put to work. For
instance, vast volumes of "data exhaust" are generated by 30,000 sensors on a modern oil
rig, but less than 1 percent is used in decision making.54 But one organization's data exhaust
could be another organization's data gold. Only by understanding the potential uses of data
by individual players can the value of data be determined.
53 In the case of valuation in a merger or acquisition, the value of the target's data assets may partially be recorded as databases under intangible assets, and part of the value is captured as goodwill.
54 The internet of things: Mapping the value beyond the hype, McKinsey Global Institute, June 2015.
Internal cost and revenue optimization: The potential applications here are numerous.
On the cost side, data can be put to work in predictive maintenance, talent and process
management, procurement, and supply chain and logistics planning. On the revenue
side, insights from data can be used to enter new markets, micro-target customer
segments, improve product features, and make distribution channels more effective.
Data derived from machines and processes, especially from IoT sensors and from
customer behavior and transactions, are most useful for optimizing operations. While the
data may be generated internally, there can be opportunities for service providers, since
analyzing data is not yet a core competency for many firms in traditional industries. But
other companies will build the internal capability to take advantage of their own data to
improve their operations.
Market-making: Market-making firms, from ride-sharing apps to dating sites, play the
role of matching the needs of buyers and sellers. These firms often create platforms to
collect the necessary information to enable efficient and effective matching. In some
cases, pure signaling data is all that is needed. But in other cases, preference data,
reputation data (to validate the authenticity and quality of participants), and transaction
and behavior data are crucial. Scale and network effects are important here, as buyers
and sellers demand a marketplace that is liquid enough to match their needs efficiently.
Training data for artificial intelligence: Machine learning and deep learning require
huge quantities of training data. Some is generated through repeated simulations (such
as game playing), some is generated in the public sphere (such as mapping and weather
data), and other data is aggregated from a diversity of sources (such as images and
video or customer behavior). Public sources and private aggregators can play a crucial
role here. Firms that produce huge quantities of relevant data with their own platform
have a head start, and first movers may have an advantage, as their offerings will have
more time to learn and generate additional data, fueling the virtuous cycle. However,
because there is a great variety of potential uses for different users, valuing data here can
be challenging.
These ecosystems may overlap. In some cases, the same piece of data (such as customer
behavior data) can have multiple applications, each with a different value.
Data generation and collection: The source and platform where data are
initially captured.
Data aggregation: Process and platforms for combining data from multiple sources.
Data analysis: The gleaning of insights from data that can be acted upon.
Data infrastructure: The hardware and software associated with data management.
We recognize that a diverse landscape of infrastructure providers offers the hardware
and software necessary to execute on the collection, aggregation, and analysis activities
described, but that portion of the ecosystem is not the focus of our discussion.
To give one example of how different types of data move through each part of an
ecosystem, consider what happens when a consumer applies for a credit card. Consumers
generate data when they use and make payments on their existing financial products,
including mortgages, credit cards, and other lending accounts. The various financial
institutions that hold these accounts collect, organize, and summarize this information
generated by the consumers behavior as a borrower. These entities then share these
summary data with credit bureaus, which play the aggregator role by combining them with
data from other entities that report late payments (such as collection agencies for medical
or utility bills) and from public records such as bankruptcy filings. The credit bureaus
are then able to form a more complete view of the customers credit behavior, and they
apply analytics to generate a credit score. A financial institution considering a consumers
application for a card will pay credit bureaus for access to the score and the full credit report
to inform its own decision-making models and determine whether to issue the new card.
In this case, the generator of the data (the consumer) does not necessarily own the data.
The consumer's agreements with various lenders (which are shaped by legal frameworks)
outline how this information can be shared. The process is also complicated by the fact that
analysis occurs at different points in the process. Monetization similarly happens at two
points: when the credit bureau sells the credit report, and from fees after the bank issues a
new credit card.
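The chain just described (generation by the consumer, collection by lenders, aggregation by credit bureaus, and analysis into a score) can be sketched schematically. Everything in the sketch is illustrative: the field names and the toy scoring rule are stand-ins for a real bureau's data formats and models.

```python
# Schematic sketch of the credit-reporting data flow described above.
# All structures and the scoring rule are illustrative stand-ins, not a
# real credit bureau's formats or analytics.

# Generation: the consumer's behavior on existing accounts.
payment_history = {"mortgage": "on time", "credit_card": "30 days late"}

# Collection: each lender summarizes the behavior it observes.
lender_reports = [
    {"account": account, "status": status}
    for account, status in payment_history.items()
]

# Aggregation: the bureau combines lender reports with public records.
bureau_file = {"lender_reports": lender_reports, "public_records": []}

def score(file):
    """Toy analytics step: start at a base score, deduct per late report."""
    late = sum(1 for r in file["lender_reports"] if "late" in r["status"])
    return 700 - 60 * late

# Analysis: the card issuer pays for the score to inform its decision.
print(score(bureau_file))  # 640 under this toy rule
```

The sketch makes the division of labor concrete: no single party holds the whole picture, and value is added at each hand-off from raw behavior to an actionable score.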
Within these ecosystems, we believe that value will tend to accrue to providers of analytics
(Exhibit 5). It seems likely to shift away from data collectors and aggregators unless
particular conditions make those roles more valuable (for example, when data are difficult
to collect, proxies are limited, or certain types of data are uniquely necessary for high-
value applications). In the pages that follow, we will look at the generation and collection,
aggregation, and analysis components of the ecosystem.
Exhibit 5
Within the data ecosystem, value will tend to accrue to firms doing analysis and only under certain conditions to
those providing data generation or aggregation

Data generation and collection
Description: The source and platform where data is initially captured.
Factors driving value up: Certain data types will have higher value if collection barriers are extremely high or data cannot be legally shared between parties.
Factors driving value down: Growth in available proxies and expansion of open access will increase supply.

Data aggregation
Description: Combining data from multiple sources; forming marketplaces for commercial exchange of data.
Factors driving value up: Demand growth as more applications are developed; value will be higher if aggregation is technically challenging or requires a neutral third party.
Factors driving value down: Technology advances making aggregation easier.

Data analysis
Description: Manipulating aggregated data to develop actionable insights.
Factors driving value up: Talent shortage; deep sector expertise needed to deliver effectively; close relationship to actual use or implementation clarifies value.
Factors driving value down: Scope could be limited as solutions will be for vertical applications.

[The exhibit also depicts the future trajectory of value for each segment; that graphic is not reproduced here.]
On the supply side, the market for any particular type of data is shaped by difficulty of
collection, access, and availability of substitute data (proxies from which similar insights can
be drawn). The volume of available data has skyrocketed, and this increase in supply drives
down the relative value of any one piece of information. More data is available than ever
before as people spend more time online, as sensors are deployed throughout the physical
environment, and as formerly manual processes (such as toll collection) are automated.
Some kinds of data that were labor-intensive to collect in the past, such as TV watching
habits, are now collected in the normal course of business.
This trend shows no sign of slowing. The norm of digital media, social networking, and
search businesses is to give content away for free and monetize the eyeballs in the form
of advertising, generating user data in the process. The internet amplified a culture of
openness that had already taken root in the tech world, and non-profits (such as Wikimedia
and the Open Data Institute) have emerged to support open data initiatives. The public
sector, always a large-scale data collector, has invested in making its data more easily
available. City governments, for instance, have made significant amounts of information
public, from restaurant health inspection scores to school performance ratings and crime
statistics. Previous MGI research estimated that open data can help unlock $3 trillion to
$5 trillion in economic value annually across seven sectors.55
55 Open data: Unlocking innovation and performance with liquid information, McKinsey Global Institute, October 2013.
On the demand side, the market is shaped by ease of use, network effects, and value of the
ultimate uses of data. Thanks to improvements in data's ease of use, there are simply more
users consuming data than ever before. Where once only trained data scientists could perform
certain analyses, more user-friendly tools now make it possible for business analysts
to use large data sets in their work. At the same time, machine learning advances have
expanded uses for unstructured data such as images, driving up the demand for these
types of data. For example, Lose It, a calorie-tracking app, allows users to log their intake
by taking a picture of their food and matching it with an image database.56 The more users
upload images and confirm the suggested matches, the more accurate the algorithm
becomes. This kind of dynamic establishes a virtuous cycle of demand for these
forms of data.
Another factor driving demand is that different types of data are being linked to derive new
insights. A mining company, for example, can use weather data to identify lightning strikes
as a proxy for copper deposits. Linking data leads to more applications and more demand
for data, and we expect the trend to grow as organizations become more sophisticated in
their data and analytics capabilities.
Finally, demand for data is driven by the expected value associated with their ultimate use.
In many cases, organizations are still learning what to do with all the data that is suddenly
available. But as analytics tools improve, so do the odds that at least some value can be
extracted from any given data set.
On balance, the supply and demand factors described above suggest data will continue
to become commoditized. Certain types of data that were expensive to obtain in previous
years are now free, such as $1,400 encyclopedia sets replaced by Wikipedia. Instead
of dialing 411 and paying $1 per query to find a phone number, smartphone users can
use $0.005 worth of data on their device to search for and call any number they need.57
The non-rivalrous nature of data (that is, the fact that it can be used by different parties
simultaneously) may contribute to making it more of a commodity.
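The roughly $0.005 search cost cited above can be reproduced in one line of arithmetic: 0.4 megabytes of data at a $12.50-per-gigabyte plan works out to about half a cent.

```python
# Reproducing the arithmetic behind the ~$0.005 mobile-search figure
# cited in the text: 0.4 MB of data on a $12.50-per-gigabyte plan.

plan_price_per_gb = 12.50
page_mb = 0.4

cost = page_mb / 1024 * plan_price_per_gb  # treating 1 GB as 1024 MB
print(f"${cost:.4f} per search")  # ~$0.0049, about half a cent
```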
But there are some exceptions to the commoditization trend. When access is limited by
physical barriers or collection is expensive, data will hold its value. Google, for example,
made huge investments to collect data for Google Maps, driving more than seven million
miles to gather relevant images and location data.58 These data sources are public, but
the barriers to collection at this scale remain extremely high. The holders of these kinds of
unique, hard-to-capture data sets will continue to have opportunities to directly monetize
data. In these cases, data's non-rivalrous nature will help the provider of this data to capture
more value, since there are opportunities to monetize the data more than once. Some firms
may hold proprietary data that has staying power as the industry standard (as with Nielsen
or the College Board). It is also possible to artificially create scarcity for certain types of data
through legal means, such as licensing to prevent its transfer to other users.
56 Lora Kolodny, "Lose It launches Snap It to let users count calories in food photos," TechCrunch, September 29, 2016.
57 $0.005 assumes 0.4 megabytes used to load a search page and a monthly data plan of $12.50 per gigabyte.
58 Greg Miller, "The huge, unseen operation behind the accuracy of Google Maps," Wired, December 8, 2014.
An important case of value accruing to firms that engage in data generation and collection
involves market-making platforms with network effects. Since suppliers will want to go
where demand is, and consumers will want to go where suppliers are, these platforms have
natural network effects. On these platforms, data beget more data. Social media platforms
also have significant network effects since individuals naturally want to be on platforms
where their friends and others are also present. Search platforms generate data when users
search, which enables their search algorithms to produce higher-quality results, which in
turn draws more users to the search engine. The network effects of such
platforms often lead to a small number of players collecting and owning a large percentage
of data generated in these ecosystems. Proxies may also be sparse as the most valuable
data will be specific platform behavior data that are unique to the users on the platform. In
these circumstances, a few platform leaders will be able to capture significant value from
owning their user behavior data. These network effects can also arise in the aggregation
part of the value chain, which we discuss further below.
In the absence of these types of exceptional supply constraints, simply selling raw data is
likely to generate diminishing returns over time. But in situations where these constraints
exist (or where a business model creates them), generators and collectors of data can
capture significant value.
Aggregation can produce significant value, but it is becoming easier for users to perform
many aspects of this function themselves. There has been robust growth in new software
services for organizing data from different internal and external sources; this niche has
attracted significant venture capital. End-users now have cheaper and more powerful tools
to aggregate data on their own.
Aggregation services are particularly valuable when combining and processing data are
technically difficult, or when coordinating access across diverse sources is a barrier. This
can be a complex process whether the underlying data are commoditized (as with financial
market data) or highly varied and differentiated (as with health records).
Many traditional marketing data providers (such as mailing list vendors) and information
services providers (such as Westlaw, Bloomberg, and Argus) fall into this category and have
developed long-standing relationships with data collectors or have technical assets that
enable aggregation. Many of these aggregators also serve as data guides, using their deep
understanding of complex data environments and privacy regulations to advise their clients
on how to best handle the data.
The third-party aggregator model can be relevant in any industry with fragmented
competitors (such as the travel ecosystem, for example). This model has the potential to
create network effects, as individual organizations have incentives to share their data and
join an aggregation service that already has many members and lots of data. The more
members and data in any aggregation service, the more benefit each additional member
gets from joining. This kind of effect often leads to just one aggregator or a small number of
aggregators dominating any given ecosystem.
Another emerging model in aggregation involves data marketplaces that match parties
that want to purchase data with those that can supply it. They can fill a valuable role in
cases where data collectors and end-users are highly fragmented. Marketplaces focus on
creating the technical platform and quality standards needed for a broad range of firms to
participate. They typically aggregate data but do not integrate them; that task is left to the
end-users. The presence of marketplaces lends greater liquidity to the exchange of data,
59 Providers of information services in finance can generate operating margins of more than 25 percent.
On the demand side, since analysis is often the last step, the value generated by data and
analysis becomes much clearer. This puts the analytics provider in a better position to
capture a portion of this value. While companies are often uncertain about what to do with a
huge volume of newly collected raw data, they are willing to pay for insights that are directly
related to sales, strategy, and other business functions. Across sectors, firms have a larger
appetite for data as improved analytical tools and capabilities open up new uses. On the
supply side, the highly specialized talent needed for analytics and interpretation is scarce.
Providing data analytics requires industry and functional expertise, and there is a limited
pool of talent and organizations combining these skill sets. Even as tools and platforms
improve, the need to combine analytical and domain expertise will continue to present a
bottleneck, driving up the value of analytics.
The most successful analytics providers combine technical capabilities with industry or
functional expertise. Performing analytics that support predictive maintenance, for example,
requires a deep understanding of how machines work. Using analytics to optimize energy
consumption starts with a thorough understanding of the systems that use energy. This type
of understanding gives these providers the advantage of knowing just how much value they
are creating for their clients, and they can price accordingly. But expertise in one application
in one sector may or may not be immediately transferable to another industry or problem
type; each vertical problem has to be attacked individually. Unlike data aggregation, which is
a horizontal play across different types and sources of data, analytics is a vertical play where
each additional vertical use case requires additional domain knowledge and sometimes
entirely new analytic techniques.
Given the size of the opportunities in analysis, firms in other parts of the ecosystem
often add analytics to claim a piece of this high-value segment of the market. Some data
collectors or aggregators are adding new data product offerings. In insurance, for example,
CoreLogic has used its data assets to develop catastrophic risk management products: in
this case, complex, robust catastrophe risk scores that are sold to insurers. Other data
collectors and aggregators are making similar moves into high-value data products, such as
Wood Mackenzie in energy and Inovalon in health care.
In addition to the data products mentioned above, data collectors and aggregators are
offering to integrate with clients' data and perform analysis as a service, especially in
industries such as health care where large-scale analytics is needed but where most firms
lack that core competency. Startups such as SparkBeyond are integrating aggregation
and analytic capabilities to apply machine learning to specific business challenges.
The rapid evolution of data ecosystems points to the scope of the data analytics revolution
that is beginning to take hold. The shift toward data-guided decision making is transforming
how companies organize, operate, manage talent, and create value. In short, big data
could generate big disruptions. The next chapter looks at six breakthrough models and
capabilities that could set off a wave of change across industries and in society.
This chapter considers how these changes are playing out. But instead of taking an
industry-by-industry view, we believe it is more useful to consider the types of disruptive
models that data and analytics enable. This is by no means an exhaustive list, and the lines
separating some of these models are not clear-cut, as some companies combine these
approaches. Nevertheless, business leaders can apply this lens to the markets in which they
operate with an eye toward preparing for what comes next.
Certain characteristics of a given market open the door to data-driven disruption (Exhibit 6).
Data collected in one domain can be deployed, for unrelated purposes, in an entirely
different industry. Using Google search data to create a price index and drawing on credit
scores to inform auto insurance rates are prime examples of how orthogonal data can be
put to work in solving different types of business problems. Collecting new types of data
could set off cascading industry effects.
Exhibit 6
Data and analytics underpin six disruptive models, and certain characteristics make individual domains susceptible
In some domains, supply and demand are matched inefficiently. This may result in the
underutilization of assets, among other problems. Digital platforms that offer large-
scale, real-time matching with dynamic pricing could revolutionize markets with these
characteristics: a list that includes areas like transportation, hospitality, certain labor
markets, energy, and even some public infrastructure.
Additionally, any endeavor that can be marred by human error, biases, and fallibility could
be transformed by what is perhaps the most profound capability of data and analytics: the
ability to enhance, support, and even automate human decision making by drawing on vast
amounts of data and evidence. Technology now presents the opportunity to remove human
limitations from many situations and to make faster, more accurate, more consistent, and
more transparent decisions. This capability has wide applicability for businesses in every
industry as well as in many areas of society and daily life.
This chapter describes the new data that is core to each of these models. It shows how
that new data is being used to shake up specific parts of the economy, and points to other
areas where disruption could occur in the near future.
But as discussed in Chapter 3, data are proliferating. Many new types, from new sources,
can be brought to bear on any problem. In industries where most incumbents have become
used to relying on a certain type of standardized data to make decisions, bringing in fresh
types of data sets to supplement those already in use can change the basis of competition.
New entrants with privileged access to these types of orthogonal data sets can pose a
uniquely powerful challenge to incumbents.
Returning to our property and casualty insurance example above, we see this playing
out. New companies have entered the marketplace with telematics data that provide
insight into driving behavior. This data set is orthogonal to the demographic data that had
previously been used for underwriting. Other domains could be fertile ground for bringing
in orthogonal data from the internet of things. Health care has traditionally relied on medical
histories, medical examinations, and laboratory results, but a new set of orthogonal data
is being generated by consumer health devices such as wearables and connected health
devices in the home (such as blood pressure monitors or insulin pumps). Some innovators
are experimenting to determine if data from these devices, while not clinical grade, could
enhance wellness and health. Connected light fixtures, which sense the presence of people
in a room and have been sold with the promise of reducing energy usage, generate data
exhaust that property managers can use to optimize physical space planning in future
real estate developments. Even in human resources, some organizations have secured
employee buy-in to wear devices that capture data and yield insights into the real social
networks within their organizations.
Orthogonal data will rarely replace the data that are already in use in a domain; it is more
likely that an organization will integrate the orthogonal data with existing data. In the pages
that follow, we will see several examples of orthogonal data being combined with existing
data as part of a disruptive business model or capability.
These platforms have already set off major ripple effects in urban transportation, retail, and
other areas. But that could be only the beginning. They could also transform energy markets
by enabling smart grids to deliver distributed energy from many small producers. And they
could make labor markets more efficient, altering the way employers and workers connect
for both traditional jobs and independent work.
From the outset, these platforms collected data from their user base to implement
improvements, and as the user base grew, they generated even more data that the
operators used to improve their predictive algorithms to offer better service. This feedback
mechanism supported exponential growth. Uber, founded in 2009, is now in more than
61 See Paul Barter, "Cars are parked 95% of the time. Let's check," ReinventingParking.org, February 22, 2013, for a roundup of studies on the utilization rates of cars, including research from economist Donald Shoup.
The changes taking place in urban transportation, including a substantial hit to the taxi
industry, may be only the first stage of an even bigger wave of disruption caused by mobility
services. These services are beginning to change the calculus of car ownership, particularly
for urban residents. Exhibit 7 indicates that almost one-third of new car buyers living in
urban areas in the United States (the segment who travel less than 3,500 miles per year)
would come out ahead in their annual transportation costs by forgoing their purchase and
relying instead on ride-sharing services. For them, the cost of purchasing, maintaining, and
fueling a vehicle is greater than the cost of spending on ride-sharing services as needed.
If we compare car ownership to car sharing instead of ride sharing, around 70 percent of
potential car buyers could benefit from forgoing their purchase. A future breakthrough that
incorporates autonomous vehicles into these services, thereby reducing their operating
costs, could increase this share to 90 percent of potential car buyers in urban settings.
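The ownership-versus-ride-sharing comparison above is, at bottom, a break-even calculation: fixed ownership costs are traded against a higher per-mile price. A minimal sketch, in which all cost figures are assumptions for illustration rather than the report's inputs:

```python
# Illustrative break-even between owning a car and using ride-sharing.
# All cost figures below are assumptions for illustration only.
OWNERSHIP_FIXED_PER_YEAR = 6_500   # purchase amortization, insurance, parking ($)
OWNERSHIP_PER_MILE = 0.20          # fuel and maintenance ($/mile)
RIDE_SHARE_PER_MILE = 2.00         # assumed all-in ride-sharing fare ($/mile)

def annual_cost_owning(miles: float) -> float:
    return OWNERSHIP_FIXED_PER_YEAR + OWNERSHIP_PER_MILE * miles

def annual_cost_ride_sharing(miles: float) -> float:
    return RIDE_SHARE_PER_MILE * miles

def break_even_miles() -> float:
    # Miles per year at which the two options cost the same:
    # fixed + own_rate * m = share_rate * m
    return OWNERSHIP_FIXED_PER_YEAR / (RIDE_SHARE_PER_MILE - OWNERSHIP_PER_MILE)

if __name__ == "__main__":
    print(f"Break-even at about {break_even_miles():,.0f} miles/year")
    # Below the break-even mileage, ride-sharing is cheaper than owning.
```

With these particular assumptions the break-even point lands near 3,600 miles a year; the report's 3,500-mile threshold comes from its own, more detailed cost model.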
Exhibit 7
SOURCE: McKinsey Automotive 2030 report, Urban mobility survey; McKinsey Global Institute analysis
62 Fitz Tepper, "Uber has completed 2 billion rides," TechCrunch, July 18, 2016.
63 Johana Bhuiyan, "Lyft hit a record of 14 million rides last month with run-rate revenue of as much as $500 million," ReCode, August 2, 2016.
64 Steven Millward, "China's Didi Chuxing now sees 10 million daily rides," Tech in Asia, March 22, 2016.
65 "Shared mobility on the road of the future," Morgan Stanley Research, www.morganstanley.com/ideas/car-of-future-is-autonomous-electric-shared-mobility.
Exhibit 8
Hyperscale real-time matching in transportation could potentially create some $850 billion to $2.5 trillion in
economic impact

Estimated impact, $ billion:
- Fewer vehicle purchases: 330–1,000
- Reduced parking cost: 330–990
- Reduced number of accidents: 50–160
- Public benefits: reduced pollution from parking: 5–10
- Total1: 840–2,530

1 Roughly 60 percent of the economic impact occurs in developed economies, and the remainder in emerging economies.
NOTE: Assumes a 10–30% adoption rate among low-mileage urban travelers. Numbers may not sum due to rounding.
Up to $1 trillion in potential consumer savings from adopting mobility services rather than purchasing vehicles

In addition to direct car ownership costs, the shift toward mobility services will generate
significant savings in related costs like parking. Consumers are projected to spend
$3.3 trillion on parking services in 2025, but the use of mobility services could allow them to
save $330 billion to $990 billion. Around $220 billion to $650 billion of this could be realized
in developed countries and $110 billion to $340 billion in developing countries.

There is an additional benefit from the reduced demand for driving and parking. If 15 to
30 percent of drivers on the road in cities are looking for parking, this is a major logistical
challenge in dense urban cores. By boosting the utilization of each vehicle, mobility services
can decrease demand for parking and help reduce congestion, which creates further
positive ripple effects on mobility and time saved. The reduced search for parking can
generate a time-saving effect due to reduced congestion that can be valued at $10 billion to
$20 billion, as well as fuel savings in the range of an additional $20 billion to $60 billion.
Meanwhile, the shift to mobility services can improve productivity. Each day, workers spend
50 minutes in driving commutes on average in both developed and developing countries.67
If even half of that time can be used more productively for work, mobility services could
generate an additional $100billion to $290billion in potential benefit.
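The commute-productivity figure is simple arithmetic on time recovered. A back-of-the-envelope version, where the number of adopting commuters and the value of an hour are assumptions for illustration:

```python
# Back-of-the-envelope check on commute-time productivity gains.
# Adoption and value-of-time figures are illustrative assumptions.
COMMUTE_MINUTES_PER_DAY = 50   # average daily driving commute (from the text)
RECOVERABLE_SHARE = 0.5        # half of commute time usable for work
WORK_DAYS_PER_YEAR = 230
ADOPTING_COMMUTERS = 250e6     # assumed commuters shifting to mobility services
VALUE_PER_HOUR = 10.0          # assumed average value of an hour of work ($)

def annual_productivity_gain() -> float:
    hours = (COMMUTE_MINUTES_PER_DAY / 60) * RECOVERABLE_SHARE * WORK_DAYS_PER_YEAR
    return hours * ADOPTING_COMMUTERS * VALUE_PER_HOUR

print(f"~${annual_productivity_gain() / 1e9:,.0f} billion per year")
```

With these assumptions the result is roughly $240 billion a year, inside the $100 billion to $290 billion range above.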
Finally, ride sharing can improve road safety by creating a more viable option that keeps
people from getting behind the wheel when they have been drinking, they are excessively
tired, or they have other impairments (such as difficulties with night vision). Traffic accidents
result in about 1.25 million deaths globally per year, with millions more sustaining serious
injuries.68 One study found that ride-sharing services have reduced accidents by an
average of 6percent.69 Another found a 4 to 6percent reduction specifically in drunk driving
fatalities.70 We estimate that reduced accident rates due to the expansion of digital mobility
services could save $50 billion to $160 billion in economic terms, not to mention the
incalculable value of reducing the human toll of accidents.
Beyond their effect on traditional taxi services, mobility services could have wider impact.
Automakers are the biggest question mark as the calculus of car ownership changes,
particularly for urban residents. While sales will likely continue to grow in absolute numbers,
the shift toward mobility services could potentially halve the growth rate of global vehicle
sales by 2030 (Exhibit 9).71 In response, car manufacturers will likely need to diversify and
lessen their reliance on traditional car sales. Many appear to be preparing for this future;
partnerships are forming between traditional automakers and mobility service providers
or other high-tech firms. Toyota, for example, recently invested an undisclosed amount
67 Car data: Paving the way to value-creating mobility, McKinsey Advanced Industries Practice, March 2016.
68 World Health Organization, 2013 data.
69 Angela K. Dills and Sean E. Mulholland, "Ride-sharing, fatal crashes, and crime," Providence College working paper, May 2016.
70 Brad N. Greenwood and Sunil Wattal, "Show me the way to go home: An empirical investigation of ride sharing and alcohol related motor vehicle homicide," Temple University, Fox School of Business research paper number 15-054, January 2015.
71 This draws on assumptions and similar findings from previous research by McKinsey & Company. See Automotive revolution: Perspective towards 2030, McKinsey Advanced Industries Practice, January 2016.
Exhibit 9
Increasing adoption of mobility services could lower the trajectory of global vehicle sales,
potentially cutting the growth rate in half
SOURCE: McKinsey Automotive 2030 report; Autos & shared mobility report, Morgan Stanley; McKinsey Global Institute analysis
Autonomous vehicles, which appear to be on the horizon, could accelerate this wave of
change. When self-driving cars are added into the equation, supply and demand matching
could improve even further since these vehicles can have higher utilization rates. Car pooling
may increase, and the cost of urban transportation could plummet. On the flip side, the
demand for car purchases could fall further, and many people who make a living as drivers
(nearly two million in the United States alone, with the majority being truck drivers) could
be displaced.
The role of data and analytics in transportation is not limited to urban centers. It can also
improve the efficiency of trucking routes and handoffs in the logistics industry. Rivigo has
applied mapping technology and algorithms to improve logistics efficiency in parts of India.
72 Douglas Macmillan, "Toyota and Uber reach investment, lease partnership," The Wall Street Journal, May 25, 2016.
In energy markets, for instance, demand can fluctuate dramatically and frequently by time
and by region. The current energy grid is ill-equipped to smooth out the spikes in peak
demand with excess off-peak supply. But wider deployment of smart grid technology can
address this inefficiency by using new sensor data to generate more dynamic matching of
supply and demand, in part by allowing small, private energy producers (even individual
homeowners) to sell excess capacity back to the grid. This technology is developing quickly:
the United States alone has committed more than $9 billion in public and private funds
toward smart grid technology since 2010.73 In the Netherlands, some startups are using the
peer-to-peer model to match individual households directly with small providers (such as
farmers) who produce excess energy. Vandebron, for instance, charges a fixed subscription
fee to connect consumers with renewable energy providers; in 2016, this service provided
electricity to about 80,000 Dutch households.
The markets for certain types of short-term labor services are also being redefined. Driving
passengers is only one of the many types of services now being offered through digital
marketplaces. Others include household chores and errands, data entry, and simple coding
projects. Conventional platforms (even digital job boards such as Craigslist) allow for static
requests. But now platforms can match available workers with requests for their services
on demand.
TaskRabbit, for example, serves 18 US cities, matching more than 70 percent of task
requests with a local provider within five minutes. There are more than 30,000 taskers
globally, and the average worker who participates takes on two or three jobs a day, five
days a week.74 Recent research from MGI has found that already some 15 percent of
independent workers in the United States and Europe have used digital platforms to earn
income.75 The non-profit Samasource is seeking to bridge this market gap by breaking
down larger digital projects into smaller discrete tasks that can be handled by remote
workers in developing countries. As of 2016, almost 8,000 workers participated on this
platform, increasing their earnings by more than three-and-a-half times.76
Platforms such as TaskRabbit and Samasource quickly match underutilized supply (people
looking for work) with demand. This can have productivity benefits for businesses, while
creating a new avenue for individuals who need work to generate income. Previous MGI
research found that some 850 million people across seven major economies alone are
unemployed, inactive, or working only part time. Previously they had few options for rejoining
the workforce or increasing their hours, but these types of platforms increase the range of
flexible options available to them.77
73 2014 smart grid system report, US Department of Energy, August 2014.
74 Company website.
75 Independent work: Choice, necessity, and the gig economy, McKinsey Global Institute, October 2016.
76 Company website.
77 A labor market that works: Connecting talent with opportunity in the digital age, McKinsey Global Institute, June 2015; and Independent work: Choice, necessity, and the gig economy, McKinsey Global Institute, October 2016.
Exceedingly detailed data now enable finer levels of distinctions among individuals, a
capability that paves the way to precise micro-targeting. Behavioral data gathered from
diverse sources such as online activity, social media commentary, and wearables can yield
personal preferences and insights. Broad data sets spanning large numbers of users or
customers can continuously improve the experience. Amazon, for instance, uses algorithms
to compare its interactions with one individual with results across large consumer data sets,
generating targeted product recommendations from across its marketplace.
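The essence of such comparisons can be illustrated with a toy co-occurrence recommender; the data and scoring below are invented for illustration and are not Amazon's actual system:

```python
# Toy sketch of the idea behind cross-user recommendations: suggest items that
# co-occur in other users' histories with items this user already has.
# Data and scoring are illustrative only.
from collections import Counter

histories = {
    "u1": {"book", "lamp", "desk"},
    "u2": {"book", "lamp"},
    "u3": {"lamp", "chair"},
    "u4": {"book", "desk"},
}

def recommend(user: str, k: int = 1) -> list[str]:
    owned = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        if owned & items:                 # any overlap with this user's items
            for item in items - owned:    # count items the user lacks
                scores[item] += 1
    return [item for item, _ in scores.most_common(k)]

print(recommend("u3"))  # users who share "lamp" with u3 also bought "book"
```

Production systems replace the raw co-occurrence counts with learned similarity models over vastly larger data sets, but the comparison of one individual against the whole user base is the same idea.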
Exhibit 10
Radical personalization will be disruptive in areas where tailoring offerings to personal preferences and
characteristics is highly valued
Today there is a growing push to integrate these new capabilities and vast patient data sets
from electronic medical records into the delivery of care. Data and analytics may finally
be ready to take off given the rapid growth in the volume of available data, the increasing
recognition among all stakeholders that there is value in making better use of patient data,
and changing incentives.
Radical personalization is partly about using granular, individual data to identify tailored
treatments based on a patients specific biomarkers, genetics, and behaviors. But it can
also be put to broader use in transforming the system. In many countries around the
world, and especially in the United States, a lack of information transparency and poorly
aligned incentives create dysfunction. Providers are reimbursed for filling hospital beds and
running procedures, and individuals lack the information to shop around and be informed
consumers. Most patients enter the health system only when they have a disease. Care is
focused in high-cost settings and not optimized for the patient experience or for value, in
large part because the data that could be used to measure and monitor outcomes is not
available to the parties that need it. Making better data available could help patients be more
aware of their risk factors and take charge of their own health. Insurers can learn more about
their customers and provide incentives for preventive measures. Hospitals and provider
groups can be rewarded for outcomes rather than paid by the procedure, creating a system
that steers patients toward the best place and time for intervention and connecting them
with the right specialists. Using data to change incentives could have a huge impact in terms
of both dollars saved and patient health.
78 Elizabeth Stawicki, "Talking scales and telemedicine: ACO tools to keep patients out of the hospital," Kaiser
Radical personalization has the potential to transform how health care is delivered

Current state:
1. Patients only enter the health system when they have a disease
2. Patient care is focused in high-cost settings and not optimized for value
3. Physicians follow clinical guidelines for all patients with the same disease

Future state: Continuous monitoring and personalized treatment of patients at the best place and time for intervention
1. Right living
2. Right setting for value: intervention at the right setting to maximize the value of care
3. Right care: tailored treatments based on individual markers

Tangible results: 5–9% lower national health expenditures; up to 1 year increase in health and life expectancy per person; $200 increase in productivity
Providers: To deliver truly personalized care, health-care networks and other providers
will need to integrate data across EMR systems to get a complete view of a patient. It will
take a vast data set of patient records to build smart clinical decision support tools. A
great challenge for these systems and for the practice of medicine generally will be how
to manage this constant flood of information and incorporate it into care. A doctor today
sees a patient with asthma; by contrast, tomorrow's doctor will see an asthmatic patient
who works outdoors, exercises daily, has certain genetic markers, and shows elevated
expression of a few proteins. Physicians and regulators will need to consider carefully
how to utilize such real-world evidence, which can enable them to put a greater focus on
prevention and wellness. Realizing this potential may require a realignment of financial
incentives away from fee-for-service reimbursement and toward a value-based model
that emphasizes outcomes and prevention.
Payers: Payers can use data and analytics to promote increased price transparency
throughout the system. New partnerships among payers, providers, and pharmaceutical
companies and pay-for-performance models may set the stage for this shift. Innovative
partnerships, such as the one between Inova Health System and Aetna, have used data
sharing in a value-based reimbursement model to provide a more integrated patient
experience. Payers may become more involved in care management or encouraging
their providers to do so. Adoption of these models has been slow, but the increasingly
data-rich environment will enable better determination of which treatments are truly
effective for certain patient profiles and at what cost. This can be beneficial not only in the
US context but also in nationalized health-care systems.
Pharmaceutical and medical device companies: On the R&D side, big data and
advanced analytics can make predictive modeling of biological processes and drugs
much more sophisticated, accelerating drug development. Scientific understanding of
molecular biological processes will expand rapidly with a huge database of knowledge,
and pharmaceutical companies can use genomic and proteomic data combined
with millions of records on patient outcomes to design better therapies. Instead of aiming
for the next blockbuster, pharma companies will be challenged to shift their business
models to deliver tailored treatments for smaller, targeted patient populations. While
oncology today is the clear focus for personalization, other treatment areas will follow as
the right information becomes available.
First, providers can use IoT technology and analytics to monitor patients remotely, timing
interventions and making adjustments before a crisis occurs. This could be transformative
in treating chronic conditions such as diabetes and cardiovascular and respiratory
diseases and ensuring that patients are following recommended regimes. The use of these
monitoring technologies dramatically reduces costs for the patient. They can also be used to
change incentive structures. New business models can use these technologies, combined
with other behavioral health interventions, to create a new focus on prevention, disease
management, and wellness: addressing health before a person becomes a patient. For
example, Discovery Health, an insurer based in South Africa, tracks its consumers' food
purchases and fitness behaviors, offering rewards and incentives for healthy behavior.79
Second, patients can be promptly directed to the right setting to receive diagnosis and care,
shifting them away from higher-cost and sometimes higher-risk settings when possible.
Analytics can enable proactive risk assessment to anticipate complications and avoid
hospital stays, reducing overall health-care expenditures. Additional savings can come from
providing patients, physicians, and insurers with more information on pricing and quality, a
particular issue in the United States, where costs can vary sharply across providers for
the same procedure. Evidation Health, for example, provides digital tools to stakeholders
across the system that can aggregate data to determine the efficacy and cost-effectiveness
of various interventions. Our analysis looked solely at the pricing transparency that will
drive individual decisions about care, but the potential for broader population-based value
assessments could be enormous. The National Institute for Health and Care Excellence
(NICE) in the United Kingdom evaluates all medical interventions and treatments through a
quality-adjusted life-year lens to determine cost-effectiveness, and many other systems are
looking to make similar changes.
Finally, the last element of personalized medicine is identifying the right treatment for each
patient. Clinical decision support systems, powered by artificial intelligence, will be able
to comb through millions of patient records, genomic sequences, and other health and
behavioral data to identify courses of treatment that are most effective for a particular
individual with certain characteristics. Combining these insights with the costs of care could
identify the most cost-effective treatment, while avoiding treatments that are ineffective
for a given individual. This could maximize the efficacy of drugs, surgeries, and other
interventions while reducing waste and harmful side effects. Researchers at University
College London, for example, use supercomputer simulations to determine the best
treatment for specific breast cancer mutations among 50 drug choices.80 The integration
of large patient data sets will enable the identification of new insights in patient care for very
specific health and disease states.
79 Sustainable development report, Discovery Health, 2015; Adrian Gore, "How Discovery keeps innovating," McKinsey Quarterly, May 2015.
80 "SuperMUC enables major finding in personalized medicine," HPC Wire, June 20, 2016, available at www.hpcwire.com/off-the-wire/supermuc-enables-major-finding-personalized-medicine/.
The other major benefits are cost reductions. In the United States, where health-care
spending is 18 percent of GDP, we estimate more than $600 of annual savings per person,
which equates to 1 to 2 percent of GDP. Across other high-income countries, these savings
would be closer to $200 per person, which translates to 0.5 to 1 percent of GDP.
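As a quick sanity check, the per-person savings can be converted into shares of GDP; the GDP-per-capita figures below are rough 2016 assumptions, not values from the analysis:

```python
# Back-of-the-envelope check that per-person savings line up with the
# share-of-GDP figures quoted above. GDP-per-capita values are rough
# 2016 assumptions for illustration.
US_GDP_PER_CAPITA = 57_000        # assumed, $
OTHER_HI_GDP_PER_CAPITA = 42_000  # assumed average, other high-income countries, $

def savings_share_of_gdp(savings_per_person: float, gdp_per_capita: float) -> float:
    return savings_per_person / gdp_per_capita

us_share = savings_share_of_gdp(600, US_GDP_PER_CAPITA)
other_share = savings_share_of_gdp(200, OTHER_HI_GDP_PER_CAPITA)
print(f"US: {us_share:.1%}, other high-income: {other_share:.1%}")
```

Under these assumptions the US figure comes out near the bottom of the 1 to 2 percent range (the text says "more than $600"), and the other-high-income figure near 0.5 percent.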
Exhibit 12
Precision health care could drive better health outcomes, improve productivity and quality of life, and lower costs

Levers and economic impact, $ billion:
Preventive care:
- Remote monitoring of patients with chronic illnesses to avoid crises: 600–5,500
- Wellness programs and incentives: 300–1,800
The right setting for care:
- Proactive risk assessment to anticipate complications and reduce hospital stays: 200–400
- Directing patients away from higher-cost, higher-risk settings when possible: 50–200
Better care:
- Identifying the right treatment for each individual: 700–2,000
- Eliminating treatments that are ineffective for each individual: 200–300
Total1: 2,000–10,000

Impact:
- Increase in life expectancy, years: global +0.2–1.3; high-income countries +0.5–2.0; middle- and low-income countries +0.1–1.1
- Global reduction in national health expenditures: 5–9% of NHE2
- Health-care cost savings per person: United States over $600; non-US high income over $240

1 Roughly 60% of the impact occurs in developed economies, with the remainder in emerging economies.
2 National health expenditures.
NOTE: Numbers may not sum due to rounding.
Other industries where radical personalization could take hold include retail, labor markets,
travel and hospitality, media, and advertising. In travel and leisure, customized experiences
can be created, with recommendations for everything from restaurant menu items to tour
guides. Google recently launched a travel app that will offer personalized recommendations,
while WayBlazer, powered by IBM Watson, helps customers customize their trips based on
their preferences. The media landscape is changing as Facebook's news feed algorithm
determines which news articles will interest users based on a combination of personal and
behavioral data.
Technologies such as data lakes are a promising concept for overcoming these issues.
These types of new tools can simplify access across the enterprise by integrating all types
of data into one easily accessible and flexible repository, and they are also useful for storing
the vast real-time information flows coming from IoT sensors. Data lakes can integrate
structured and unstructured data without forcing them into a single predefined schema.
81 Danielle Douglas-Gabriel, "Colleges are using big data to identify when students are likely to flame out,"
Massive data integration capabilities are valuable for industries with accessibility challenges
or those in which data discovery has direct relevance for creating value. Two conditions
in particular set the stage. First, industries are ripe if data silos are a longstanding issue
that is, if many companies have internal data residing in separate business units without
adequate access by others. In these cases, multiple, redundant data sources blur the
bigger picture, and different departments may reach varying conclusions about the same
phenomena. Second, massive data integration capabilities are valuable in cases where
combining unstructured and structured data from multiple sources (including new external
sources) enables better decision making. A consumer product company, for example,
can add comments about its brands posted on social media platforms to its existing
sales reports and demographic analyses. Data lakes are one such tool for integrating new
sources of data as they become availableand since they can store both structured and
unstructured data, they can tap an enormous range of potential new sources.
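The defining property of a data lake, storing records as-is and imposing structure only when data is read ("schema on read"), can be sketched in a few lines; the record shapes and field names below are invented for illustration:

```python
# Toy "data lake": store heterogeneous records as-is, apply structure on read.
# All record shapes and field names here are illustrative assumptions.
import json

lake: list[str] = []  # raw records kept in their original (JSON) form

def ingest(record: dict) -> None:
    lake.append(json.dumps(record))  # no schema enforced at write time

# Structured sales data and unstructured social comments land side by side.
ingest({"type": "sale", "sku": "A-12", "units": 3, "region": "EU"})
ingest({"type": "social", "text": "love the new A-12!", "brand": "A-12"})
ingest({"type": "sale", "sku": "B-07", "units": 1, "region": "US"})

def read_sales() -> list[dict]:
    # "Schema on read": filter and shape records only when a consumer asks.
    return [r for r in map(json.loads, lake) if r.get("type") == "sale"]

print(sum(r["units"] for r in read_sales()))  # total units across sales records
```

Because nothing is discarded at write time, the same store can later answer questions no one anticipated when the data arrived, which is exactly what makes the approach useful for new external sources.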
Retail banking is ripe for massive data integration to change the nature
of competition
Retail banking has always been data-rich, with stores of information on customers'
transactions, financial status, and demographics. But few institutions have fully taken
advantage of this data due to internal barriers that limit access across the organization,
the variable quality of the data, and sometimes even an inability to see the value that data
could generate.
Surmounting these barriers is becoming critical now that a plethora of new data sources
can be added to existing transaction records. These include social media posts, call center
discussions, video footage from branches, and data acquired from external sources and
partners. In addition, retail banks can partner with mobile phone operators and retailers to
complement their view of each customer.
Adding analytics on top of all the data can enhance the customer experience, enabling
banks to retain existing customers and attract new ones. Customers increasingly expect
a personalized experience across all channels. They also want banking services to be
available on their preferred platforms; a growing number are making payments directly via
messaging apps, for instance. Integration tools such as data lakes help to provide a holistic
view of each customer by combining different sociodemographic and behavioral data. One
of the important applications for banks is the ability to improve risk and credit assessments
by drawing on data from multiple sources for more accurate predictions.
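A toy illustration of multi-source risk scoring: signals from internal records and an external partner are combined in a simple weighted model. The features, weights, and threshold below are invented for illustration, not a real underwriting model:

```python
# Illustrative sketch: combine signals from several data sources into one
# credit-risk score via a simple weighted logistic model. All features,
# weights, and values are invented for illustration.
import math

WEIGHTS = {
    "on_time_payment_rate": 3.0,   # from internal transaction records
    "income_stability": 1.5,       # from internal account history
    "utility_payment_rate": 1.0,   # from an external partner
}
BIAS = -4.0

def default_probability(features: dict[str, float]) -> float:
    z = BIAS + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(z))   # higher z means lower default probability

applicant = {
    "on_time_payment_rate": 0.95,
    "income_stability": 0.8,
    "utility_payment_rate": 0.9,
}
print(f"{default_probability(applicant):.1%}")
```

The point of the extra sources is visible even in this sketch: a strong external payment signal shifts the score for applicants whose internal history alone would be inconclusive.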
Massive data integration repositories and the analytics they enable can also optimize
banking operations and increase revenue (Exhibit 13). This is critical in an era when low
interest rates are putting pressure on margins, regulatory and reporting requirements are
growing more complex, and new digital native startups are introducing innovative business
models into financial services.
Exhibit 13: Retail banks have an opportunity to break their data silos, combining traditional and new data sources in data lakes. Data sources range from internal to external and from structured to unstructured, and include bank website data, call center customer notes, wholesaler customer history, utilities customer payment records, video analysis of customer footage, and telecommunications customer patterns.
A data-driven transformation of the retail banking industry could generate roughly $400 billion to $600 billion in economic impact. Levers include data and analytics (cross-sell/upsell of products, better customization of products, improved risk assessment and underwriting, virtual training) and digital (new customer channels, digital distribution channels, cloud-based architecture).
NOTE: Revenue- and cost-saving levers are separated into three categories based on the size of the role analytics will play. See appendix for levers. Numbers may not sum due to rounding.
SOURCE: McKinsey Digital and Financial Services Practice; McKinsey Panorama global banking database; McKinsey Advanced Analytics Practice; McKinsey Global Institute analysis
As the retail banking industry becomes more data-driven, it is likely to take on an entirely
new look. Talent capable of crunching the numbers, developing the algorithms, and
combining those skills with an understanding of financial services and the regulatory
landscape will be critical. As more banking services become digital, physical branches
could be phased out, leading to substantial cost savings.
Three main types of ecosystem players could emerge. First are the solution and analytics
innovators. This category includes many of the fintech players. These digital natives have
deep capabilities, and they tend to focus on a particular market niche, with no intention of
expanding into full retail banking offerings. This frees them from the constraints of being fully
regulated banks. Betterment, for example, offers wealth management by robo-advisers,
while ZestFinance combines vast amounts of data and machine learning to improve
credit scoring.
Second are incumbent institutions that are the natural owners of significant amounts of data
describing the financial position and behavior of their customers. This proprietary data gives
them a significant advantage, but they could lose it if they lose the source of the data (that
is, their customers) or if other players come up with ways to create equivalent or superior
information by integrating from different sources.
Third, companies in other sectors can become part of the banking ecosystem if they bring
in orthogonal datasuch as non-financial data that provides a more comprehensive and
granular view of the customer. These players may have large customer bases and advanced
analytics capabilities created for their core businesses, and they can use these advantages
to make rapid moves across sector boundaries, adding financial services to their businesses.
The move toward massive data integration and analytics may create room for new players
and partnerships. Acquisitions and collaborations may be necessary for traditional retail
banks to acquire the talent and capabilities they need to stay relevant. Santander UK, for
example, has launched a partnership with Kabbage, a small business loan provider that
uses machine learning to automate credit decisions. Data-sharing partnerships between
banks and telecom companies have helped both sides develop new insights (especially
around fraud, customer creditworthiness, and microsegments of customers) that otherwise
would not have been possible.
Analytics-driven business models are emerging, such as Mint, an aggregator that utilizes
data to create personal finance recommendations. Another new model involves peer-to-
peer platforms. SoFi, for example, uses robust risk scoring through creative data sources
and analytics to connect borrowers and investors directly; the company reports that it
already has more than $6 billion in funded loans.82
In the public sector, for example, tools such as data lakes could break the silos that exist
across various agencies and levels of government, although this will require rigorous data
management and opening public data sets. It is often the case that systems and data are
owned by different departments and functions, on a range of platforms and with differing
taxonomies and access requirements. Fragmentation and the absence of a central owner
for nationwide IT infrastructure and common components can make it hard to connect
the internal plumbing.83 But massive data integration could create a more seamless
experience for the end-user, whether that user is a government worker, a business, a
citizen, or another intergovernmental office. Cities, for example, could link health and
education data to inform the planning of their social welfare programs, while law enforcement
could connect to external sources of data such as social media and weather conditions for better
situational awareness.
Insurance is another industry in which incorporating more data from multiple sources and
applying cutting-edge analytics can create value. The industry can rely on a wider array of
information to assess risk more accurately. Companies can reduce fraud, improve pricing,
and improve cross-selling by determining when people are going through a life change that
may affect their product needs. Data mining also helps in the creation of new products.
Manufacturing and processing industries can also benefit from storing all of their reports
and logs in a single repository where patterns can be extracted when enough historical data
are collected. This can be an especially valuable approach for collecting and analyzing the
vast information flows generated by IoT sensors.
82 SoFi surpasses $6 billion in funded loans, bolsters leadership team, company press release, December 17, 2015.
83 Cem Dilmegani, Bengi Korkmaz, and Martin Lundqvist, Public-sector digitization: The trillion-dollar challenge, McKinsey.com, December 2014.
In the realm of process innovation, data and analytics are helping organizations determine
how to structure teams, resources, and workflows. High-performing teams can be many
times more productive than low-performing teams. Understanding this variance and how
to build more effective collaboration is a huge opportunity for organizations. This involves
looking at issues such as complementarity of skills, the optimal team size, whether teams
need to work together in person, what past experience or training is important, and even
how their personalities may mesh. Data and analytics can be used to generate new
hypotheses through finding new patterns that may not have even occurred to managers.
Vast amounts of email, calendar, locational, and other data are available to understand
how people work together and communicate, and all of these data can lead to new insights
about improving performance.
In product innovation, data and analytics can transform research and development in areas
such as materials science, synthetic biology, and life sciences. Leading pharmaceutical
companies are using data and analytics to aid with drug discovery. Data from a variety
of sources could help to suggest the chemical compounds that could serve as effective
drug treatments for a variety of diseases. Furthermore, with huge amounts of data to sort
through and nearly infinite possible combinations of features, deep learning techniques help
to narrow the universe of possible combinations in a smart way, leading to discoveries.
One team of scientists at Carnegie Mellon University used data and machine learning to
predict the results of experiments without actually having to perform them, thus reducing the
number of tests by 70 percent.84 In another example, AstraZeneca and Human Longevity
are partnering to build a database of one million genomic and health records along with
500,000 DNA samples from clinical trials. The associations and patterns that can be
gleaned from those data could prove to be immensely valuable in advancing scientific and
drug development breakthroughs.85
Companies are also using data and analytics to improve the online user experience. They
can experiment with variables such as the optimal design and placement of content on a
web page, smart recommendations, user journeys, and various types of A/B testing. Deep
learning is also getting better at creating art, logos, and other design content that could
affect creative fields.
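As a hedged illustration of the A/B testing mentioned above, the sketch below compares conversion rates on two hypothetical page layouts with a standard two-proportion z-test. The traffic and conversion numbers are invented, and real experimentation platforms handle far more than this.

```python
# Toy A/B test readout: did layout B convert significantly better than
# layout A? Uses only the standard library (math.erf for the normal CDF).
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)                 # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error of the difference
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 2,400 visitors per variant
z, p_value = ab_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p_value:.3f}")  # z = 2.23, p = 0.026
```

Here the difference would clear a conventional 5 percent significance threshold, so the variant could be rolled out.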
Applying data and analytics to product, process, and design innovation is very much in the
early stages, but some organizations are pushing the boundaries in these applications.
If they are able to develop these capabilities to differentiate themselves, they could move
further ahead of the competition.
84 Jake Hissitt, Machine learning for discovery: How machine learning algorithms are revolutionizing pharma, Innovation Enterprise, April 22, 2016; and Chris Wood, Machine-learning robot could streamline drug development, New Atlas, February 10, 2016.
85 Human Longevity Inc. announces 10-year deal with AstraZeneca to sequence and analyze patient samples from AstraZeneca clinical trials, company press release, April 2016.
Faulty assumptions and biases lead many hiring managers to screen out non-traditional
job seekers who might have tremendous natural ability. In other types of decisions,
people tend to give weight to data points that back up their original hypothesis rather than
fully considering those that contradict their working theory (a phenomenon known as
confirmation bias). Or they may prioritize data points that support continuing the status quo
or avoiding risk (stability bias). The use of data can obviate some of these biases, making the
decision-making process more transparent and evidence-based.
Exhibit 15: Where human biases and heuristics are predominant in decision making, and where human error and physical limitations lead to mistakes and lost value, data and analytics can help. Smart cities: a more effective city planning and zoning process that better captures the needs of the community. Insurance and risk: increased behavioral data from sensors and other sources leads to better policy pricing and risk assessment. Health care: algorithms can prevent human errors in health care, avoiding misdiagnoses, drug interactions, and incorrect dosages. Criminal justice: smart algorithms can improve sentencing and parole decisions, while predictive policing can better direct law enforcement to crime. Labor markets: job seekers and employers will have greater transparency that enables them to make better matches.
Other human limitations in decision making can also be overcome through the use of data
and analytics. Today there is a flood of data from new sources, such as IoT sensors, digital
games, and social media images. While humans can reach information overload, automated
algorithms can weigh a vast amount of data. Automated algorithms and decision-support
tools can help to avoid errors or lapses that have serious consequences. Self-driving cars,
for example, can continuously monitor their surroundings without falling prey to fatigue or
distraction, which can cause human drivers to have accidents.
Below we look at example domains in which these dimensions of decision making are
crucial and consider how analytics capabilities could be transformative. In many cases,
this could create societal benefits in addition to economic value. However, it is also
important to consider the fact that enhanced decision-making capabilities could lead to job
displacement. Algorithms may allow machines to replace humans in many contexts, and
as autonomous robots gain the ability to move and react in the physical world, the number of
settings where human labor could conceivably be replaced will expand. But algorithms will
not replace people everywhere they are used; in many cases, they are best used to support
and complement human judgment rather than to substitute for it.
Automating complex decision making could enable smart cities to become more efficient
and better able to respond to changing environments in real time. Transportation and utilities
in particular are two areas of urban management in which rapid decision making is crucial.
Smart transportation systems utilizing the IoT and automated algorithms can enable more
seamless traffic flows (a potential 10 to 20 percent reduction in traffic congestion), reduce
waits on public transportation systems, and make parking availability fully transparent to
prevent circling and congestion. Some cities have begun to deploy technologies that can
produce these benefits. In Singapore, sensor data is used to predict traffic congestion in
real time and adjust tolls to limit jams. In Copenhagen, road sensors detect approaching
cyclists and turn traffic signals green.
We estimated the size of the economic potential associated with a selection of analytics
applications in smart cities around the world.88 In transportation management, we find that
a number of uses could produce $200 billion to $500 billion in economic impact through
reducing traffic congestion by about 10 to 20 percent. These include centralized and
adaptive traffic control, smart parking meters and pricing, schedule management of buses
and trains, congestion pricing in express lanes on roads, and monitoring public transit for
maintenance needs.
Using analytics to make utilities more responsive in real time could create global impact
of some $20 billion to $50 billion. This could come about through a combination of
structural monitoring (for example, installing sensors on streetlights and bridges), smart
meters for energy and water demand management, and automating power distribution
and substations. These initiatives could decrease water consumption by up to 5 percent,
produce cost savings of up to 20 percent in maintenance and labor, and reduce outages by
more than 35 percent.
These capabilities have valuable applications for insurers. Over time, insurance companies
have sought ways to collect additional data that they believe could have predictive power,
and today they finally have the analytics ability to put that information to work in revolutionary
ways. For example, behavioral data from sensors embedded in cars can provide insurers
much more direct information about an individual's driving patterns and risk than his or
her educational attainment, age, or make of car. Some insurers incorporate credit scores
because of empirical evidence that people who pay their bills on time are also better drivers.
Companies offering property insurance are using sensors on water pipes or in the kitchen to
identify predictors of pipe leaks or breaks, flooding, or fires, and to warn the homeowners.
Similarly, behavioral data can transform life insurance models. Insurers can use these
data to better price and customize coverage options, or even to warn customers to take
preventive actions. The benefits of data and analytics should accrue to both insurers
(in the form of reduced claims) and individuals (in the form of loss prevention). One UK
insurance company that used vehicle sensors reported that better driving habits resulted
in a 30 percent reduction in the number of claims, while another reported a 53 percent
drop in risky driving behavior. Extrapolating that type of result across all auto insurers yields
potential impact on the order of $40 billion. Summing up the benefits across other insurance
industries and lives saved would indicate economic benefits in the hundreds of billions of
dollars globally.
86 Adrian Booth, Niko Mohr, and Peter Peters, The digital utility: New opportunities and challenges, McKinsey.com, May 2016.
87 Joshua Masinde, Kenya Power counts on smart meters to cut costs and deal with tampering, Daily Nation, July 23, 2015.
88 The internet of things: Mapping the value beyond the hype, McKinsey Global Institute, June 2015.
The impact of reducing medical errors could be huge. It would encompass not only
the direct additional costs to the medical system to correct the errors but also the lost
productivity of patients, and most important, it could save lives. One landmark study
in 2000 estimated that anywhere from 44,000 to 98,000 Americans die each year as a
result of preventable medical errors.91 Later research building on that study and applying
a quality-adjusted life-year analysis estimated that the associated costs could be as high
as $100 billion annually in the United States alone.92 Assuming that analytics could reduce
such errors by approximately half, the total impact in high-income countries could approach
$200 billion to $300 billion.
New analytics tools that incorporate online tests and games can give employers data
markers indicating whether candidates have the qualities and skills that predict high
performance and retention.93 Many new players are developing solutions that can support
smarter and more data-driven hiring processes. HireIQ, for example, provides software to
digitize the interview process and apply predictive analytics to the results. Joberate enables
companies to see real-time employment trends and competitive intelligence, while Knack
uses gamification to test for skills and help match workers and jobs. But human resources
departments will need an infusion of analytical talent to be better equipped to manage in a
more data-driven world.
89 Martin A. Makary and Michael Daniel, Medical error: The third leading cause of death in the US, BMJ, May 2016.
90 Medication errors, Patient safety primer, Agency for Healthcare Research and Quality, US Department of Health and Human Services, available at https://psnet.ahrq.gov/primers/primer/23/medication-errors.
91 Linda T. Kohn, Janet M. Corrigan, and Molla S. Donaldson, eds., To err is human: Building a safer health system, Committee on Quality of Health Care in America, Institute of Medicine, National Academy Press, 2000.
92 C. Andel, S. L. Davidow, M. Hollander, and D. A. Moreno, The economics of health care quality and medical errors, Journal of Health Care Finance, volume 39, number 1, fall 2012.
93 Susan Lund, James Manyika, and Kelsey Robinson, Managing talent in a digital age, McKinsey Quarterly, March 2016.
Beyond altering hiring and job hunting, this new transparency could help educational
institutions better prepare students for the job market. Education providers can become
more responsive to the demand for skills in the labor market. Universities can adjust their
curricula to respond to those changes, while secondary schools could incorporate job
preparation and discernment driven by data. MGI's previous research estimated that across
seven major economies alone, some $89 billion of annual spending on tertiary education
is rendered ineffective because students pursue paths that leave them unemployed or
underemployed.95 Meanwhile, earlier assessments could allow for more personalized and
predictive educational planning that incorporates a return on investment analysis.
We sized the impact that these changes could have on the improved productivity of workers
and the operational cost savings for an organization.96 Higher productivity can come
from better matching, training, and maximizing of talent, which could lead to $500 billion
to $1 trillion in value worldwide. Meanwhile, firms can save money by reducing attrition
as well as by saving time on recruiting. Such effects could total $100 billion to
$150 billion in cost savings worldwide.
Data and analytics are already shaking up multiple industries, and the effects will only
become more pronounced in the years ahead as more players adopt and combine these
models to affect even more industries. But the capabilities described in this chapter are only
the tip of the iceberg. An even bigger wave of change is looming on the horizon as deep
learning reaches maturity, giving machines unprecedented capabilities to think, problem-
solve, and understand language. The following chapter looks at these groundbreaking
developments and considers the enormous opportunities and risks they could pose.
94 A labor market that works: Connecting talent with opportunity in the digital age, McKinsey Global Institute, June 2015.
95 Ibid.
96 See A labor market that works for more details on our assumptions regarding the impact of data and analytics in a number of human resources functions for a representative company in a few industries.
What was once science fiction is now rapidly advancing science, and will soon be a core
business capability. Systems enabled by machine learning can provide customer service,
manage logistics, analyze medical records, or even write a news story. Deep learning, a
frontier area of research within machine learning, uses neural networks with many layers
(hence the label "deep") to push the boundaries of machine capabilities. Data scientists
working in this field have recently made breakthroughs that enable machines to recognize
objects and faces, to beat humans in challenging games such as chess and Go, to read lips,
and even to generate natural language. Digital giants such as Google, Facebook, Intel, and
Baidu as well as industrial companies such as GE are leading the way in these innovations,
seeing machine learning as fundamental to their core business and strategy.
The potential uses of machine learning are remarkably broad. The value potential is
everywhere, even extending into sectors that have been slow to apply data and analytics. As
applications of this technology are adopted, they could generate tremendous productivity
gains and an improved quality of life. But they could also unleash job losses and other
disruptions, not to mention thorny ethical and societal questions that will have to be
addressed as machines gain greater intellectual capabilities.
To give an idea of how this transformation could play out, we begin by exploring the potential
impact of machine learning through two lenses. First, we investigate which business
problems across 12 industries could be solved by machine learning. Second, we examine
which work activities currently performed by people could potentially be automated through
machine learning and how that could play out across occupations.
We caution that this is an initial and broad exploration rather than a deep investigation into
specific industries and use cases. Furthermore, this is not meant to be a comprehensive list
of every possible application of machine learning. It could be harnessed to tackle problems
beyond the boundaries of the industries we analyzed, such as climate change. It could also
find a role in daily social interactions that have nothing to do with commerce. The findings
here are meant to set the stage for many avenues of interesting future research. All that
being said, one of the most striking of our initial findings is the incredibly wide applicability of
machine learning techniques to many industry problems and individual occupations.
Reinforcement learning is another machine learning technique used to identify the best
actions to take now in order to reach some future goal. These types of problems are common
in games and can be useful for solving dynamic optimization and control theory problems:
exactly the type of issues that come up in modeling any complex system in fields such
as engineering and economics. Reinforcement learning algorithms that use deep neural
networks (deep reinforcement learning) have made breakthroughs in mastering games
such as chess and Go.
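The idea of learning which action to take now in order to reach a future goal can be made concrete with a toy example. The sketch below is a minimal tabular Q-learning loop on an invented five-state corridor, far simpler than the deep reinforcement learning systems described here; every name and parameter is our own.

```python
# Minimal tabular Q-learning sketch (illustrative only): an agent on a
# 5-state corridor earns a reward only at the rightmost state, and must
# learn that moving right now pays off later.
import random

N_STATES, ACTIONS = 5, (-1, +1)       # states 0..4; actions: left, right
ALPHA, GAMMA, EPISODES = 0.5, 0.9, 200
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(EPISODES):
    state = 0
    while state != N_STATES - 1:
        # epsilon-greedy: training data is generated by interacting with
        # the environment rather than supplied up front
        action = (random.choice(ACTIONS) if random.random() < 0.1
                  else max(ACTIONS, key=lambda a: Q[(state, a)]))
        nxt = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if nxt == N_STATES - 1 else 0.0
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

# After enough training, the greedy policy should move right (+1) everywhere.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Deep reinforcement learning replaces the lookup table `Q` with a neural network, which is what allows the approach to scale to games such as Go.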
All machine learning algorithms require large amounts of training data (experiences) in
order to learn. They recognize patterns in the training data to develop a model of the
world being described by the data. Reinforcement learning is slightly different from other
techniques in that the training data is not given to the algorithm but rather is generated in
real time via interactions with and feedback from the environment. But in all cases, as new
training data comes in, the algorithm is able to improve and refine the model. This process
is especially suited for solving three broad categories of problems: classification, prediction/
estimation, and generation (Exhibit 16).
First, tackling classification problems involves making observations about the world, such
as identifying objects in images and video or recognizing text and audio. Classification also
involves finding associations in data or segmenting data into clusters based on associations.
(Customer segmentation is a classic example of this.) Second, machine learning can also be
used for predictions such as estimating the likelihood of events and forecasting outcomes.
Lastly, machine learning can be used to generate content, from interpolating missing data to
generating the next frame in a video sequence.
Exhibit 16: Machine learning can help solve classification, prediction, and generation problems.
Classification: classify/label visual objects (identify objects and faces in images and video); classify/label writing and text (identify letters, symbols, and words in a writing sample); cluster or group other data (segment objects such as customers or product features into categories or clusters); discover associations (identify that people who watch certain TV shows also read certain books).
Prediction: predict the probability of outcomes (for example, the probability that a customer will choose another provider).
Generation: generate visual objects (trained on a set of artists' paintings, generate a new painting in the same style); generate writing and text (trained on a historical text, fill in missing parts of a single page); generate other data (trained on certain countries' weather data, fill in missing data points for countries with low data quality).
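To make the "generate other data" example in Exhibit 16 concrete: filling in missing data points can, in its very simplest form, be done by interpolating between known values. The toy sketch below (invented rainfall figures) is only a stand-in for the learned generative models the text describes, which capture far richer structure.

```python
# Toy "generation" example: fill gaps in a data series by linear
# interpolation between the nearest known neighbors. Assumes each gap
# is bounded by known values on both sides.
def fill_missing(series):
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            lo = next(j for j in range(i - 1, -1, -1) if filled[j] is not None)
            hi = next(j for j in range(i + 1, len(filled)) if filled[j] is not None)
            frac = (i - lo) / (hi - lo)
            filled[i] = filled[lo] + frac * (filled[hi] - filled[lo])
    return filled

rainfall = [10.0, None, 14.0, None, 18.0]   # hypothetical readings with gaps
print(fill_missing(rainfall))  # [10.0, 12.0, 14.0, 16.0, 18.0]
```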
After interviewing nearly 50 industry experts, we identified more than 300 specific use cases
across 12 industries. We culled that list to the top ten use cases in each industry based
on the size of the opportunity. We then surveyed a wider group of more than 600 experts
across a variety of industries to determine where they saw the greatest potential to create
value. Results from this survey suggest that the opportunity for business impact is broad.
When we asked experts to rank individual use cases in their industry, each of the 120 use
cases was named as being one of the top three most valuable in its industry by at least one
industry expert.
However, looking at the value creation opportunity is only part of the picture. As discussed
above, machine learning algorithms require large amounts of data in order to be trained
and effective. For example, improving hiring matches would have tremendous value for
creating a more efficient labor marketand machine learning techniques are well suited
to making more accurate matches. But the quantity and richness of data collected on
candidates is quite limited; the typical individual has far fewer interactions in the labor market
than they do on social media or in the course of online shopping. The potential of machine
learning in the labor market could therefore be constrained.
Exhibit 17: Machine learning can be combined with other types of analytics to solve a large swath of business problems. Techniques such as clustering (e.g., k-means), regression (e.g., logistic), and classification (e.g., support vector machines) map to the broad problem categories of classification, prediction, and generation, and support business applications such as resource allocation, sorting, and predictive maintenance.
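One of the techniques named above, k-means clustering, can be sketched in a few lines. The example below segments invented customer-spend figures in one dimension; production work would use a mature library rather than this toy implementation.

```python
# Toy 1-D k-means: segment customers by monthly spend into k clusters.
# Figures are invented for illustration.
def kmeans_1d(values, k, iters=20):
    # spread initial centroids across the sorted values
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest centroid
            i = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[i].append(v)
        # recompute each centroid as the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

spend = [20, 22, 25, 480, 510, 495, 1900, 2100]
centroids, clusters = kmeans_1d(spend, k=3)
print(sorted(round(c) for c in centroids))  # [22, 495, 2000]
```

The three centroids correspond to low-, mid-, and high-spend segments, which could then feed targeting or pricing decisions.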
Our analysis filters business use cases by impact potential and by data richness. We plotted
the top 120 use cases across 12 industries in Exhibit 18. The y-axis shows the volume
of available data (encompassing its breadth and frequency), while the x-axis shows the
potential impact. The size of each bubble reflects the diversity of the available data sources.
The industry-specific uses that combine data volume and frequency with a larger
opportunity are in the top right quadrant of the chart. These represent areas where
organizations should prioritize the use of machine learning and prepare for a transformation
to take place. Some of the high-opportunity use cases include personalized, targeted
advertising; autonomous vehicles; optimizing pricing, routing, and scheduling based on real-
time data in travel and logistics; predicting personalized health outcomes; and optimizing
merchandising strategy in retail. (See the technical appendix for a detailed scoring of use
cases by industry, as well as a discussion of methodology.)
Exhibit 18: Machine learning has broad potential across industries and use cases. (A bubble chart plotting the volume of available data against an impact score for the top 120 use cases; the size of each bubble reflects the diversity of available data sources.)
These use cases in the top right quadrant fall into four main categories. First is the radical
personalization of products and services for customers. Second is predictive analytics,
which involves not only forecasting but also uses such as anticipating fraud and bottlenecks
or diagnosing diseases. The third high-impact category for machine learning is strategic
optimization, and the fourth is real-time optimization in operations and logistics. Below we
will examine each of these in turn.
Consider the potential impact of radical personalization in several specific areas. Some
media organizations, for instance, are beginning to use this technology to deliver
personalized content and advertising. Netflix's recommendation engine currently influences
about 80 percent of content hours streamed on its platform. One study issued by the
company estimates that using personalization has increased subscriber retention and
engagement to such a degree that it is worth some $1 billion annually to the company.97
(See Chapter4 for more on this topic.)
Predictive analytics is another type of use with tremendous potential across almost all
industries (see box at left). In these applications, machine learning helps classify customers
or observations into groups for predicting value, behavior, risk, or other metrics. It can be
used to triage customer service calls; to segment customers based on risk, churn, and
purchasing patterns; to identify fraud and anomalies in banking and cybersecurity; and to
diagnose diseases from scans, biopsies, and other data.
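A hedged sketch of the churn-segmentation idea: score each customer with a logistic model. The weights and customer attributes below are invented for illustration; in practice the weights would be learned from historical churn data rather than set by hand.

```python
# Illustrative churn scoring with a hand-set logistic model.
# All weights and attribute names are hypothetical.
import math

WEIGHTS = {"months_inactive": 0.8, "support_calls": 0.5, "tenure_years": -0.4}
BIAS = -2.0

def churn_probability(customer):
    score = BIAS + sum(WEIGHTS[k] * customer[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-score))   # logistic function maps score to [0, 1]

loyal = {"months_inactive": 0, "support_calls": 1, "tenure_years": 6}
at_risk = {"months_inactive": 4, "support_calls": 5, "tenure_years": 1}
print(f"loyal: {churn_probability(loyal):.2f}, "
      f"at risk: {churn_probability(at_risk):.2f}")
# loyal: 0.02, at risk: 0.96
```

Customers above a chosen probability threshold would be routed to retention offers, one concrete form of the segmentation described above.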
97 Carlos A. Gomez-Uribe and Neil Hunt, The Netflix recommender system: Algorithms, business value, and innovation, ACM Transactions on Management Information Systems, volume 6, issue 4, January 2016.
98 John Markoff, Scientists see promise in deep-learning programs, The New York Times, November 23, 2012.
Travel, transport, and logistics: Optimize routing in real time (airlines, logistics, and last-mile routing for delivery). Energy: Optimize power plants based on energy pricing, weather, and other real-time data.
In the oil industry, self-learning simulation models can adjust parameters and controls based on real-time well data. A mid-sized oil field in Southeast Asia used this application and generated production improvements of $80 million to $100 million annually. DHL is using machine learning to optimize its complex global logistics operation. Tremendous potential for real-time optimization remains untapped for many companies in functions such as managing distribution networks, managing inventory, and timing procurement.
99 Tejal A. Patel et al., Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods, Cancer, August 2016.
100 These detailed work activities are defined by O*NET, a data collection program sponsored by the US Department of Labor. See Michael Chui, James Manyika, and Mehdi Miremadi, Four fundamentals of workplace automation, The McKinsey Quarterly, November 2015.
Exhibit 19: Deep learning is well suited to develop seven out of 18 capabilities required in many work activities. (The exhibit maps capabilities such as social and emotional sensing, social and emotional reasoning, and emotional and social output against the activities of a retail worker, such as greeting customers.)
NOTE: While this example illustrates the activities performed by a retail worker only, we analyzed some 2,000 activities across all occupations.
Seven of those 18 capabilities are well suited to being implemented through the use of machine learning (Exhibit 20). The first striking observation is that almost all activities require capabilities that correlate with what machine learning can do. In fact, only four out of more than 2,000 detailed work activities (or 0.2 percent) do not require any of the seven machine learning capabilities. Recognizing known patterns, by itself, is needed to varying degrees in 99 percent of all activities; that is a fundamental capability of machine learning. Natural language generation, natural language understanding, and sensory perception are required for most work activities in the economy (79 percent, 76 percent, and 59 percent of all detailed work activities, respectively). This is not to say that such a high share of jobs is likely to be automated anytime soon, but it does underscore the wide applicability of machine learning in many workplaces.
Exhibit 20: Improvements in natural language understanding and generation as well as social sensing would have the biggest impact on expanding the number of work activities that deep learning could technically automate
(Capabilities shown: natural language understanding, sensory perception, generating novel patterns/categories, social and emotional sensing, recognizing known patterns/categories, optimization and planning, natural language generation)
Previous MGI research on automation found that 45 percent of all work activities, associated with $14.6 trillion of wages globally, have the potential to be automated by adapting currently demonstrated technology.102 Some 80 percent of that could be implemented by using existing machine learning capabilities. But deep learning is still in its early stages. Improvements in its capabilities, particularly in natural language understanding, suggest the potential to unleash an even greater degree of automation. In 16 percent of work activities that require the use of language, increasing the performance of machine learning in natural language understanding is the only barrier to automating that activity. Improving these capabilities alone could lead to an additional $3 trillion in wage impact (Exhibit 21). Advances in sensory perception and generating novel patterns and categories, which could be enabled by deep learning, could further increase the number of activities that could be automated.
101 Jacob Brogan, An artificial intelligence scripted this short film, but humans are still the real stars, Slate, June 9, 2016.
102 Michael Chui, James Manyika, and Mehdi Miremadi, Four fundamentals of workplace automation, McKinsey Quarterly, November 2015, and Where machines could replace humans, and where they can't (yet), McKinsey Quarterly, July 2016.
Exhibit 21: Top 20 groups of work activities by wages that could be affected by improved deep learning capabilities
We also examined where the application of deep learning to automation could create the
largest potential wage impact (Exhibit 22). Occupations such as court reporter and interpreter could see nearly all of their activities automated through improvements in the natural language understanding capabilities of machine learning. However, since these occupations
employ relatively few people and pay low to medium wages, the expected overall impact
will be small. Customer service representative is the lone occupation that makes the top
10 list for potential total wage impact and lends itself to automation across most of its
work activities. The impact on a suite of frontline supervisory roles would also be large in
dollar terms, as the primary activities for this group are guiding, directing, and motivating
subordinates. Deep learning is likely to have a major impact on occupations with primarily
administrative duties; these include executive assistants, cashiers, and waitstaff. Large
numbers of people are employed in these occupations, which points to the possibility of
significant job displacement. A significant percentage of the activities associated with a set
of higher-paying jobs such as lawyers and nurses could also be automated with advances in
machine learning.
While the potential of machine learning in general and deep learning in particular is exciting
and wide-ranging, there are real concerns related to their development and potential
deployment. Some of these were present even prior to the big data age, such as privacy,
data security, and data ownership. But an additional set of new challenges has arisen.
First, deep learning has a drawback that poses a barrier to adoption in certain applications.
The models that deep learning produces are opaque. As of today, it is relatively difficult to
decipher how deep neural networks reach the insights and conclusions that they do. They
are still a black box. However, researchers are working to create less opaque systems by
doing the forensics to help people understand how these highly complex, trained models
come to the conclusions that they do based on thousands or millions of connections
and weights between nodes. For example, after AlphaGo played Lee Sedol, the world
champion in the game of Go, the researchers who built that system were able to uncover
what it was thinking when it made certain moves. But this is still a challenge, and when the
mechanism behind a model is not understood, this can be potentially disqualifying in certain
situations. Some decisions (such as hiring and granting loans) need to be transparent for
legal reasons. Trying to run experiments or tweak variables can be difficult in a deep neural
network; Google, for example, has hesitated until recently to use deep learning for its
search algorithm for exactly this reason. There is also the matter of trust. It can be difficult
for decision makers and customers to commit to insights that are generated in a non-
transparent way, especially where those insights are counterintuitive. Medical use cases
could fall into this category. This is not to say that model opacity will forever be a problem with deep neural networks, but for now, it can be a barrier to adoption in certain use cases.
Exhibit 22: Improvements in deep learning (DL) could affect billions of dollars in wages in ten occupations globally
(For each occupation: % of time spent on activities that could be automated if DL improves; most frequently performed group of DWAs that could be automated if DL improves; global employment, million; hourly wage, $; global wages that DL could automate, $ billion)
- Secretaries and administrative assistants, except legal, medical, and executive: 28%; interacting with computers to enter data, process information, etc.; 48.2 million; $3.90; $109.8 billion
- Managers, all other: 27%; monitoring processes, materials, or surroundings; 8.3 million; $18.25; $86.7 billion
- First-line supervisors of office and administrative support workers: 35%; interpreting the meaning of information for others; 12.8 million; $8.75; $81.5 billion
- Cashiers: 18%; performing administrative activities; 68.1 million; $3.18; $81.5 billion
- First-line supervisors of helpers, laborers, and material movers: 24%; organizing, planning, and prioritizing work; 8.5 million; $12.73; $54.2 billion
SOURCE: National labor and statistical sources; McKinsey Global Institute analysis
Third, the potential risks of labor disruption from the use of deep learning to automate
activities are becoming a critical debate, particularly in light of existing anxiety about
the quantity and quality of available jobs. There is historical precedent for major shifts
among sectors and changes in the nature of jobs. In the United States, the share of farm
employment fell from 40 percent in 1900 to 2 percent in 2000; similarly, the share of
manufacturing employment fell from roughly 25 percent in 1950 to less than 10 percent
in 2010. In both circumstances, while some jobs disappeared, new ones were created,
although what those new jobs would be could not be ascertained at the time. In 1950, few
would have predicted that millions of people would be employed in information technology
jobs in the following decades. But the past does not provide adequate assurances that
sufficient numbers of new, quality jobs will be created at a sufficient rate. At the same time,
many countries have or will soon have labor forces that are declining in size, requiring an
acceleration of productivity to maintain historical rates of economic growth. A forthcoming
MGI research report on automation will address the potential pace of automation in different
economies. But certainly dealing with job displacement, retraining, and unemployment
will require a complex interplay of government, private sector, and educational and training
institutions, and it will be a significant debate and an ongoing challenge across society.
The world could be on the cusp of untold disruption from machine learning today and deep
learning in the near future. Both the opportunities and the risks are great. Organizations that
are able to harness these capabilities effectively will be able to create significant value and
differentiate themselves, while others will find themselves increasingly at a disadvantage.
Location-based services
For revenue to service providers, we used the same methodology as described in our 2011
report to calculate revenue. We updated with numbers from 2015, which allowed us to
calculate the improvement in revenue relative to the potential forecasted. We did this update
only for the top three levers: GPS navigation devices and services, mobile phone location-
based service applications, and geo-targeted mobile advertising services (Exhibit A1).
These accounted for the vast majority of the value potential from the 2011 report. For
value to end consumers, we again focused on the top three levers: time and fuel saving
in traveling through access to GPS navigation, value gained from mobile phone location-
based services, and return on investment in geo-targeted mobile advertising services. The
forecasts we developed in 2011 were closely tied to mobile phone penetration rates. Given
the current mobile phone penetration rate, we could interpolate the progress made against
our expectation to arrive at the impact in these levers since 2011.
103 Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, June 2011.
Exhibit A1: Location-based services have experienced strong growth, capturing a large share of the value potential we identified in 2011 ($ billion; % achieved vs. estimate)
- Geo-targeted mobile advertising services:2 $30-40 billion estimated in 2011; $5-10 billion achieved; ~25% of estimate
1 Includes, for example, people tracking, location sharing, social community applications, city/regional guides/services, entertainment.
2 Derived as share of total mobile advertising market size.
3 PNDs and in-car devices/services.
4 Based on app stores; includes revenues generated from application stores (across operating systems).
NOTE: Not to scale. Numbers may not sum due to rounding.
SOURCE: 451 Research 2016; GNSS Market Report 2015; Marketsandmarkets 2016; Ovum 2015; Statista 2016; McKinsey Global Institute analysis
Exhibit A2: US retail has made solid progress in integrating data and analytics, especially among larger players, but further potential can be realized (rows: operating margin in 2011; adoption rate in 2016; average value realized by adopters; operating margin value realized; roughly 30-40% of the identified value has been captured)
US health care has seen increased adoption but has captured only ~10 percent of the potential value identified in MGI's 2011 research (rows: value potential in 2011, $ billion; adoption rate in 2016; average value realized by adopters; value potential realized, $ billion)
The EU public sector has captured less than 15 percent of the potential value estimated in MGI's 2011 research (columns: category; assumed addressable baseline; impact potential on baseline; adoption rate today compared to 2011; impact achieved by today's adopters)
NOTE: Similar observations hold true for governments around the world (e.g., United States closer to 20%; Israel closer to 10%).
Manufacturing
The manufacturing sector was assessed in a similar way through a combination of expert
interviews and outside sources per lever. Since the levers from the 2011 report did not total
to a dollar figure or margin in aggregate (each lever had its own measure for impact), we took
the average value capture across the levers as the industry view (Exhibit A5).
Exhibit A5: Manufacturing has lagged in adoption, with most value being captured by a small group of advanced players (%; 2011 estimates vs. current state across manufacturing industries)
- Supply chain management (demand forecasting/shaping and supply planning): +2-3 profit margin; -3 to -7 working capital (one time)
- Production (digital factories for lean manufacturing): -10-25 operating costs
- Production (sensor data-driven operations analytics): -10-50 assembly costs; +2 revenue
To estimate the number of data scientists in the United States in 2014, we proceed in the
following fashion. First, we define the skills of data science (as a subset of skills defined in
the Burning Glass database) to be: data science, machine learning, data mining, statistical
modeling, predictive analytics, predictive models, natural language processing, logistic
regression, support vector machines (SVM), neural networks, naive Bayes, k-means,
principal component analysis, R or WEKA. Also, we did not include skills such as Python
because Python is also listed as a skill for many other roles (e.g., software developer),
and Hadoop, which we consider to be more characteristic of the roles of data architects
and engineers.
Then, for each Standard Occupational Classification (SOC) occupation code in the United
States, we use Burning Glass data to identify the share of job postings in each occupation
that include data science skills (based on our definition of data science skills above). We
then multiply this share by the Bureau of Labor Statistics (BLS) employment for each SOC code and sum across occupations to estimate the total number of data scientists.
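As a minimal sketch, this share-times-employment calculation can be written out directly; the SOC codes, posting shares, and employment figures below are hypothetical placeholders, not the actual Burning Glass or BLS inputs.

```python
# Estimate data scientists per occupation: for each SOC code, multiply
# the share of job postings listing data science skills (Burning Glass)
# by BLS employment, then sum across occupations.
# All inputs below are hypothetical.

def estimate_data_scientists(posting_share, bls_employment):
    """Sum of (share of DS postings x employment) over SOC codes."""
    return sum(posting_share[soc] * bls_employment[soc]
               for soc in posting_share)

posting_share = {"15-1111": 0.40, "15-2041": 0.25, "15-1132": 0.02}
bls_employment = {"15-1111": 30_000, "15-2041": 85_000, "15-1132": 1_100_000}

total = estimate_data_scientists(posting_share, bls_employment)
print(round(total))  # 55250
```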
To calculate the new supply of data scientists to 2024, we assume that data science jobs will
require a bachelors degree at a minimum. We estimate the share of graduates in each level
(bachelors or masters degree or PhD) and field of study who would have the capabilities
for the work. We linearly project the new data scientist supply between 2014 and 2024
based on historical Integrated Postsecondary Education Data System (IPEDS) graduation
figures. We net out people who might have more than one degree, the population of foreign
students who likely will not stay in the United States, and people who leave the workforce
due to retirement (although the average age of a data scientist is young enough that the number
of data scientists retiring within a decade will likely be small).
To calculate demand for data scientists to 2024, we estimate the share of employees
who are data scientists at companies that are at the talent frontier, which we define
as the largest companies in each sector that have the greatest share of data talent as a
percentage of their employee base. So we first identify the ten largest companies (by market
capitalization) in each sector in the United States. For these companies, we count the
number of employees with data scientist and data analyst titles in LinkedIn's database
and divide that number by the company's total employment count in LinkedIn. For each
sector, we take a weighted average of the three largest companies with the highest talent
share, as a proxy for the data talent frontier of the sector. Then we combine these shares
with BLS employment projections by occupation for 2024 under two scenarios. In the low
scenario, we assume that in each sector, the data talent share for the average company
in 2024 will look like its sector frontier today. In the high scenario, we assume that the data
talent share for the average information and communications technology (ICT), professional
services, and health care and social assistance organization in 2024 will look like the frontier
in ICT today. And we assume that the average company in any other sector in 2024 will look
like the frontier in wholesale today.
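The sector frontier proxy can be sketched as follows. We read "weighted average of the three largest companies with the highest talent share" as an employment-weighted average, which is an assumption on our part, and all company figures are invented for illustration.

```python
# Sketch of the sector "talent frontier" proxy: among a sector's largest
# companies, take the three with the highest data-talent share and
# average those shares, weighted by employment (an assumed weighting).
# All figures below are hypothetical.

def frontier_share(companies):
    """companies: list of (talent_share, employment) pairs."""
    top3 = sorted(companies, key=lambda c: c[0], reverse=True)[:3]
    total_emp = sum(emp for _, emp in top3)
    return sum(share * emp for share, emp in top3) / total_emp

sector = [(0.020, 50_000), (0.015, 80_000), (0.030, 20_000), (0.001, 200_000)]
share = frontier_share(sector)   # uses the 3.0%, 2.0%, and 1.5% companies
demand_2024 = share * 5_000_000  # applied to a BLS-style employment projection
```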
To calculate new demand for business translators to 2024, we assumed a ratio of business
translators to data scientists of 4:1 to 8:1. Since we project approximately 500,000 new
data science jobs to 2024, we estimate two million to four million new business translator
jobs to 2024. Linearly projecting Integrated Postsecondary Education Data System degree
completion data, we expect between nine million and ten million STEM plus business
graduates between now and 2024.
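The translator estimate reduces to ratio arithmetic, which can be made explicit using the figures given above:

```python
# Business-translator projection: an assumed ratio of 4 to 8 translators
# per data scientist, applied to ~500,000 projected new data science
# jobs to 2024 (figures from the text above).

NEW_DATA_SCIENTISTS = 500_000
RATIO_LOW, RATIO_HIGH = 4, 8

translators_low = RATIO_LOW * NEW_DATA_SCIENTISTS    # 2,000,000
translators_high = RATIO_HIGH * NEW_DATA_SCIENTISTS  # 4,000,000
```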
104 The internet of things: Mapping the value beyond the hype, McKinsey Global Institute, June 2015, and Automotive revolution: Perspective towards 2030, McKinsey Advanced Industries Practice, January 2016.
McKinsey Global Institute The age of analytics: Competing in a data-driven world 101
We focused on three primary categories to structure our approach: cost savings for
consumers (vehicle purchasers/drivers), time savings for consumers, and benefits for the
broader public.
Across these three categories, we identified seven individual levers. To size the economic
impact for each lever, we estimated the value pool at stake for developed and developing
regions separately and assumed a 10 to 30 percent capture rate by 2025-30. The value
pool estimates were constructed as follows:
Fewer vehicle purchases (cost savings): We estimated the total savings from
consumers switching to mobility services where economically advantageous.
Specifically, we estimated the cost savings associated with switching at different annual
mileage totals and aggregated across the proportion of consumers falling in each
bucket. For developing regions, we adjusted for the lower cost of ownership and lower
cost of transportation but otherwise adopted the same methodology.
Reduced parking cost (cost savings): We estimated the total savings for consumers
from lower demand for parking spots. Focusing on commercial parking, we combined
the share of vehicles, the average price for parking, and the total number of working
hours to determine the total value pool.
Time savings from parking (time savings): We estimated the total time spent searching
for parking by urban vehicles, based on a lower estimate of 15 percent of total time
in vehicles and the share of cars in cities. Applying the time value for developed and
developing regions yields the total value at stake.
Reduced fuel cost from parking (cost savings): Based on the time savings implied by
the previous lever, we estimated the associated fuel value by applying the estimated fuel
efficiency and fuel cost for developed and developing regions.
Reduced pollution from parking (public benefit): Based on the time and fuel savings
implied by the previous two levers, we estimate the carbon dioxide emissions from
parking searches. Applying the value per ton of CO2 produces the savings estimates.
Productive time during rides (time savings): We estimated the total time spent driving
by individuals in both developed and developing regions and assumed 50 percent of this
time could be spent productively. Applying the time value for these segments produced
the total value pool.
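The lever-sizing arithmetic above follows one pattern: estimate a value pool per region, then apply the assumed 10 to 30 percent capture rate. A minimal sketch, with illustrative pool figures (the report does not publish per-lever pools):

```python
# Lever sizing: value pools estimated separately for developed and
# developing regions, with a 10 to 30 percent capture rate assumed
# by 2025-30. Pool figures are illustrative placeholders.

CAPTURE_LOW, CAPTURE_HIGH = 0.10, 0.30

def captured_value(pool_developed, pool_developing):
    """Return the (low, high) captured-value range, $ billion."""
    total = pool_developed + pool_developing
    return total * CAPTURE_LOW, total * CAPTURE_HIGH

low, high = captured_value(120.0, 80.0)  # hypothetical regional pools
# (low, high) == (20.0, 60.0)
```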
Improve overall wellness through tailored programs: This lever was sized using
the same assumptions made in our IoT report. No changes were made except for
the assumed impact $/DALY, using the WHO recommendation of three times GDP
per capita.
Deliver the right treatment and optimize the dose: We based this sizing on the
methodology and assumptions used in an analysis of genomics in a 2013 MGI report.105
Genomics is a reasonable proxy for precision medicine and treatment selection. In the
2013 report, we sized the impact in cancer, diabetes, and cardiovascular disease. In this
report, we used similar estimates across all non-communicable diseases and estimated
the total DALY impact. Again, the assumed impact in $/DALY for this report was the WHO recommendation of three times GDP per capita.
Eliminate treatments that are ineffective for that individual: We followed the same
methodology as in lever 5 to determine the applicable population for these treatments.
Using US numbers for cost per disease area, we determined the total costs at stake
in these treatment areas for precision medicine. We then assumed that some costs
could be avoided due to precision screening, based on our earlier estimates of the cost
savings possible from companion diagnostics in our 2011 report (lever for personalized
medicine). This gave us our total impact in cost savings.
105 Disruptive technologies: Advances that will transform life, business, and the global economy, McKinsey Global Institute, May 2013.
Massive data integration
In analyzing the value creation potential of improved data integration in retail banking, we
identified 20 levers that have an impact on the revenue and cost of a sample retail bank.
We group the levers into three categories depending on how much of the revenue and
cost impact comes from data and analytics and how much from digital. On one end, in
pure data and analytics levers, we assign the levers where advanced analytics and a
considerable amount of data are needed, e.g., next product to buy recommendations and
dynamic pricing. On the other end, in pure digital levers, we include levers such as reducing
the physical footprint of branches and ATMs as a result of migrating customers to digital
channels. The third category in between consists of levers where both data and analytics
and digital are needed for value creation, such as efficiency gains from automating business
support functions where both digitalization of the processes and automated decision
making by analytics are needed.
To convert the single bank view into global value creation potential, we first leave out the
revenue levers that are shifting value among players and then convert the remaining value-
creating revenue levers into productivity gains that impact cost. For cost baseline numbers
of the industry, we use the McKinsey Panorama global banking database to obtain the
baseline separately for developed and emerging markets (2015 estimates for interest
revenue, fee revenue and risk cost; 2014 cost-to-income ratio estimates for converting
the revenue into operating cost). These baseline numbers are then multiplied by the
efficiency potential associated with a single bank to create a view of the overall potential for
the industry.
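A minimal sketch of that final step, with hypothetical baselines and efficiency potential (the actual Panorama baselines and lever efficiencies are not reproduced here):

```python
# Convert the single-bank efficiency view into industry potential:
# multiply each market's operating-cost baseline by the efficiency
# potential. Baselines and the efficiency figure are hypothetical.

def industry_potential(cost_baselines, efficiency_potential):
    """cost_baselines: market -> operating cost base, $ billion."""
    return {market: cost * efficiency_potential
            for market, cost in cost_baselines.items()}

pools = industry_potential({"developed": 2_000.0, "emerging": 1_000.0}, 0.25)
# pools == {"developed": 500.0, "emerging": 250.0}
```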
To collect these use cases, we first characterized machine learning capabilities or problem
types. We used this list of problem types to co-create a list of more than 300 use cases with
about 50 industry experts. The use cases were also classified across nine use case types
as recurring themes emerged in our analysis. This grouping is used in a number of exhibits in
this appendix.
The 300 use cases formed the basis for our analysis of the potential of machine learning.
Across all of these use cases, we constructed two separate indexes: one characterizing value potential and one characterizing data richness.
First, we estimated the value potential by conducting a survey across internal McKinsey
experts aligned to the industries included in our report. In this survey, respondents were
asked to rank the top three use cases in their industry out of a prioritized list of ten. Then, for each of the 120 use cases, we constructed a value index by weighting first-place rankings by three, second-place rankings by two, and third-place rankings by one.
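The rank weighting can be sketched directly; the survey responses below are invented for illustration.

```python
# Value index: each survey respondent ranks their top three use cases;
# a first-place vote scores 3, second place 2, third place 1.
# The votes below are hypothetical.

from collections import Counter

WEIGHTS = {1: 3, 2: 2, 3: 1}

def value_index(rankings):
    """rankings: list of (use_case, rank) pairs, rank in {1, 2, 3}."""
    index = Counter()
    for use_case, rank in rankings:
        index[use_case] += WEIGHTS[rank]
    return dict(index)

votes = [("predictive maintenance", 1), ("routing", 2),
         ("predictive maintenance", 2), ("fraud detection", 3)]
scores = value_index(votes)
# scores == {"predictive maintenance": 5, "routing": 2, "fraud detection": 1}
```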
Exhibit A6: Machine learning has great impact potential across industries and use case types (impact potential: low to high)
Industries: agriculture, automotive, consumer, energy, finance, health care, manufacturing, media, pharmaceuticals, public/social, telecom, transport and logistics
Problem types: real-time optimization, strategic optimization, predictive analytics, predictive maintenance, radical personalization, discover new trends/anomalies, forecasting, process unstructured data
Second, we assessed the data richness available for each of the 300 total use cases. The
two criteria we focused on were as follows:
Volume of data: Higher volumes of data mean more data points for the algorithms to
learn from and thus, fewer barriers to capture the use case. So, the more data there
are, the more beneficial it is for deep learning. We measured both breadth (how many individual data points there were) and frequency (how often an interaction happened with each data point over the course of a year), then multiplied breadth by frequency to arrive at an aggregate volume measure.
Variety of data: In addition to volume, the variety of data can play a crucial role. The
greater the variety, the easier it is for algorithms to learn. The different types of data that
we scored were transactions (virtual transactions, including ones in the conventional
sense as well as other key metrics such as web page visits), social, audio/video/images,
IoT/sensor data, scientific/engineering data (from experiments/simulations), mobile/
location (actual location-based data from mobile devices) and archives (historical
records, mostly physical). Each use case was assigned a value based on the number of
data varieties available in that context.
We aggregated both of these measures to form a proxy for data richness (Exhibit A7).
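One possible form of such a proxy is sketched below. The log-normalization and simple averaging scheme is our assumption, since the report aggregates the two measures without giving an exact formula; the data-type list follows the variety categories described above.

```python
# Proxy for data richness: an aggregate volume measure (breadth x
# interaction frequency) combined with a variety count (number of
# data types available). Normalization and averaging are assumptions.

import math

DATA_TYPES = ("transactions", "social", "audio/video/images",
              "iot/sensor", "scientific/engineering",
              "mobile/location", "archives")

def richness(breadth, interactions_per_year, types_present,
             max_log_volume=12.0):
    volume = breadth * interactions_per_year  # assumes volume >= 1
    volume_score = min(math.log10(volume) / max_log_volume, 1.0)
    variety_score = len(types_present) / len(DATA_TYPES)
    return (volume_score + variety_score) / 2  # score in [0, 1]

score = richness(1_000_000, 50, ["transactions", "mobile/location"])
```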
Exhibit A7: Rich data is an enabler in some use cases, but the lack of it can be a barrier in others (data richness: low to high)
Industries: agriculture, automotive, consumer, energy, finance, health care, manufacturing, media, pharmaceuticals, public/social, telecom, transport and logistics
Problem types: real-time optimization, strategic optimization, predictive analytics, predictive maintenance, radical personalization, discover new trends/anomalies, forecasting, process unstructured data
In the exhibits that follow (A8 to A19), we have consolidated results from the impact and
data richness scoring. For each industry group, we list the top ten use cases along with
the impact score and data richness score. Impact scoring ranges from 0 to 3, while data
richness ranges from 0 to 2 based on an average across breadth, frequency, and variety.
Exhibit A9 (sample use cases, with impact score and data richness score):
- Identify root causes for low product yield (e.g., tool-/die-specific issues) in manufacturing: discover new trends/anomalies; impact 0.5; data richness 0.7
- Write product descriptions and ads for a diverse product portfolio: price and product optimization; impact 0; data richness 0.3
For this report, we classified seven of these 18 capabilities as relevant to deep learning,
in that these are capabilities deep learning is well suited to implement (Exhibit A20). For
example, deep learning networks have dramatically improved the ability of machines to
recognize images, which is a form of sensory perception. Thus, we include this in our list
of seven deep learning capabilities: natural language understanding, sensory perception,
generating novel patterns/categories, social and emotional sensing, recognizing known
patterns/categories, optimization and planning, and natural language generation.
Exhibit A20: Deep learning is well suited to develop seven out of 18 capabilities required in many work activities
NOTE: While this example illustrates the activities performed by a retail worker only, we analyzed some 2,000 activities across all occupations.
For some exhibits, we use DWA groups to summarize the impact across the approximately
2,000 DWAs. These groups are provided by the BLS and classify the DWAs into
37 elements or categories. Since each DWA falls into one of these categories, any analysis
can be aggregated up to the DWA group level by summing across impact or wages for the
relevant DWAs.
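A minimal sketch of this aggregation, with hypothetical DWA-to-group mappings and wage figures:

```python
# Aggregate DWA-level wage impact up to DWA groups: each DWA maps to
# exactly one of the 37 groups, so group totals are simple sums.
# The mappings and $ billion figures below are hypothetical.

from collections import defaultdict

dwa_to_group = {
    "greet customers": "performing for/with the public",
    "enter data": "interacting with computers",
    "process information": "interacting with computers",
}
wages_by_dwa = {"greet customers": 12.0,
                "enter data": 40.0,
                "process information": 30.0}

group_totals = defaultdict(float)
for dwa, wages in wages_by_dwa.items():
    group_totals[dwa_to_group[dwa]] += wages

# dict(group_totals) == {"performing for/with the public": 12.0,
#                        "interacting with computers": 70.0}
```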
A
Andel, C., S. L. Davidow, M. Hollander, and D. A. Moreno, The economics of health care
quality and medical errors, Journal of Health Care Finance, volume 39, number 1, fall 2012.
B
Booth, Adrian, Niko Mohr, and Peter Peters, The digital utility: New opportunities and
challenges, McKinsey.com, May 2016.
Bughin, Jacques, Big data, big bang? Journal of Big Data, volume 3, number 2,
January 2016.
Bughin, Jacques, Big data: Getting a better read on performance, The McKinsey
Quarterly, February 2016.
Bughin, Jacques, Ten lessons learned from big data analytics, Journal of Applied
Marketing Analytics, forthcoming.
C
Charles, Dustin, Meghan Gabriel, and Talisha Searcy, Adoption of electronic health record
systems among US non-federal acute care hospitals: 2008-2014, Office of the National
Coordinator for Health Information Technology, data brief number 23, April 2015.
Chui, Michael, and James Manyika, Competition at the digital edge: Hyperscale
businesses, The McKinsey Quarterly, March 2015.
Chui, Michael, James Manyika, and Mehdi Miremadi, Four fundamentals of workplace
automation, The McKinsey Quarterly, November 2015.
D
Dills, Angela K., and Sean E. Mulholland, Ride-sharing, fatal crashes, and crime, Providence
College working paper, May 2016.
Dilmegani, Cem, Bengi Korkmaz, and Martin Lundqvist, Public-sector digitization: The
trillion-dollar challenge, McKinsey.com, December 2014.
E
Ericsson, Ericsson mobility report: On the pulse of the networked society, June 2016.
G
Gantz, John, and David Reinsel, The digital universe in 2020, IDC, February 2013.
Gore, Adrian, How Discovery keeps innovating, The McKinsey Quarterly, May 2015.
Greenwood, Brad N., and Sunil Wattal, Show me the way to go home: An empirical
investigation of ride sharing and alcohol related motor vehicle homicide, Temple University,
Fox School of Business research paper number 15-054, January 2015.
H
Hilbert, Martin, and Priscila López, The world's technology capacity to store, communicate,
and compute information, Science, volume 332, issue 6025, April 2011.
I
Indeed.com, Beyond the talent shortage: How tech candidates search for jobs,
September 2015.
K
Kohn, Linda T., Janet M. Corrigan, and Molla S. Donaldson, eds., To err is human: Building a
safer health system, Committee on Quality of Health Care in America, Institute of Medicine,
National Academy Press, 2000.
Kuyer, Lior, Shimon Whiteson, Bram Bakker, and Nikos Vlassis, Multiagent reinforcement
learning for urban traffic control using coordination graphs, Machine learning and
knowledge discovery in databases, volume 5211 of Lecture Notes in Computer Science
series, Springer Berlin Heidelberg, 2008.
L
Lund, Susan, James Manyika, and Kelsey Robinson, Managing talent in a digital age, The
McKinsey Quarterly, March 2016.
M
Makary, Martin A., and Michael Daniel, Medical error: The third leading cause of death in
the US, BMJ, May 2016.
McKinsey Advanced Industries Practice, Car data: Paving the way to value-creating mobility,
March 2016.
McKinsey & Company survey, "The need to lead in data and analytics," McKinsey.com,
April 2016, available at http://www.mckinsey.com/business-functions/business-technology/
our-insights/the-need-to-lead-in-data-and-analytics.
McKinsey Global Institute, Big data: The next frontier for innovation, competition, and
productivity, June 2011.
McKinsey Global Institute, Digital America: A tale of the haves and have-mores,
December 2015.
McKinsey Global Institute, Digital Europe: Pushing the frontier, capturing the benefits,
June 2016.
McKinsey Global Institute, Disruptive technologies: Advances that will transform life,
business, and the global economy, May 2013.
McKinsey Global Institute, The internet of things: Mapping the value beyond the hype,
June 2015.
McKinsey Global Institute, Open data: Unlocking innovation and performance with liquid
information, October 2013.
McKinsey Global Institute, Playing to win: The new global competition for corporate profits,
September 2015.
P
Pande, Vijay, "FAH's achievements in 2015, with a glimpse into 2016," Folding@home blog,
December 6, 2015.
Patel, Tejal A., Mamta Puppala, Richard O. Ogunti, Joe Edward Ensor, Tiangchen He, and
Jitesh Shewale, "Correlating mammographic and pathologic findings in clinical decision
support using natural language processing and data mining methods," Cancer, August 2016.
R
Ransbotham, Sam, David Kiron, and Pamela Kirk Prentice, "The talent dividend:
Analytics talent is driving competitive advantage at data-oriented companies," MIT Sloan
Management Review, April 25, 2015.
Robb, Drew, "Are we running out of data storage space?" World Economic Forum blog,
November 10, 2015.
S
SAP, Real-time enterprise stories, case studies from Bloomberg Businessweek Research
Services and Forbes Insights, October 2014.
Stawicki, Elizabeth, "Talking scales and telemedicine: ACO tools to keep patients out of the
hospital," Kaiser Health News, August 2013.
U
US Department of Energy, 2014 smart grid system report, August 2014.
RELATED MGI AND MCKINSEY RESEARCH
Big data: The next frontier for innovation, competition, and productivity (May 2011)
Big data will become a key basis of competition, underpinning new waves of productivity
growth, innovation, and consumer surplus, as long as the right policies and enablers are
in place.
Disruptive technologies: Advances that will transform life, business, and the global
economy (May 2013)
Twelve emerging technologies, including the mobile Internet, autonomous vehicles, and
advanced genomics, have the potential to truly reshape the world in which we live and
work. Leaders in both government and business must not only know what's on the horizon
but also start preparing for its impact.
www.mckinsey.com/mgi
E-book versions of selected MGI reports are available at MGI's website,
Amazon's Kindle bookstore, and Apple's iBooks Store.
Download and listen to MGI podcasts on iTunes or at
www.mckinsey.com/mgi/publications/multimedia/
Cover image: © Nadla/Getty Images.
Cover insets (left to right): Businesswomen in conference room © Hero Images/
Getty Images, young man in city © svetikd/Getty Images, builders with tablet
© Monty Rakusen/Getty Images.
Contents page images (top to bottom): Woman and man at computer
© Hero Images/Getty Images, doctor with screen © Wichy/Shutterstock,
man with face recognition scanner © Monty Rakusen/Getty Images.
McKinsey Global Institute
December 2016
Copyright © McKinsey & Company
www.mckinsey.com/mgi
@McKinsey_MGI
McKinseyGlobalInstitute