TheBigDataAnalytics PDF
Big Analytics
Leaders Collaborative Book Project
For, Of, and By the Data
Analytics Leaders and
Influencers
Compiled by Deeksha Joshi
& Vishal Kumar
Contents
INTRODUCTION ............................................................................................................................................ 5
3. From Data to Insight: Seven Tips for a Great Data Strategy by Anne Russell ........................ 16
6. Act Like an Analyst, Think Like a Strategist by Dr. Justin Barclay .......................................... 28
8. Three BIG Reasons Companies are Failing with Big Data Analytics by Ian St. Maurice ..... 36
9. How to Leverage Power of Big Data Analytics in Making Strategic Decisions by Ankush
Chopra........................................................................................................................................................... 40
10. How to Hire the Best Analytics Leaders: And it's not the way you think! by Tony Branda 43
11. Big Decisions Will Supplant Big Data by David Kenny ......................................................... 48
13. What You Need to Know About Big Data by Dorie Clark ...................................................... 60
14. Change Your Business One Metric at a Time by Bill Franks ................................................. 63
15. The Importance of Analytics and Data Science by James Barrese ........................................ 66
4. Combating the Coming Data Deluge: Creating real-time awareness in the age of the
Internet of Things by Ryan Kirk .............................................................................................................. 86
6. The Five Letters that Will Change the Data World: BYOBI by Tomasz Tunguz ..................... 97
8. Beyond "Big Data": Introducing the EOI framework for analytics teams to drive business
impact by Michael Li ................................................................................................................................ 105
9. Can You Seize the Opportunity from Analytics If You are Cheap? by Lora Cecere ............ 109
11. The Big Hole in Big Data by Colin Shaw ................................................................................. 117
13. Big Data: Two Words That are Going to Change Everything by Ali Rabaie .................... 121
14. What Social Media Analytics and Data Can't Tell You by Beth Kanter............................. 125
15. Big Data, Great! Now What Do We Do With It? by Ken Kring ............................................ 130
16. Decoding Buzzwords: Big Data, Predictive Analytics, Business Intelligence by Cindy
Gordon ........................................................................................................................................................ 137
17. The Case Against Quick Wins in Predictive Analytics Projects by Greta Roberts .......... 141
18. How Many of Us Think Big Data is Big BS? by Ali Syed ..................................................... 144
19. So You Can Predict the Future. Big deal. Now Change It by Alex Cosmas ....................... 148
20. Best Practices in Lead Management and Use of Analytics by Marianne Seiler ................ 151
21. Integrated Information Architecture for Real-time Analytics by Madan Gopal .............. 154
22. The Practice of Data Science: Investigating Data Scientists, Their Skills and Team
Makeup by Bob E. Hayes......................................................................................................................... 158
24. The Esoteric Side of Data Analytics by Kiran Garimella ...................................................... 170
1. Full Stack Analytics: The Next Wave of Opportunity in Big Data by Chip Hazard ........... 178
3. Computational Knowledge and the Future of Pure Mathematics by Stephen Wolfram ..... 190
4. Consumers are on the Verge of Understanding Big Data: Are You? by Douglas Rushkoff 216
5. The Frightening Perils and Amazing Benefits of Big Data by Vivek Wadhwa .................... 220
6. Is Little Data the Next Big Data? by Jonah Berger ...................................................................... 226
8. Big Data Analytics, Where Are We Going? by John Ryan ........................................................ 234
9. A Path Forward for Big Data by Jules Polonetsky ...................................................................... 239
10. Democratization of Data Analytics: Why, Who, and What by Kirk Borne ........................ 243
1. Artificial Intelligence Will Make Smart Lawyers Smarter (and dumb ones redundant) by
Maximiliano Marzetti .............................................................................................................................. 248
2. Cyber security Solutions Based on Big Data by Steven King ................................................... 252
3. Personalization of Big Data Analytics: Personal Genome Sequencing by Peter B. Nicol .. 259
4. Internet of Things: Realizing Business Value through Platforms, Data Analytics &
Visualization by Pranay Prakash .......................................................................................................... 268
5. The Private Eye of Open Data on a Random Walk: 8 different routes by Debleena Roy .... 275
6. Data Sheds New Light on the On-demand Economy by Alex Chriss ..................................... 281
9. The Power of Apps to Leverage Big Data and Analytics by Joseph Bradley ......................... 299
10. State of Telecom Business Around Managing Big Data and Analytics by Dr. Hossein
Eslambolchi ................................................................................................................................................ 303
11. Solution to Key Traffic Problem Required a Fresh Look at Existing Data by Tony Belkin 307
12. Machines Won't Replace Insurance Agents in 2016, But They Will Do This by Steve
Anderson ..................................................................................................................................................... 311
13. How a Digital Transformation Can Improve Customer Experience with Big Data by
Ronald van Loon ....................................................................................................................................... 318
17. Big Data Revolution in the Banking Industry by Marc Torrens .......................................... 337
18. Optimizing the Netflix Streaming Experience with Data Science by Nirmal Govind .... 343
20. Five Things You Should Measure with Digital Marketing Analytics by Judah Phillips 360
INTRODUCTION
You are busy preparing a report for your manager. A few cubicles away, you
hear a colleague murmuring something to her mate about an analytics
project. You hear somebody say the word, data. Your interest is raised.
You hear more references to the term, data, and all things related like Big
Data, data science, machine learning and analytics. This is the state of
todays highly quantified world. Everybody talks about data and what they
can do with it.
A few months ago, I was pulled into a conversation at a cocktail party. I was
asked by a business owner about how big data could improve his business.
Although the conversation remained rather light (he mentioned he used Excel
to analyze this data), I told him my standard answer: you can get a lot of
insight into what's happening in your business. You can understand the
current state of affairs. You can make predictions about what will happen in
the future. You can also understand what you need to do to improve your
business. These insights can help you make better decisions to drive your
business forward.
The business owner needed much more information about big data analytics
than I could offer in a brief conversation. But our conversation piqued my
interest. I wondered what other experts in the field of analytics would say
about the topic of Big Analytics. This was the impetus for this book.
The Big Analytics Book Project is an aggregation of articles that best represent
thoughts by some of the leading minds in the data analytics industry. My
colleagues at AnalyticsWeek and I reached out to some of the top minds in
the analytics industry to see what they had to say about the topic. We
organized their contributions into a book to share with that curious business
owner as well as others who might have the same questions.
I would like to thank the contributing authors who have been very generous
in providing their time, thoughts and articles for the book. While we gave the
authors full autonomy to contribute whatever they wanted, we organized
their contributions into sections to, hopefully, make the reader's journey
easier. So, we hope you enjoy the mesmerizing ride of these carefully curated
write-ups from some of the great minds in data analytics.
D. Case Studies
This section puts theory into practice, including articles that provide examples
of how you can achieve success using data analytics.
SECTION A: STRATEGY FOR THE DATA ANALYTICS LEADERS
Big Analytics will have the biggest impact if business leaders first embrace the ideas in
this book. Business leaders are often occupied with current business problems which
are not always related to data analytics. They sometimes lack the vision of how data
analytics can help them do their jobs. This section is targeted at helping leaders become
more data-driven. The authors have shared their golden nuggets on how tomorrow's leaders
should create and embrace a data-driven culture.
The best way to approach this section is by reading each article as its own piece. We
recommend you read the author's biographical information to help you first
understand their perspective. Understanding the author's background might provide
you the right context to help facilitate your understanding of their material.
1. Rise of Data Capital by Paul Sonderegger
But we're getting ahead of ourselves. First, we need to acknowledge a few basics.
Because every activity in commercial, public, and private lives uses and produces
information, no organization is insulated from the effects of digitization and
datafication. Every company is thus subject to three laws of data capital:
1. Data comes from activity. Data is a record of what happened. But if you're not
party to the activity when it happens, your opportunity to capture that data is
lost. Forever. So, digitize and datafy key activities your firm already conducts
with customers, suppliers, and partners, before rivals edge you out. At the same
time, look up and down your industry's value chain for activities you're not part
of yet. Invent ways to insert yourself in a digital capacity, thereby increasing
your share of data that the industry generates.
2. Data tends to make more data. Algorithms that drive pricing, ad targeting,
inventory management, and fraud detection all produce data about their own
performance that improves their own performance. This data capital flywheel
effect creates a competitive advantage that's very hard to match, which means
that seizing those digital touchpoints is doubly important.
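The flywheel can be sketched in a few lines of Python. Everything below is invented for illustration: the price points, the hidden conversion rates, and the greedy update rule are a toy model, not a production pricing system.

```python
import random

random.seed(7)  # deterministic illustration

def estimate_conversion(history):
    """Estimate conversion rate per price point from the algorithm's own history."""
    by_price = {}
    for price, converted in history:
        by_price.setdefault(price, []).append(converted)
    return {p: sum(v) / len(v) for p, v in by_price.items()}

def choose_price(model, candidates):
    """Pick the candidate price with the best estimated expected revenue."""
    return max(candidates, key=lambda p: p * model.get(p, 0.5))

# Each pricing decision generates an outcome, and that outcome becomes
# training data for the next decision: data making more data.
candidates = [9.99, 14.99]
history = [(9.99, 1), (9.99, 0), (14.99, 1), (14.99, 0)]
for _ in range(200):
    model = estimate_conversion(history)
    price = choose_price(model, candidates)
    true_rate = 0.6 if price == 9.99 else 0.35  # the hidden "real world"
    history.append((price, 1 if random.random() < true_rate else 0))

print(f"decisions logged: {len(history)}")
```

The point of the sketch is the loop itself: the model's output (a price) produces the very data that refines the model, which is why an early lead in digital touchpoints compounds.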
This embarrassment of potential riches creates new problems. Even when focusing on
the highest-priority opportunities to cut costs or increase revenue, there are a dizzying
number of possible data sets that could be combined to uncover new insights, and an
equally large number of possible ways to put the discovered correlations and
connections to use. Two new capabilities cut this opportunity space down to size.
First, data marketplaces. Data scientists need a way to shop for data sets like
they're at Lowe's picking up supplies. Companies like BlueKai, which Oracle
acquired last year, let both data experts and civilians like marketing managers
browse hundreds of consumer data sets with more than 30,000 attributes. They
can combine any number of sets in innovative ways to quickly gain new
perspectives on target customers. New software like Oracle Big Data Discovery
brings this kind of easy exploration and discovery inside the firewall, creating
internal data marketplaces for previously siloed enterprise data.
Second, data liquidity. The marketplaces answer only spot demand from
business analysts, but data gets used in many other ways. And as firms store
increasingly diverse data in different silos (Hadoop, NoSQL stores, as well as
relational databases), the transaction costs of accessing, moving, and
manipulating desired data will go up. Firms need ways to bring those costs back
down. For example, Big Data SQL lets applications query different repositories
full of increasingly diverse data with a single query, as if it were all in a single
system. And new data integration products exploiting in-memory and low-cost,
scale-out technologies will nearly instantly convert data into new shapes for fast-
changing algorithmic pricing, fraud detection, and inventory management.
Get conversant. Top-performing CEOs are familiar with the basic concepts that
drive key differentiators of their businesses, like branding, product design,
logistics, and engineering, without being the firm's top expert in any of them.
The same should go for information technology. CEOs need crash courses in the
basic philosophies and trade-offs that drive different approaches to data
management, integration, analytics, and apps so they can make informed
decisions about digital business strategies.
Elevate the chief data officer. The CDO should be on par with the CFO, making
sure data capital is as productive as it can be while staying within the bounds of
proper use. Your company's data capital is essentially a deposit your customers,
partners, and suppliers place with you. Just like with financial capital, data
capital should be under the watchful eye of someone responsible for securing it,
shielding it from misuse, and auditing what actually happens to it on a daily
basis to ensure policy and regulatory compliance.
Highlight your data capital for Wall Street. Technology firms, especially cloud
companies like Oracle, have already started to include data capital numbers
(such as number of web sessions in their cloud or number of candidate records in
their HCM cloud) in their quarterly meetings with analysts. As platform
competition heats up in traditional industries, expect retailers to boast about
average number of data points collected per customer, usage-based auto insurers
to cite aggregate data collected annually, and logistics firms to emphasize the
total number of package scans captured.
The rise of data capital has already begun. The shift in thinking required to create new
strategic positions from shrewd digitization and datafication of value chain activities is
a habit that takes a while to build. All CEOs should add this to their to-do lists.
2. Investing in Big Data by Bill Pieroni
However, firms initially had some difficulties incorporating data and analytics into
their operations. They gathered a limited number of variables and stored them in
multiple data stores with different formats and structures. Additionally, filtering the
data to separate what is relevant and impactful (the signal) from the noise became
difficult as the amount of data increased exponentially. Based on a study conducted by
IDC, an IT consultancy, the amount of data available globally grew 27-fold to
approximately 2.8 trillion gigabytes from 2005 to 2012. The study also noted that
roughly 25% of this data is useful, but only 3% of it has been tagged for leverage and
only 0.5% of it is currently analyzed.
Most leading organizations see a need to enhance internal capabilities to collect, store,
access, and analyze these exponentially large, complex datasets, increasingly known as
Big Data. However, leaders need to allocate greater investments to Big Data capabilities
in order to fully realize the value potential. These investments need to be made across
the five segments of the data and analytics value chain.
Collection & Readiness: Large, complex datasets need to be collected and managed
effectively. Organizations generate data within independent silos. In order to maximize
data leverage, organizations should maintain data standards to ensure data accuracy,
consistency, and transferability.
Processing: Data must be processed in real time. Gaining a few days on competitors can
be the key to survival. Therefore, organizations should evaluate their architecture,
algorithms, and even programming language to substantially increase processing
speed.
availability, legacy infrastructure, and operating models. However, organizations that
are able to effectively leverage data and insights to drive differentiated value
propositions and outcomes will dominate their industries. Ultimately, these
organizations will be industry leaders rather than just industry participants.
3. From Data to Insight: Seven Tips for a Great Data Strategy by Anne Russell
Any good data strategy is going to involve understanding key factors that will impact
implementation. Some of it will be obvious and based on what you should expect in
any data strategy: at minimum, some sense of requirements, how the pieces will work
together, how data flows will be managed and visualized, and how different users will
interact with data over time. But if the process is really good, it will also explore the
burgeoning potential of new data driven approaches and help you determine which are
right for your approach. To get to great, consider the following:
realm of "Hmm, I didn't think about that." It will get you thinking about how to use
data now and how you could use it years from now. It's only once you understand the
potential goals and possibilities that you can narrow down your options into something
realistic that will meet expectations and build for the future.
what you need. Some, not enough. Indeed, when it comes down to your architecture
and design, your strategy will need to explain how these components will work
together, identify potential issues that can emerge, and provide workarounds that will
help your approach succeed. A good strategy will be designed based on how
the tools should work, but a great strategy will reflect the experience of how they
actually do.
It used to be that we could design interactive experiences based on assumptions of how
users have interacted with data in the past. But data driven approaches are just starting
to enable the exploration of what data can do and how users can interact with it. There
are very few, if any, use cases of direct relevance to most organizations or industries
from which to look for inspiration. As a result, developing interactive experiences for
use within a great data driven strategy means more than just research into user
preferences and knowing whats already been done within your industry. It also
requires knowing what has been done in other industries, and in other related and
unrelated sectors. Inspiration can come from anywhere, and a great strategist will
recognize when it is relevant to your approach.
This is all to say, it's worth it to invest resources in developing a data strategy before
you start implementing and testing any approach, whether that strategy is good or
great. Investing a smaller sum at the onset of any data driven approach will help to
assure that the system you ultimately implement works the way you want it to,
avoiding costly modifications in the future.
But if you're going to invest in a data strategy, it's worth it to spend a bit more and get a
great one. Why? Because a good data strategy will help ensure that the approach you
choose will work within your current business processes and grow with you in the
short-, medium-, and long-term. But a great strategy? That's what will let you see what
is and is not possible, and ultimately help your data tell the stories that really matter to
your bottom line.
4. Data is Not Always a Substitute for Strategy by Steven Sinofsky
This same week Google announced the end of life of Google Reader, and as you can see
from the headline there was some controversy (it is noted with irony that the headline
points out that the Twittersphere is in a meltdown). For all the 50,000 or more folks
happy with the VM movie, it seems at least that many were unhappy about Google
Reader.
Controversy
The role of data in product development is not without controversy. In today's world
with abundant information from product development teams and analysis of that data,
there is ample room to debate and dissect choices. A few common arguments around
the use of data include:
Representation. No data can represent all people using (or who will use) a product. So
who was represented in the data?
Forward or backward looking. When looking at product usage, the data looks at how
the product was used but not how it will be used down the road (assuming changes). Is
the data justifying the choice or informing the choice?
Contextual. The data depends on context in which it is collected, so if the user interface
is sub-optimal or drives a certain pattern the data does not necessarily represent a valid
conclusion. Did the data consider that the collection was itself flawed?
Counter-intuitive. The data is viewed as counter-intuitive and does not follow the
conventional wisdom, so something must be wrong. How could the data overlook what
is obvious?
Causation or correlation. With data you can see an end state, but it is not always clear
what caused the end-state. If something is used a lot, crashes a lot, or is avoided there
might be many reasons, most not readily apparent or at least open to debate, that cause
that end-state. Is the end-state coincident with some variables or do those variables
cause the end-state?
When a product makes a seemingly unpopular change, such as Google did with Reader,
one or more of these or other arguments are brought forward in the discussion of the
choice.
In the case of Reader, the official blog stated that "usage of Google Reader has declined."
While it does seem obvious that data informed the choice, if one does not agree with the
choice there is ample opportunity to dispute the conclusion. Was the usage in absolute
or relative decline? Were specific (presumably anonymous) users slowing their usage?
What about the usage in a particular customer segment? The counter-intuitive aspect of
the data showed as well: most dialog pointed out strong first-party usage among tech
enthusiasts and reporters.
The candid disclosure of the use of data offered some transparency to the choice, but
not complete transparency. Could more data be provided? Would that change how the
product development choice was received?
While these are just two examples, they happened in the same week and show the
challenges of data, transparency, and product development choices. While data can
inform choices, no one is saying it is the only way to make a choice or that those making
products should only defer to data. Product development is a complex mix of science
and intuition. Data represents part of that mix, but not the whole of it.
One part of developing a new product, as described, is to develop a minimum viable
product (MVP) that does not reflect the final product but is just enough of the product
to collect the maximum validated data about potential customers.
An interesting point in the description is how often the people that will use this early
version of the product are enthusiasts or those especially motivated and forgiving about
a product while under development. The tricky user interface, complex sign-up, or
missing error conditions and things like that might not matter to these folks, for
example.
Not every product ultimately targets those customers; they are not super large in
number relative to the broad consumer world, for example. As you learn and collect
validated data about your product strategy you will reach a critical point where you
essentially abandon or turn away from the focus on enthusiasts and tilt towards a
potentially larger customer base.
This is where your strategy comes into play. You have iterated and validated.
Presumably these early users of your product have started to use or like what you have
or at least pointed you in a direction. Then you'll take a significant turn or change:
maybe the business model will change, maybe there will be different features, maybe
the user interface will be different. This is all part of taking the learning and turning it
into a product and business. The data informed these choices, but you did not just
follow it blindly. Your product will reflect but not implement your MVP, usually.
But, with these choices there will probably not be universal agreement, because even
with the best validated data there can be different approaches to implementing the
learning.
Data transparency
The use of data is critical to modern product development. Every product of every kind
should have mechanisms in place to learn from how the product is used in the real
world (note, this is assuming very appropriate policies regarding the collection and
usage of this data). This is not just about initial development, but evolution and
maturing of the product as well.
If you're going to use data to design and develop your product, and also talk about how
the product was designed and developed, it is worth considering how you bring
transparency to the process. Too often, both within an organization and outside, data is
used conveniently to support or make a point. Why not consider how you could
provide some level of detail that could address the obvious objections, so that those
who disagree with your approach might better understand things, especially those who
follow along and care deeply. Some ideas:
Provide absolute numbers for the size of the data set to avoid discussions about
sample size.
Provide a sense of statistical significance across customer types (was the data
collected in one country, one type of device, etc.).
Provide the opportunity for additional follow up discussion or other queries
based on dialog.
Overlay the strategic or social science choices you are making in addition to the
data that informed the choices.
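The first two ideas in the list above can be made concrete with a short sketch. A confidence interval shows why disclosing the absolute sample size matters; the usage numbers below are hypothetical, and the normal approximation is one simple choice among several.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a usage proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Same headline proportion (60% usage), very different certainty.
# Reporting absolute numbers alongside the interval heads off
# "small sample" objections before they start.
lo_small, hi_small = proportion_ci(6, 10)        # 6 of 10 users
lo_large, hi_large = proportion_ci(6000, 10000)  # 6,000 of 10,000 users
print(f"n=10:    60% usage, CI [{lo_small:.2f}, {hi_small:.2f}]")
print(f"n=10000: 60% usage, CI [{lo_large:.2f}, {hi_large:.2f}]")
```

With ten users the interval spans most of the plausible range; with ten thousand it is about two points wide, which is the kind of transparency that lets skeptical readers judge the claim for themselves.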
Transparency might not remove controversies but might be a useful tool to have an
open dialog with those that share your passion for the product.
Using data as part of product development will never be free of debate or even
controversy. Products today are used across millions of people worldwide in millions of
different ways, yet are often used through one software experience. What works in one
context doesn't always work in another. What works for one type of customer does not
always work for another. Yet those that make products need to ship and need to get
products to market.
As soft as software is, it is also enormously complex. The design, creation, and
maintenance preclude, at today's state of the art, an "everything for everyone" approach.
To make those choices a combination of data and intuition makes a good approach to
the social science of product development.
5. Understand How Your Company Creates Value by Cary Burch
One primary way I have seen the above done well is through outlier detection.
Normally, we think of outliers as a bad thing. In analytics this is just not the case. Are
there customers using our products twice as long as the average? Are there those who
have purchased the same (durable) product repeatedly? Are there customers in areas of
the country, or globe, where we never intended to see penetration? These are data
points, not aberrations in your dataset. These are the exceptions to explore at a deeper
level, to find out why they love your product so much they will go out of their way to
get it, buy it more often than anticipated, use it longer than intended, or simply live in
areas where marketing never reached.
Finally, above all, better define your point(s) of differentiation. Hint, simply having a
different product feature, a unique company attribute, a winning culture, or a better
mission/purpose are not points of differentiation if your customer is not willing to pay
more to get it. If you are different, and your customer finds value in it, then youve
achieved differentiation. This is normally achieved by either offering something unique
to the market, or by serving a need through different means or in a different way than
competitors. Analytics can and should be used to identify what this meeting place is for
your company, between what separates you from the pack, and what of that your
customers find valuable.
The analysts, researchers, data scientists, and of course managers, of our business
analytics teams are not simply here to deliver captivating dashboards and glossy
reports on year-over-year sales and margin. Business analytics does not simply translate the sea of
data coming into the organization every day. Business analytics is also an expedition
party, a team of sociologists, detectives, and experimenters, who should be asking more
than simply "How much did customers buy from us yesterday?" Business analytics should
be challenged and empowered to do everything they can to instead ask "why."
6. Act Like an Analyst, Think Like a Strategist by Dr. Justin Barclay
Experience has told me this has everything to do with the fact that organizations see
analytics as crucial, yet they lack a shared awareness of how analytics can be leveraged
within the business. Not unlike Six Sigma advocates who use the process solely for
manufacturing. Nor unlike balanced scorecard proponents who use the process to
simply assess, rather than coordinate and integrate key activities. You as an analytics
leader are the individual best positioned to help the business understand the totality of
what value analytics can deliver. Yet in order to do this, you will have to act like an
analyst, and think like a strategist.
To explore how best to identify an appropriate synthesis of analytics and strategy, the
following are a few key areas you will need to help your company understand, which are
also areas where analytics can help:
Unmet Demand. Analytics can help teams understand where demand is being met, and
where market share or share of wallet are being left behind for competitors. Cluster
analysis, basket analysis, and conjoint analysis can all help begin to answer this
question.
Market Segmentation. Analytics can help teams understand who is going to buy what
you offer. Understand the demographic/geographic/psychographic/behavioral
attributes of those you covet. Odds ratios, predictive models, heat maps, and lifetime
value analysis can provide color in this area.
The Value Chain. Here is where analytics is already being used, and can still be further
leveraged. Analytics can help teams to best understand the performance and linkage of
all key processes involved in creating value for customers. Dashboards and
performance reports are a great place to start.
The Consumption Chain. Customers do not just decide one day they will buy your
product or service. There are additional steps around need recognition, option
evaluation, even use/maintenance/disposal, which need to be reviewed. Disrupting this
consumption chain to garner greater share, through the use of surveys, focus groups,
and ethnography, is key.
Differentiation. What sets your company apart from the field of neighboring players?
Analytics can contribute requisite market analysis, opportunity analysis, sensitivity
analysis, and both macro- and micro-environmental analysis ranging from PESTEL to
SWOT to help in this regard.
Core Strategy. Whether your company chooses a position of cost leadership, focus, or differentiation, to name a few potential core strategies, analytics can help the business understand how it is performing in sustaining competitive advantage. From historical performance analysis to return on investment, market penetration, customer satisfaction, and year-over-year accretion analysis, analytics can help the business with past, present, and future lenses.
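To make the Market Segmentation bullet above concrete, here is a minimal sketch of two of the techniques it names: an odds ratio and a lifetime value estimate. All figures, the segment names, and the simple perpetuity-style CLV formula are illustrative assumptions, not drawn from the text:

```python
# Hypothetical illustration of two segmentation metrics named above.
# All numbers below are invented for the example.

def odds_ratio(exposed_buyers, exposed_nonbuyers,
               unexposed_buyers, unexposed_nonbuyers):
    """Odds ratio from a 2x2 table: how much more likely one
    segment's members are to buy than another's."""
    return (exposed_buyers * unexposed_nonbuyers) / (
        exposed_nonbuyers * unexposed_buyers)

def lifetime_value(annual_margin, retention_rate, discount_rate):
    """Simple perpetuity-style CLV: margin * r / (1 + d - r)."""
    return annual_margin * retention_rate / (1 + discount_rate - retention_rate)

# Urban customers: 30 of 100 bought. Rural customers: 10 of 100 bought.
print(round(odds_ratio(30, 70, 10, 90), 2))          # 3.86
print(round(lifetime_value(200.0, 0.8, 0.10), 2))    # 533.33
```

Here the urban segment's odds of buying are roughly 3.9 times the rural segment's: exactly the kind of signal that tells you whose attributes to study and whom to covet.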
While this only covers the highlights of what analytics can do for your company, this is
enough to spur the discussions necessary to bring analytics out of the operations or
sales department. It is a set of standard value opportunities which can give analytics the
holistic, integrative, strategic purview it needs to deliver true value to the business. All
it requires is your prophetic voice, backed by sound strategic thinking.
7. The Last Mile of Intuitive Decision Making by Vishal Kumar
2. Framed Bias:
When a leader is fed information, framed bias is often at work. Algorithms and models are susceptible to absorbing the biases of their designers. The more complicated a model is, the more it can learn from, and be influenced by, its designers, and such a model or methodology is often tainted with framed bias. Take the simple example of guessing a team size when the options are 0-100, 100-500, 501-1000 vis-a-vis 0-2, 3-7, 8-12. Each set of options (the span and granularity of the values), when presented to a subject, influences the outcome and induces framed bias. As a leader it is important to ensure that the data analytics teams are not working under any framed bias.
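The team-size example above can be made concrete. Reporting the same true value through differently framed answer buckets yields very different implied estimates; the bucket ranges and the midpoint convention below are hypothetical choices for illustration:

```python
# Illustrative sketch: the same true team size, reported through two
# differently framed sets of answer buckets, implies different estimates
# when each response is read back as its bucket's midpoint.

def bucket_midpoint(value, buckets):
    """Return the midpoint of the bucket that contains value."""
    for low, high in buckets:
        if low <= value <= high:
            return (low + high) / 2
    raise ValueError("value falls outside every bucket")

coarse_frame = [(0, 100), (101, 500), (501, 1000)]
fine_frame = [(0, 2), (3, 7), (8, 12)]

true_team_size = 9
print(bucket_midpoint(true_team_size, coarse_frame))  # 50.0
print(bucket_midpoint(true_team_size, fine_frame))    # 10.0
```

The wide frame pulls the implied answer toward 50 and the narrow frame toward 10; the framing of the options, not the underlying fact, drives the difference.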
4. Anchored Bias:
This is another interesting pitfall that taints judgement, and the cause can be our own anchors or ones supplied by our teams. Anchors are often the biases we strongly believe in. They are the most visible assumptions picked from the data or analysis; they stick with us and find their way into influencing subsequent judgement. One of the easiest examples of such a bias is any socio-political analysis. Such analysis is often anchored in pre-conceived bias or information that closely matches our existing understanding, and we subsequently try to influence the outcome. In many ways there is a thin line between anchored and framed bias: framed bias influences our judgement based on how something is framed, whereas anchored bias makes some earlier reference point the basis of our decision process. The easiest way to move around such bias is by keeping an open perspective and always staying within the bounds of the data or analysis.
5. Experience Bias:
This is one of the most painfully ignored biases that any leader has. We often think that we are the best judge of our actions and that we have the most adequate knowledge of what we do. Such a bias of being an expert in the field does come with an advantage: it helps us handle a task with great speed and comfort. However, it also tricks an individual into believing that an experienced judgement is the right judgement. A typical example I have come across is a team using obsolete models and techniques without realizing that something better is out there. Such a bias limits our capabilities and restricts our decisions to our limited understanding of the subject. A leader must be swift in recognizing such a bias and working around it. One of the easiest ways to do so is by asking questions. Many times such biases fade away when one questions one's own knowledge and discovers cracks and pitfalls in one's decisions. This is a critical bias to fix for the success of data analytics teams.
7. Affinity Towards the New Bias:
Another bias that is ingrained in people is our openness and friendliness towards the new. A new TV, a new house, a new gadget, a new way of doing things. Such a bias also influences data analytics teams. Its most prominent occurrence is when a new solution is proposed to replace an already adopted model and we give it a green light. Many times such decisions are tainted by the fact that we love to try new things, and many times we forego analyzing the hidden and undiscovered drawbacks. As a leader it is most important to be free from such a bias. One should have a clear, scaled measure for substitution. It is important to clearly state and understand the SWOT difference between the new and the old ways. The team should be clearly instructed and trained to identify such pitfalls, as many times this bias makes its way through untrained or relatively newer analysts and often goes unnoticed. Keeping the team free from such a bias helps it side with the best models for the job and helps improve the quality of data and analytics.
So, it is important for a leader to keep their analysis and their team's practices bias free so that the company can enjoy the benefits of bias-free, data-driven outcomes.
I am certain there are many other biases that are not covered here, and I would love to hear from readers about those. Please feel free to connect and email your suggestions and opinions.
8. Three BIG Reasons Companies are Failing with Big Data Analytics by Ian St.
Maurice
Ian St. Maurice
Principal at A.T. Kearney - Marketing and Strategy
LinkedIn Contact

I am a Principal in A.T. Kearney's Strategy, Marketing & Sales Practice based in Singapore. I have more than 20 years of experience in marketing consulting across industries and geographies. Prior to joining A.T. Kearney, I spent over 15 years at McKinsey & Company in Singapore, Shanghai, Hong Kong and New York and was also a successful entrepreneur developing marketing analytics software for hospitality chains.

I am a globally recognized expert on creating breakthrough strategies to capture emerging market consumer opportunities based on a deep understanding of their needs.

My passion is helping my clients figure out how they can integrate "Big" data (from databases and social media) and "Small" data (from quantitative and qualitative market research) to help them develop superior product and go-to-market strategies.

Big Data Analytics has become the new buzzword. It is all around us - in the business press, in LinkedIn discussion groups, etc. A lot of the talk is warranted: new marketing tools are being created by the increased availability of data and advanced tools to process and leverage it. Brand new industries are being born that monetize exhaust data.

All of this has led to an explosion of investment in big data. According to a recent survey by Gartner Inc., 73% of companies have invested, or plan to invest, in big data in the next 24 months, up from 64% in 2013. Investment in the sector is growing at an average annual rate of nearly 30% and is expected to reach $114 billion in 2018.
But here is one figure that is perhaps the most important: only 1 in 12 companies that have started implementing big data analytics are achieving the expected return on investment.
This is strikingly in line with the reported success rates of CRM system implementations in the early-to-mid 2000s, 70% to 90% of which failed according to Brewton.
So what's happening? Why do we seem not to have learned our lesson from the CRM implementation issues of 10-15 years ago?
Part of it is because hype is driving investment ahead of understanding. But there are
other barriers that are keeping companies from maximizing their returns on the big data
trends. The three big ones are:
1. Data management
Organizations are struggling with capturing, storing and running analyses on the massive amount of exhaust data produced by their businesses, as well as the data they can access from external sources like market research firms. The amount of data produced today by most businesses far outstrips companies' infrastructure to store it.
In many cases, some of the most valuable data - like historical response data (both positive and negative responses) - simply does not get collected. This data is crucial for developing insights on what kinds of offers are more or less compelling to the market.
This is further complicated by the fact that a lot of the data in warehouses today tends to have serious quality issues associated with it.
Data complexity and interpretability are also holding companies back. Data warehouses need to account for the thousands (in some cases millions) of different ways a customer behaves or a transaction occurs. This can yield an enormous number of codes and syntaxes that must be unraveled to get to real insights the business can act on. Where data were captured in legacy systems, the interpretation of what different fields, codes, or combinations thereof mean often gets lost, as many organizations do a poor job of capturing everything religiously in their data dictionaries.
2. Organizational silos
The power of big data is its ability to show how different behaviors and trends are
related. By getting a 360-degree view of customers across ALL products and services
they use and across ALL channels, we can get much better insights into their true
motivations and as a result unlock more value.
To do this efficiently, data should flow freely between silos in the organization.
Unfortunately, this rarely happens. Customer IDs (used to integrate across product groups) are sometimes not uniform across Business Units (BUs) or systems. Sometimes BUs treat their data very possessively and seek to restrict or delay access, or require extensive paperwork, before they let other BUs leverage it.
Restricting access and movement of data limits the ability of firms to capture value with
analytics much in the same way that restricting flows of capital leads to less efficient use
of that capital and lower returns.
3. Capabilities
Big data professionals are a hot commodity in the global talent market. Recent reports indicate that 2 million people will be employed in big data positions by 2015 in the US alone.
However, multiple surveys of C-suite executives highlight that getting the right talent is
their biggest challenge in getting their big data programs off the ground.
What I've seen many organizations do is focus on 3 different profiles for leadership positions in analytics:
1. Top notch statistician
2. Experienced IT professional
3. Experienced line manager
Each of these brings necessary skills to the table: driving the creation of advanced propensity models (statisticians); integrating insights into operational systems (IT professionals); and creating pragmatic solutions that will be adopted by the line organization and drive impact (line managers).
But what organizations really need are Big Data Trilinguals: people who have exposure to and some experience with all three disciplines and can effectively translate insights and programs across them. These profiles are exceedingly rare.
So while the buzz around big data continues to grow louder, it will be a while before
companies can fully exploit its potential. In my next post, I will look at what firms can
do to achieve that.
9. How to Leverage Power of Big Data Analytics in Making Strategic Decisions
by Ankush Chopra
Big data analytics provides trends across different types of data, or correlations among data points hidden deep within a data repository; I term such insights empirical insights. Empirical insights are often indicators of a micro behavior that may have major strategic impact, but to understand that larger impact one needs another type of insight, which I term conceptual insight. Conceptual insights allow you to model the world in abstract concepts and see the relations between different parts of it. For example, "the lower the competition in an industry, the easier it is to enter that industry," or its corollary, "a lack of competition makes new entry easier," is a conceptual insight developed through observation across data points. To make the leap from an empirical insight to a conceptual insight, one needs conceptual frameworks for perceiving the world.
To understand this better, let us use the example of the blades and razors industry, where Dollar Shave Club has made a sizeable dent. A big data repository with tweet data may show that some people are unhappy with the price of shaving blades, and blog comment data may provide information on potential avenues for cheaper blades. However, these empirical insights don't automatically enable you to make the leap that a Dollar Shave Club may be emerging and may be successful. To make that leap you need to leverage a conceptual insight by looking at the strategic position of the various players in the industry. Once you notice that the low-price, low-performance segment of the market has been vacant, you can marry the two to see how Dollar Shave Club could be a disruptive force in the industry. In this way, conceptual insights have a multiplier effect on the empirical insights of big data analytics.
The example above points to a larger issue managers face today, disruptive forces, where big data can make a powerful impact. It would be fair to say that we are living in the age of disruption; today no industry is immune from disruptive forces. Not only are disruptive forces emerging across industries, managers have little time to react to them. Although it took 30 years for digital cameras to disrupt analog cameras, it took the iPhone less than 5 years to displace the BlackBerry. Uber and Airbnb came out of nowhere to play havoc with taxi and hotel companies respectively. And finally, as mentioned above, Dollar Shave Club emerged fairly rapidly to make a dent in the blades and razors industry. In short, managers face powerful disruptive shocks today but have virtually no time to react. This observation led me to write The Dark Side of Innovation, which I hope will help managers not only think about disruption but also deal with disruptive forces before they emerge in an industry.
In my book The Dark Side of Innovation I introduced a conceptual framework to deal with potentially disruptive events in any industry. The conceptual framework is a three-step process for managing disruptive forces: predicting disruptive events in your industry, imagining the future states of your industry, and devising solutions to prepare for those future states.
This framework, based on extensive research and data from across many industries, provides a pathway to conceptual insights. Using the conceptual tools in the book, you can see your business in a new light and generate conceptual insights. When results from big data analytics are combined with this framework, they can provide powerful insights to help you deal with disruptive events in your industry.
In fact, you can marry big data insights with each step of the framework. In the first step, you can feed the empirical insights from your big data into the models used for predicting disruption in an industry. Just as in the Dollar Shave Club example above, you can use multiple empirical insights along with the conceptual insights from the tools in the book to predict disruptive events in your industry. Similarly, in the second step, where imagining the future states of your industry is concerned, you can use big data analytics to predict future scenarios and see whether your organization is ready for them. Finally, in the third step, where you need to devise solutions to be prepared for those future scenarios, big data analytics can be a powerful tool. This is where you will see the high impact of changing consumer behaviors, data-driven insights for product development, superior customer segmentation, and high-impact marketing actions. In short, this example shows how you can leverage big data insights by using conceptual insights as a multiplier force when dealing with major strategic decisions.
In conclusion, big data analytics is a powerful force that can aid you in many ways
when dealing with strategic as well as tactical issues. However, to reap the real power
of big data analytics you need to marry the empirical insights from big data analytics
with the conceptual insights.
10. How to Hire the Best Analytics Leaders: And it's not the way you think! by Tony Branda
Tony Branda
CEO at CustomerIntelligence.net
LinkedIn Contact

"Analytics is to management decision making what metaphysics is to spirituality. They both provide laws of thinking and procedures, methods and treatments to shine the light on the truth. The truth generated from analytics should help executives and government officials steer clear of bad decisions and increase the value senior management brings to the firm or organization. More value is created when executives choose to embrace these tools and the resultant facts and data are leveraged to inform decisions and judgements." - Tony Branda, analytics expert, 1/17/2016.

The author is Tony Branda, a top Analytics and CRM executive with 25 years of practitioner experience leading large CRM and Analytics Centers of Excellence. Mr. Branda was most recently the Chief Analytics Officer for Citibank North American Retail. Mr. Branda owns CustomerIntelligence.net and is Professor of Integrated Marketing at NYU and Pace Universities. Mr. Branda is pursuing his Doctorate in Business Administration in Marketing Analytics and holds an MBA from Pace University as well as a Certificate in Integrated Marketing from NYU.
Introduction:
Most firms today are using very dated behavioral interview techniques (1980s-style questions from before analytics and digital existed) to understand who might be the best leader to build, manage or restructure their analytics functions. The purpose of this article is to point out that these extant techniques may lead Senior Executives to assess and hire the wrong analytics leader. This how-to guide may become the best friend of the hiring manager, as it will assist in creating a better hiring outcome.
to probe on the main leadership competencies required in analytics such as: the
ability to motivate highly quantitative talent, mastery of the multi-disciplinary
nature of analytics, and the ability to connect insights to strategy.
Common Mistake Number Three:
3. Assume that business leaders with reporting or metrics backgrounds, particularly in sales/marketing, are truly Analytics or Intelligence Leaders. Analytics leaders typically have done rotations in Analytics, Decision Sciences, Research, CRM, Big Data Strategy, Database or Digital Marketing, and represent truly multi-disciplinary backgrounds.
Cross-over hires can be appropriate at more junior analytics levels. At the most senior level, this can be problematic, especially when Analytics is expected to drive strategy and the regulations for data usage are complex. The regulatory environment of the past 7 years has forever changed how data can be leveraged, especially in Financial Services (Cards, Insurance and Banking), for customer targeting and risk management. Knowing the complexities of the data enables the Analytics Leader to abide by the regulations and also to mine what is permissible for opportunities to grow the business. So know that if you are really keen on hiring from outside your industry, there is a cultural/fit risk. Also, one question to ask candidates is about the speed or velocity at which projects and changes happen at their current company versus your company. The way a tech company gets things done may be very different from that of a book publisher or a bank, for example.
are about looking for opportunities and not only for cost savings. The investments executives make in analytics can be returned ten-fold, so we recommend that Analytics leaders not report to functions that are only support or cost-containment roles, such as the CTO, Chiefs of Staff or other Chief Administrative functions (which tend to be more support or shared services). Analytics leaders and functions need to be where they can best inform strategy and drive growth and competitive advantage. (While we acknowledge that there are lots of different ways to organize, we suggest this as a barometer.)
In closing, Analytics is an evolving field that is finally coming into its own within organizations. Hiring an analytics leader therefore requires special care and attention, and requires going beyond general management behavioral interviewing in favor of a more robust, integrated approach. It is vitally important for the hiring manager to make sure that HR, recruiters and all team members involved in the hiring decision understand the phenomenon discussed in this article. Analytics leaders should be assessed on leadership dimensions, subject matter experience, the rotations they have done, and their industry contacts and knowledge. This article's point of view hopes to open up further debate on the best way to hire the analytics executive, while maintaining that the traditional way, in this newer, more technology- and knowledge-driven field, may no longer produce the best hiring outcome.
11. Big Decisions Will Supplant Big Data by David Kenny
That said, many leaders know their company or government could move much faster and more boldly when it comes to converting data and analysis into decisions and action. Unfortunately, the rise of Big Data has not eliminated analysis paralysis. In fact, in some ways it has enabled indecisiveness: as more data has become available, it has become easier to get lost in the details, and more tempting to want the data to make the decisions for us.
Our ability to turn Big Data into big decisions is more critical than ever. That's because we have only begun to realize our full potential when it comes to using Big Data to prompt the big decisions that will help us solve the most daunting set of problems our species has ever faced. Those problems include massive, highly complicated issues like managing climate change, overcoming scarcity of water and food, and providing accessible health care for more than 7 billion people. They also include smaller, but frustrating, issues like the unnecessary loss of human life because of simple things like texting while driving or the lack of flu immunizations.
How do we move from being excited about Big Data to realizing data is just a means to
the ends of big decisions?
In their book Analytics at Work, Thomas Davenport, Jeanne Harris and Robert Morison point to leadership as the single most important factor in a company's ability to turn data into good decisions. They go on to say that "the best decision-makers will be those who combine the science of quantitative analysis with the art of sound reasoning." I agree, and predict that 2015 will be a year when we move from Big Data to Big Decisions by accelerating sound reasoning to match the growth in quantitative analysis. These changes will happen on three levels:
1. Big individual decisions. We have more individual data than ever before. Data about how we drive, from in-car sensors. Data about our health, from smart watches and other wearable computers. Data about our cognitive abilities, from daily tests on our smartphones. Data about our finances, from multiple apps. Data about the traffic patterns on our commute. In 2015, more of these data sources will add reasoning to help us make decisions and take actions that improve our health, our safety, or our financial choices. We are even evolving the weather forecast to become more personalized to your life outdoors, with forecasts for traveling, boating, surfing, running, meal planning, hair care, and hundreds of other use cases. We have learned that turning data into decisions helps people make the most of weather data, and I am sure other data analyzers and forecasters will do the same.
2. Big company decisions. Companies have been investing heavily in Big Data. They are tracking every SKU, every purchase, every ad impression, every referral, every hotel stay, product efficacy, and so on. Companies have been correlating the behavior of their customers with economic data, pricing data, weather data, news cycles, and every potential factor that might drive a customer decision. The PowerPoint decks have gotten thicker, but have the decisions really gotten better? In a few leading cases, companies are getting more savvy about using the data to drive better and more precise decisions. We see better decisions to serve the right content to each customer from the savvy recommendation engines of leading e-commerce and publishing companies. We see better decisions from retail ad campaigns, synced with sophisticated supply chain management, which vary by geography based on local weather and economic conditions. We see better investment and risk decisions in some technical trading models. But most companies are not yet using machine learning, nor are they automating decision-making with better algorithms. Instead, they are feeding more data into the same human processes. The decisions are constrained by the human capacity to process the data, so only a small part of the Big Data is actually used. In 2015, we should see more companies rethink their decision processes to make Bigger and Better Decisions using Big Data.
3. Big societal decisions. Government officials and large non-profit leaders need to make decisions that affect society as a whole, in policy areas ranging from sustainable energy to education and healthcare access to the deployment of national and international security efforts. To help make these decisions, governments have also long been the biggest collectors of data: satellite observations of the Earth, geological surveys of the planet, economic and employment data, census data, transportation and traffic patterns, and millions of other topics. This data is compiled, released in summary reports (often weeks and months later), and used to help lawmakers and policymakers with Big Decisions. But recently, there have been efforts by the USA and other key governments to make the data accessible more rapidly and in more raw and granular fashion. This allows the data to be used by tax-paying citizens and businesses to improve their
own decisions. It also helps create entire new business models, like our weather business, or Google and Microsoft's map businesses, or Bloomberg's economic indices. When we free Big Data, process it through faster and lower-cost computers, and turn it into Big Decisions, we create an economic engine with more precision, more efficiency, and more time left for innovation.
In the past few years at The Weather Company, we've talked a great deal about the importance of becoming a distinctively great data company. In 2015, we will move toward becoming a great decisions company: a company that provides our customers not only with good information, but does so in a way that helps the people we serve make good, timely decisions based on short- and long-term weather predictions.
That commitment has required us to become better decision-makers ourselves, and it's a capability that must become a more prevalent part of who we are and how we operate. As we strive to do that, I'm curious to hear: what are you seeing in your world? How well are you turning Big Data into the Big Decisions that create the Big Ideas that make the world a better place?
12. Big Data: The 'Bridge' to Success by Paul Forrest
You can't help but have noticed that Big Data is claiming a growing share of column inches in the business press. Everywhere from the mainstream broadsheets through to regular articles in the Harvard Business Review, one can learn of the latest perspectives on Big Data and its likely relevance to your business. So why is it that we arrive at the start of 2015 with many businesses struggling to make sense of the opportunity and how to go about making the most of it?
OK, so let's start with my regular health warning: these are my opinions and observations. They are based on the work I perform at board level with many leading, large listed businesses seeking to get the most from their insight- and data-enabled initiatives. Whilst I'm not the greatest fan of the "Big Data" expression, I use it here in the same way people use bucket terms in business. It serves a purpose and is not intended to differentiate the myriad initiatives that exist in this space.
Context
According to Gartner's recent Big Data Industry Insights report, it's clear that many
organizations are increasing their investments in Big Data. What is clearer still is that a
large proportion continues to struggle to gain significant business value from those
investments.
Add into the mix other sentiment and survey results, such as a recent CSC survey, and you discover that five of every nine Big Data projects never reach completion and many others fall short of their stated objectives. So what can these surveys offer in terms of clues?
My own analysis here suggests no real surprises. Often, board level sponsors, business
managers, data and IT groups are not aligned on the business problem they need to
solve. Also, employees frequently lack the skills required to frame insight questions or
analyze data.
Many more simply don't take a strategic approach to their initiatives and fail to follow
what we have come to regard as cornerstone rules (more on these later). So, is there a
Silver Bullet available to help solve these issues?
Silver Bullet?
Many of us have come to understand this term as a general metaphor, where Silver
Bullet refers to any straightforward solution perceived to have extreme effectiveness.
The phrase typically appears with an expectation that some new development or
practice will easily cure a major prevailing problem.
Without wishing to be too controversial, my analysis suggests that the missing link could yield extreme effectiveness, but not as a result of a new development.
So what are we discussing here? In its simplest terms, I'm referring to a role best described as "The Bridge". The Bridge is a reference to a skillset more than a specific role. It may even be a cultural characteristic, but what is clear is that for many organizations failing to realize the benefits they seek from data, the root cause is the disconnect between key stakeholders: the board, the IT department, those at the sharp end of the business, and those responsible for processing, storing and analyzing the data available to the enterprise.
Now to be clear, this is not necessarily an independent role required in isolation from other activities; think of it more as a matrix management activity or a core skill requirement for middle and senior management. In essence, if you could equip your business analysts, data scientists, insight staff and IT team with these skills in a coterminous fashion, I think you'd be close to the desired state. But this requires education, awareness and a heap of soft skills still often overlooked in corporate staff and management development programs.
Buy or Build?
The perennial question of whether you develop your own or simply hire someone with the requisite Bridge skills and characteristics is a challenging one for many businesses. These are rare people. Managers at the front end of the business with a deep enough appreciation for how data can answer previously unanswered questions, IT practitioners with deep, relevant and up-to-date domain expertise, and data practitioners with the ability to translate their knowledge into easy-to-follow guidance for the business are not likely to be readily available.
Buying them in is equally challenging. As already noted, these are not necessarily people with a role literally and exclusively relating to the tenets of this paper. Instead, they are likely to be traits you have to specifically look for whilst recruiting other staff. Perhaps the closest cousin is the Business Analyst. I prefer to consider these as skills and traits which all senior managers should be able to master, to ensure that the expertise is rooted in the business and available on tap. This means that unless you're in a hiring cycle, you may need to consider building your own Bridge. Options include:
Reciprocal secondments
In-house and external Training and seminars
Enhanced/structured continuous professional development
Online communities
Individual reading
Business Safaris - Reading groups
Programmed action research
Individual action research
Membership of professional bodies
Whilst this is not intended to be an exhaustive list, there should be few surprises here. The issue for many is that most soft skills training tends to focus on deepening the skills used to deliver business as usual, rather than developing a whole new awareness of a data-driven business. Perhaps it's time to think outside of the box?
Ability to Communicate
The nature of The Bridge role means that they'll be spending much of their time
interacting with people from a variety of backgrounds across the business. This ranges
from users, customers or clients, to management and teams of scientists, analysts and IT
developers. The ultimate success of an initiative depends on the details being clearly
communicated and understood between all parties, especially the project requirements,
any requested changes and the results of testing.
Technical Skills
For your Bridge to be able to identify solutions for the business, they need to know
and understand the data available to the business: its current use, storage, ownership
and currency. Remember, you're not expecting people here to be able to craft their own
solutions or develop deep data taxonomies. Bringing Bridge skills into play
necessitates understanding the new outcomes that can be achieved when the data is
worked in a different way, answering previously unanswered questions. This in turn
means having a good understanding of the benefits offered by the latest analytical
approaches and by the insight, analytics and scientific skills available to or within the
business. Ultimately, you will need a data scientist, analyst or insight specialist to make
the magic happen.
The ability to test data solutions and hypotheses, and to design business "what if"
scenarios, is an important technical skill. They'll only gain respect and build confidence
among the Data Science teams and business end-users if they can demonstrate that they
can speak with authority in the dual languages of business and data, whilst being
technically strong in their appreciation of data tools.
Analytical Skills
To be able to properly interpret and translate the business's needs into data and
operational requirements, every Bridge needs very strong analytical skills. A
significant amount of their job will be analyzing data, documents and scenarios, and
reviewing user journeys and operational workflows, to determine the right courses of
action to take.
Decision-Making Skills
It is key in any business that the board or key decision makers do not abdicate
management responsibility. However, your Bridge should be available to help make
important decisions in the data solution building process. The Bridge acts as a
consultant to managers and an advisor to the data and IT teams, so they need
to be ready to field any number of questions. Your Bridge needs to know how to
assess a situation, take in relevant information from your stakeholders, and work
collaboratively to plan out a course of action.
The Bridge is the liaison between the users, data scientists, analysts, developers,
management and the clients. This means a careful balancing act of differing interests
whilst trying to achieve the best outcome for the business. To do so requires an
outstanding ability to persuade and negotiate.
Agility
Corporate agility is a desirable trait in any organization wishing to get the best out of a
Bridge; individual agility, however, is a mandatory requirement. Your Bridge must be
agile and flexible, with no trouble taking on the unique challenges of every new
business project with a rich data-driven theme, mastering both the requirements and
the personalities in the collaborating teams.
Conclusion
For many businesses, Big Data initiatives are not failing per se; it is simply that the
organizations facing failure have yet to find an implementation model that allows them
to exploit the initiatives to deliver the expected outcomes. This is a matter of business
maturity and assembling the right team (see the rules below). In many cases where
failure prevails, it is the absence of the right skills that has caused it. To help
overcome these hurdles and maximize business value from data, organizations should
seek out those in their ranks with the skills to be the Bridge, and then consider the
following steps to significantly shorten time-to-value and contribute to business
success with big data initiatives.
My advice? Follow these simple rules:
Maintain one instance of each data set, not multiple copies (thus avoiding version
control issues and problems with the validity or currency of data).
Find ways to focus on faster deployment of data: the faster the deployment, the more
valuable the outcome, particularly when it comes to predictive analytics.
Consider diverse data sources, including social and collaborative data sourcing
from peers, competitors and data exchanges.
Data has value, though not always immediately, so keep what you can and plan for
exponential growth of your data storage needs.
Use your Bridge to work with the business to identify and plan to solve real
pain points. Identifying the unmet need is key, and interpreting the pain into the
tenets of a solution is where your Bridge should really shine. They can then
help specify solutions for the data scientists and analysts to build with the IT
team.
Ultimately, it's about bringing the right people together at the right time and
facilitating their buy-in and commitment to business solutions to complex
business problems. Get it right and you maximize the prospects of your Big Data
initiatives.
13. What You Need to Know About Big Data by Dorie Clark
It's Not Just for Large Companies. "Companies like Amazon, Apple, Facebook,
Google, Twitter, and others would not be nearly as effective without Big Data," says
Simon. "But it's not exclusively the province of billion-dollar tastemakers. In the book, I
write about a few small companies that have taken advantage of Big Data. Quantcast is
one of them." There's no shortage of myths around Big Data, and one of the most
pernicious is that an organization needs thousands of employees and billions in
revenue to take advantage of it. Simply not true.
Data Visualization May Be the Next Big Thing. You have massive amounts of data;
how can you possibly comprehend it? Enter data visualization, which Simon believes is
absolutely essential for organizations. "[These tools] are helping employees make
sense of the never-ending stream of data hitting them faster than ever. Our brains
respond much better to visuals than to rows on a spreadsheet. This goes way beyond
Excel charts, graphs, or even pivot tables. Companies like Tableau Software have
allowed non-technical users to create very interactive and imaginative ways to visually
represent information."
Intuition Isn't Dead. It turns out Big Data and the certainty of numbers haven't killed
intuition after all. "Contrary to what some people believe," says Simon, "intuition is as
important as ever. When looking at massive, unprecedented datasets, you need some
place to start." Intuition is more important than ever precisely because there's so much
data now. We are entering an era in which more and more things can be tested. Big
Data has not, at least not yet, replaced intuition; the latter merely complements the
former. The relationship between the two is a continuum, not a binary.
It Isn't a Panacea. "When used right," says Simon, "Big Data can reduce uncertainty,
not eliminate it. We can know more about previously unknowable things. This helps us
make better predictions and better business decisions. But it doesn't mean we can rely
solely on the numbers without injecting our knowledge, perspective, and (yes)
intuition."
Is your company using Big Data? How do you expect it to impact your business or
industry?
14. Change Your Business One Metric at a Time by Bill Franks
There are various interpretations of the lessons that can be learned from this story. But
for today we'll focus on one specific lesson: namely, that to win, you need to keep your
eye on the finish line and make steady, relentless progress towards it. If you do, you'll
get there. In the process, you might just pass others who thought they had a faster
approach but who got sidetracked on the way to success.
In the world of business, the goal is often to reach your target with high confidence and
within a committed time frame. Transformation efforts often aren't so much a desperate
sprint to the finish as a longer race where pacing yourself is wise. Taking the
tortoise's approach just might be the winning approach in the face of cultural resistance.
My client could have taken the faster, hare-like approach and created an entirely new
suite of monthly summaries right from the start. After all, we had all of the analytics
completed and the data was available. However, his goal was to get the executives to
change over the fiscal year, which gave him some time. He was concerned that
throwing everything at them at once would overwhelm them and cause a lot of
pushback. So, he took a different approach.
The first month, he added one simple customer-oriented metric to his presentation. He
took the time to explain it and discuss why it was relevant. It was a digestible change
and the executives accepted it. He left it the same way for one additional month and
ensured that the executives remained comfortable with it. Then, he added a second
customer-oriented metric to his presentation. Building on what they already knew, it
too was easy to digest. Over the course of the year, he continued to add more customer
views into his monthly presentation, including complex and nuanced metrics.
By the end of the fiscal year, the VP had successfully migrated the executive team to
understanding and embracing a customer view of the business. The monthly
presentation contained a fair number of new metrics and analytics, and he was able to
get the executives to begin to look at their business differently. At the end of the year,
he felt that he wouldn't have succeeded any faster if he had pushed harder. In fact, he
wasn't sure he would have succeeded at all.
The takeaway I'd leave with you is this: as you try to change your organization through
analytics, be sure not to push too hard, like the hare, when you're going against the
organization's entrenched culture. You might actually do better if you take the tortoise's
approach and create a solid, steady plan to get to the finish line. At times, you'll
certainly wish you could move things faster. However, you must start from a realistic
perspective and accept slowing down if, in fact, slowing down will get you where you
need to go sooner.
15. The Importance of Analytics and Data Science by James Barrese
Ultimately, every company needs to connect how it uses data to improving its core
customer value propositions. In this competitive landscape, teams need to quickly
identify opportunities or problems, find the right data and algorithms that drive benefit,
scale the results, and bring a solution to market quickly. If a company doesn't use data
well, there is a real risk that competitors who are more facile with data will outpace it
and innovate faster. From weeding through and analyzing data to predicting future
consumer behavior, analytics teams are constantly challenged to deliver relevant data
as efficiently as possible, in ways that directly improve the business.
What a lot of people don't realize is that data science is about having both hard and soft
skills. We often describe this as having an open mindset to embrace data and algorithms
while also remaining flexible and open to new ideas. We've found the best data
scientists go beyond university degrees and trending technical skills; they also have
deep business acumen, a steady pulse on the latest happenings in the industry, and
keen instincts that enable them to anticipate where and how to look at the data to find
important insights. It boils down to having a true passion for problem-solving. Relating
technology know-how to business goals, and working well alongside people with
different points of view, are must-haves for the future of data science.
A fundamental yet critical part of the approach is having the technical skills to
overcome obstacles and to process data regardless of its format or size, while
simultaneously looking at the bigger picture to eliminate noise and connect the data to
broader trends. Relying too heavily on the technical aspects alone can actually hinder
the process; in fact, we've seen many technical-only experts get left behind. Traditional
analytical methodologies alone are not sufficient to keep up with the massive growth in
data volume and variety. There are many variables that need to be considered, and
sometimes you have to go with your instincts.
The pace of innovation is only going to increase, and we can expect technology to
deliver more data than ever before. How we use it is where the importance of data
science comes in: when done right, it can be a key competitive differentiator and play a
strategic role in driving a business forward.
SECTION B: BEST PRACTICES AND HACKS AROUND DATA
ANALYTICS
It is important for today's data scientists to work optimally, avoiding under- or
over-analyzing the data for insights. Every data scientist and leader has tricks and
rules of thumb that help make them successful. While working with The Big Analytics
contributing authors, we came across several writers with great hacks and best practices
that they use in their daily data analytics chores. This section shines a light on such
hacks, shortcuts and best practices.
The best way to approach this section is to read each article in isolation. Before reading
each article, we suggest that you first read the author's biography. Knowing the
author's background gives you the right context to understand their hacks or best
practices.
1. What is a Data Scientist? by Kurt Cagle
In the 1960s, computer scientists were (mainly) guys who wore white button shirts with
pocket protectors to keep the ink pens they carried from staining them, usually had
glasses, and typically looked both intense and harried, in a nerdish sort of way. More to
the point, their primary task was a combination of writing code and keeping the
machines that ran that code from doing things like walking away. (If you hit exactly the
right frequency with certain disk accesses, you could get a 1965 disk drive, something
almost exactly the size of a washing machine, to vibrate so hard as to "walk" away from
its mount, very much like an unbalanced washing machine, perhaps because a disk
drive was literally nothing more than a spinning drum holding data embedded in
magnetic strips along its surface. If you didn't mind ruining a several-million-dollar
piece of equipment, you probably could dry your clothes in it.)
These early programmers were generally fairly unconcerned with the analysis of the
data that they produced. For starters, it was expensive to get enough data for any such
analysis to matter, and while you could do some basic analysis work with these
computers, the data scientists of the time were still working out the algorithms
necessary to even begin to do anything meaningful.
Programmers tended to dominate for the next forty+ years. Programmers made
computers do things. Data was what was read in and what was spit out, but the real
work was in writing the procedures and (later) applications that consumed or produced
that data.
About two decades ago, that began to change. Spreadsheets had brought the idea of
running simulations to business users, even as tools offering higher-end
mathematical capabilities and similar simulation environments meant that financial
analysts, researchers, engineers, industrial designers and the like were able to harness
the tools of the computer revolution to make their work easier. Business Intelligence
products began to emerge, oriented initially toward internal systems within companies,
but shifting to analyze the external business environment as capabilities increased.
Ironically, there are a number of factors that make me believe the era of the
programmer is ending. That's not to say that there won't be a need for programmers in
ten years' time (there will always be a need for programmers), only that growth in
programming as an overall percentage of IT jobs will likely decline. One factor is
that we are swimming in applications right now. You want an office suite? If
you don't want to pay Microsoft, then download LibreOffice, or use Google Docs, or
Zimio, or run an app for your iPad or Chromebook. Need a database? Relational or
NoSQL? You want it in the cloud? You want it in purple? Need a website? Oh, come on,
do you even need a website anymore?
Do you know what you call a startup nowadays? A guy in his underwear who writes
apps for phones in his parents' spare bedroom. Every so often the lumbering giants of
software's Paleolithic era will buy up one or two of these for silly amounts of money,
but even they are recognizing that this is primarily for tax avoidance purposes. The
dinosaurs will become ever less relevant, because most of the profitable problems have
already been solved (not necessarily the most urgent or needy of problems, only those
that can be solved for a goodly chunk of change).
What's taking their place are data scientists. Contrary to what HR people may believe,
you do not need to be an expert in writing Hadoop solutions to be a data scientist,
though gaining proficiency in Hortonworks or Cloudera won't hurt. A data scientist is
not, in fact, a programmer, or at least not primarily a programmer. Instead, a data
scientist is an analyst.
Now, I've thrown this term around a lot in recent posts, but it's worth actually digging
into what it means. An analyst is a form of detective, though they don't (well,
most don't) deal with divorce cases. An analyst's role is to take disparate pieces of
information from a broad number of sources and synthesize them into a set of
conclusions about a particular system or domain.
This particular role has been around for a while now, but what has changed is the sheer
power and variety of tools that such analysts have at their disposal, and the skills
needed to make use of those tools. You have tools for searching through large amounts
of data looking for patterns, tools for establishing relationships between these disparate
data, tools for determining the stochastic (i.e., statistical) characteristics of both
populations and samples within those populations, and tools for visualizing this
information.
This means that the real skills needed in this area lie not in building tools, but
in using them and understanding the significance of what they uncover. It is in
identifying patterns and anti-patterns, and using these to ferret out criminal or
fraudulent activity (sadly, the two are not always synonymous these days); to identify
companies that may be about to emerge, or banks on the edge of insolvency; to
determine the optimal blade configuration for a windmill or a solenoid in an electric
car; to predict elections or identify demographic constituencies; to determine whether a
TV show is likely to be a runaway sensation or a flop, or whether a drug will get past
clinical trials, without having to invest billions of dollars in research.
To do this, this new breed of scientist/analyst needs to understand their domain, but
also the information tools that those domains require. Nate Silver uses a statistical
package for his 538 blog that's not all that dissimilar from tools such as R or Matlab, the
open-source and commercial staples of data analytics. Yet there is no "Predict
Elections" button in his software, because the real job of the data scientist is to create a
model of a particular domain or environment and then, by using these tools, simulate
the potential outcomes, their pathways and their probabilities of occurrence. This
process of "data modeling" is fundamental, because without an actionable model, no
analysis can happen. If someone uses a map/reduce tool like Hadoop to trawl through
data forms, he or she will still need to establish a model indicating both what the
expected input data looks like and how this information gets passed on to other
processes.
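The model-then-simulate loop described here can be sketched in a few lines of Python. Everything below (the regions, vote shares, and uncertainties) is invented purely for illustration; no real statistical package or election data is implied:

```python
import random

# A toy model of one "domain": projected vote share for candidate X
# in three hypothetical regions. All numbers are invented.
model = {
    "Region A": (0.52, 0.03),   # (mean share, standard deviation)
    "Region B": (0.48, 0.04),
    "Region C": (0.50, 0.05),
}

def simulate_election(model, n_trials=10_000, seed=42):
    """Run Monte Carlo trials; return the fraction of simulated
    outcomes in which candidate X carries a majority of regions."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        carried = sum(
            1 for mean, sd in model.values()
            if rng.gauss(mean, sd) > 0.5
        )
        if carried > len(model) / 2:
            wins += 1
    return wins / n_trials

p = simulate_election(model)
print(f"Candidate X wins in {p:.1%} of simulated outcomes")
```

The point of the sketch is the shape of the work, not the numbers: the analyst's judgment lives in the model; the software merely enumerates the outcomes and their probabilities.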
This model in turn creates the boundaries for the simulations that the model is expected
to run in. A data scientist builds scenarios (and models these using algorithms) and
then tests this information against known and real data. In the end, this makes it
possible to draw meaningful conclusions about the data and, in turn, to make the
assemblage of model and data a tool for prediction in future scenarios.
The term data scientist, in this regard, is actually quite accurate, because science itself is
predicated upon creating a hypothesis, building a model, testing that model first
against legacy "historical" data, and seeing whether the hypothesis is consistent with
the data both when results are known and when they aren't. The model is then refined,
surfacing "hidden variables" (factors that affect a given scenario but aren't immediately
obvious), and used to better understand the mechanisms that cause the expected
results.
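That hypothesize-model-test loop can be made concrete with a toy example; the "historical" observations and the linear hypothesis below are invented for the sketch:

```python
# Testing a hypothesis against "historical" data, mirroring the
# scientific loop described above. All data points are invented.
history = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]  # (x, observed y)

def model(x, slope=2.0):
    """Hypothesis: y grows linearly with x."""
    return slope * x

def mean_abs_error(data, predict):
    """How far, on average, the model's predictions miss the record."""
    return sum(abs(predict(x) - y) for x, y in data) / len(data)

err = mean_abs_error(history, model)
print(f"mean absolute error: {err:.2f}")
# A large error is the cue to refine the model (surface the hidden
# variable) and test again.
```

Nothing here is specific to any tool; Hadoop, R or Matlab simply run this same loop at a scale where the bookkeeping is no longer trivial.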
Thus, data scientists are in general scientists or domain specialists first, and then
essentially super-users: people who use increasingly sophisticated environments for
analysis that require a deep understanding of the data and tools they are working
with, but who are generally not writing applications.
Indeed, my expectation is that this new set of data scientists is the avant-garde of the
next generation of "IT", and where most technically oriented jobs will end up
within the next decade or so.
2. Sentiment Analysis Meets Active Collaboration by Julie Wittes Schlack
Interestingly, within a year or two of its publication, allegations surfaced that this
picture had been faked, a question that's debated to this day. But if that allegation
had been proven, if the photo was staged, would people have donated money and
clothes, and volunteered to fight alongside the Spanish soldiers? Probably not, because
emotion, and the action arising from it, must be credible and founded on fact to be
powerful.
While photos of someone buying toothpaste or shopping for a car are inherently less
powerful than pictures of death and starvation, the underlying principle holds. In
the business world, in the political world, everywhere that people are engaged in
listening to conversation on a mass scale and trying to learn from it, we're doing it
with the ultimate goal of not just gathering facts, but of moving people, of changing
people's minds and behaviors. That's what marketing is.
So why am I talking about this in a book about Big Data? Because the risk that we
always run, even with the best of intentions, is that when we turn actions and words
into data, we might lose what's shifting, what's subjective, and what's human behind
the numbers and visualizations. And ultimately, data alone doesn't move people.
People move people.
We tried to understand why, both by mining sentiment using a social media mining
and listening application, and by directly asking people in our online community. The
social media listening tool certainly surfaced spikes in negative sentiment, and let us do
a pretty good job of filtering the retrieved posts down to those from individual
travelers. Alas, when drilling down on those, we'd often discover either retweets of
offers or promotions from owned content providers, or lengthy blogs covering all
aspects of travel. While there was undoubtedly some cranky comment about our
client's baggage policy in these dissertations, the effort of finding it, multiplied a
hundredfold, was far greater than simply, directly asking some fliers how they feel.
And when we did the latter, we got responses that were focused, detailed, and rich;
comments like these:
It really is a shame that <airline> is becoming a low cost carrier. I avoid those carriers due to all
those stupid rules. Now <airline> is doing the same. So indeed everyone will carry everything on
board and <airline> will soon discover this situation is unworkable. Therefore you will make new
rules for hand luggage as well and we need to put our coat, umbrella and handbag also in 1
carry-on bag as elite members?
The new baggage policy is not fair. You say "Our most valued passengers" and put all FB-
members in the same category something is wrong. You give me (100,000 miles in 2012) the
same benefits as my 5 year old son with a basic FB card. 2 free bags for elite plus, 1 free bag for
others please.
To further illustrate this point, consider the case of some work we did for a major
apparel retailer who wanted to improve the bra shopping experience. Social media
mining unearthed a wealth of content regarding how to find the right fit, how to get
through metal detectors wearing underwire bras, and where good values were to be
found. While it was all interesting, only a subset of it was really useful to our client in
relation to their specific objective. At the same time, we did a couple of virtual
shop-alongs, having two women share text and images shot with their phones in real
time as they browsed the bra aisle and eavesdropped on the sales associates'
conversations.
This sample of two active, engaged, and knowing consumer partners in research was
every bit as valuable and actionable as the reams of harvested verbatims. Signage, how
merchandise is arranged, the attentiveness of sales staff: we learned about it all in a
few short minutes of active collaboration.
When we played with the display a little, a really interesting pattern emerged. When I
dragged "turkey" to the top right portion of the screen, all of the related terms went
with it. This created a cluster of food-related terms in the top right portion of the screen,
and another cluster of shopping-related terms ("coupons" and "bags" and "shop" and
"checkout") in the bottom portion of the screen.
Even before we began looking for emotional language like "happy" or "delicious" or
"excited," what became immediately apparent is that when people were talking about
our client's brands, it was in very process-oriented, transactional terms. They were
talking about coupons and bags. But when they talked about food, it was in relation to
our client's competitors.
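The kind of term clustering described here rests on simple co-occurrence counting: words that keep appearing in the same posts drift together. A minimal sketch, using invented snippets rather than the client's data:

```python
from collections import defaultdict
from itertools import combinations

# Invented snippets standing in for harvested social posts.
posts = [
    "used a coupon at checkout and the bags were flimsy",
    "the turkey was delicious and the stuffing was amazing",
    "great coupon, quick checkout, but too many bags",
    "roasted the turkey with herbs, delicious gravy",
]

STOP = {"the", "and", "was", "were", "a", "at", "but", "too", "with"}

def cooccurrence(posts):
    """Count how often each pair of words appears in the same post."""
    counts = defaultdict(int)
    for post in posts:
        words = set(post.replace(",", "").split()) - STOP
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return counts

def neighbors(counts, term, min_count=2):
    """Words that co-occur with `term` at least `min_count` times."""
    out = set()
    for (a, b), n in counts.items():
        if n >= min_count and term in (a, b):
            out.add(b if a == term else a)
    return out

counts = cooccurrence(posts)
print(neighbors(counts, "turkey"))   # food terms gather around "turkey"
print(neighbors(counts, "coupon"))   # transactional terms gather around "coupon"
```

Commercial text analytics tools do this at far greater scale and with smarter weighting, but the underlying signal, which words travel together, is the same.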
So again, here a text analytics tool was invaluable in surfacing a potential problem. But
solving it (developing a shopping experience where consumers would be as excited
about the products as about the deals) requires ongoing, knowing collaboration. The
deep connection here is between the brand and the shoppers willing to invest their
time and energy in helping the store.
But "deep connections" also refers to going deep into the human psyche. We know that
much human behavior is informed by emotional responses and symbolic associations.
For instance, we did a study on the Sandwich Generation, people who are caring both
for elderly parents and dependent children. Health care providers, insurers, financial
service providers, transportation companies, and others are all keenly interested in
understanding the emotional conflicts and even the language used by this growing
population. In it, we simply asked, "What kind of sandwich are you?" and got
verbatims like the following:
I feel like the kind of sandwich I despise eating. Mostly tasteless bread, impossible to capture a
bite out of without spilling the contents all over one's face and chest to indelibly stain a favorite
piece of clothing. (Yeah, stuff's been tough lately.)
I am all burnt out. Cheese melting all around me and I am the ham in the middle.
A human reader makes instant sense of these responses. But absent any context, I
suspect that most text analytics tools would be totally stumped and would group these
responses with the grocery data we looked at earlier, simply because metaphor is
challenging for even the most sophisticated automated tool. But metaphor is how we
tell stories. So even though big data tools and analytics let us go wide, that's not always
what's needed. Sometimes we just need to engage with a handful of people with whom
we can go deep.
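A toy illustration of why metaphor stumps simple tools: a naive lexicon-based scorer (the word lists below are invented for the sketch) files the "ham in the middle" response above under food chatter rather than distress:

```python
# A deliberately naive lexicon-based classifier, to show how metaphor
# misleads simple sentiment tooling. Word lists are invented.
NEGATIVE = {"tasteless", "stain", "tough", "despise", "burnt"}
FOOD = {"bread", "cheese", "ham", "sandwich", "bite", "melting"}

def classify(text):
    """Label text by whichever lexicon it matches more heavily."""
    words = {w.strip(".,()") for w in text.lower().split()}
    food_hits = len(words & FOOD)
    neg_hits = len(words & NEGATIVE)
    return "food chatter" if food_hits > neg_hits else "complaint"

metaphor = ("I am all burnt out. Cheese melting all around me "
            "and I am the ham in the middle.")
print(classify(metaphor))  # the scorer sees food, not exhaustion
```

A human reads this as someone describing burnout; the lexicon counts cheese and ham and votes for groceries, which is exactly the failure mode the passage describes.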
Here's an example of how that plays out. A client of ours, Charles Schwab, knew that
they had a problem with women. In fact, a recent study had shown that an estimated
70% of women fire their male advisers once their husbands die, and 90% of those hire a
female adviser afterwards. So general client retention data told them they had a
problem. So did survey data and sentiment analytics, which pointed to women's
experience with brokers feeling much like their experience with car salesmen:
high-pressured and carnivorous.
The point here is that once again, while data played a valuable role in identifying a
trend, analytics, no matter how good, can't generate a solution. That takes conscious,
collaborative, concentrated effort, iterative work that you design and build on over
time, and it requires only a relatively small group of people.
Sentiment is Contextual
Speaking, whether orally or in text, is essentially a public act. It's a behavior that
lends itself to being measured, and that makes it inherently biased, or at least
incomplete. So when doing any sort of sentiment analysis, you have to consider the
source of the content you're analyzing.
We know from our own research, and from that performed by many other firms, that
people are clamping down on their privacy settings, not only posting less highly
personal content, but making that content available to fewer people and text mining
engines. There are feelings, experiences, sights, sounds, and routines that provide
invaluable insight to brands, but that people do not want to share with or in front of
their friends.
As importantly, even if people did feel comfortable sharing this online in their social
networks, these are not the types of things they are spontaneously likely to post
about. They will share them if asked, especially if they are in an intentional and
trusting partnership with the companies doing the asking.
Beyond the privacy factor, though, is another, even more important fact: people's
identity is fluid and highly contextual. I'm an executive, a mother, a daughter,
a wife, a sister, a professional, and, as anyone who's heard me play saxophone can
testify, an amateur. Which of those identities is dominant depends on my mood, who
I'm with, where I am, and what I'm doing.
So in deriving sentiment from mined content of any sort, you have to consider not only
how public or private a setting it's coming from, but the context relative to the
speakers.
Regardless of whether or not they spoke Spanish or English in the home, took our
survey in Spanish or English or considered themselves to be Latino, Hispanic or
Nicaraguan, people tended to experience their ethnic identity differently in different
contexts. So even if you can run sentiment analysis on postings made by people who are
known to be Hispanic, absent any direct interaction with these individuals, it can still be
difficult to know what correlations to draw between what Hispanics say in one context
versus another.
Data plus Meaning
One of our clients wanted to understand what apps people were using, when, and
where. So, using a mobile metering app, we captured data from about 35 people who
agreed to let us track what they were doing on their mobile phones for about a month.
The biggest thing we learned is that people spent the bulk of their time changing their
settings and text messaging, hardly an earth-shattering discovery. But armed with
these usage data and patterns, we were able to home in on the questions we needed to
ask.
And when we did, we found that texting serves all kinds of social and emotional
purposes beyond simple, direct communication, as is evident in this verbatim:
I used to only text people when I had some immediate question or plan to make, but that's
changed over time. Now a lot of the text messages I send are actually kind of stupid. But I'm not
really one of the cool kids in my office, where all the women are glamorous, so texting helps me
feel less alone. It's a way to look social, even be social, without having to fit in.
This quote illustrates the ways in which people reflect on their own experience and behavior over time. We continually construct and revise narratives, make meaning out of what we've done and why. Those narratives are at least as predictive as past behavior, but harder to elicit, because they arise out of dialogue. Whether it's a dialogue with oneself or another person, they develop and morph over time as a result of question and thoughtful answer.
Just as identity is fluid, so is the process of making meaning. Data is fixed: it represents a snapshot of a moment in time. And if your objective is to understand how people feel about a functional commodity like a dish detergent or a lightbulb, periodic snapshots of static data are probably just fine. But even commodities have identity and emotional facets. Am I an environmentally responsible person or a frugal one? Do I place a higher premium on value or sensory experience? And how do those feelings and attitudes change over time, and when I'm with different people?
If you have a collaborative, long-term relationship with consumers, one where you can
repeatedly go back to the same people over time, you stand a better chance of
answering those questions.
Measurement and Empathy
Data, small and big, can be an invaluable tool in alerting us to patterns, surfacing problems, and suggesting questions. But data alone rarely moves people to action. Just think about the plight of refugees, the victims of natural disasters, or the chronic suffering of impoverished people in our midst. We all know the facts, but it's people's pictures and stories that excite our empathy and make the facts impossible to gloss over or ignore.
3. Framing Big Data Analytic Requirements by James Taylor
There are a number of reasons for this, but the most widespread is that the investments are being made before considering how that investment will show an ROI. Because the investments are being made by technical and analytic professionals fascinated by the technology, rather than by business executives and managers, no one is taking the time to identify the business requirements for the Big Data analytics being considered.
Big Data analytic projects are not going to show an ROI because they can store more data, integrate more data, report on more data, or even visualize more data. They will show an ROI only if they can improve decision-making. Big Data and analytics cannot move the needle, cannot improve performance, unless they improve an organization's decision-making. Ensuring that Big Data analytic investments do so means putting decisions and decision-making front and center in your analytic requirements. An emerging best practice, therefore, is to model decision-making to frame the requirements for Big Data analytics.
1. Identify the business metrics you want to improve.
Metrics about customers, operations, product, and revenue.
Not metrics about data volumes, system performance, or statistical accuracy.
2. Identify the decisions, especially the repeatable, operational decisions, that make a difference to these metrics.
How you decide to act determines your results, and understanding which decisions impact which metrics identifies the decisions that matter.
The more often a decision is made, the more likely it is that analytics can make a difference.
Decisions that get made often, like choosing a customer's next best action, identifying a fraudulent transaction, or pricing a loan based on the risk of the applicant, are data-rich environments.
Plus, any analytic improvement will be multiplied by the number of times you make the decision, for maximum ROI.
3. Build a model of the decision-making that will ensure good decisions get made.
To improve decision-making you must first understand it.
Use the new Decision Model and Notation (DMN) standard to describe the decisions, information requirements, and knowledge involved in your decision-making.
Break down complex decisions into progressively simpler ones to see exactly how the decision should be made.
Use the models to get agreement, eliminate inconsistency, and build business understanding before you start your analytics.
4. Show where and how specific kinds of analytics can impact this decision-making.
Each element of the decision-making can be assessed to see if an analytic might help it be made more accurately.
Because everyone (business, IT, analytics) can understand the model, it's easy to get business owners to say, "If only we knew XXX, we could make this decision more profitably."
Any analytic identified has a clear, precise role and context in the decision-making, thanks to the model.
5. Identify the data and the Big Data infrastructure needed to build and deploy these analytics.
What data does this analytic need? Where is it? How much of it is there? Do we want to avoid sampling it? How real-time is the data feed?
These questions justify and drive adoption of Big Data infrastructure to meet well-formed business requirements.
The decision model itself shows how the analytic must be deployed and how it will be used to drive business results.
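The steps above can be sketched as a simple data structure. The Python sketch below is illustrative only: the loan-pricing decision, its inputs, and its metrics are invented, and this is not DMN notation itself, but it shows how breaking a complex decision into simpler ones also surfaces the data requirements of step 5:

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    """A node in a DMN-style decision model (hypothetical sketch)."""
    name: str
    metrics: list = field(default_factory=list)        # business metrics this decision moves
    inputs: list = field(default_factory=list)         # information requirements (data)
    sub_decisions: list = field(default_factory=list)  # simpler decisions it depends on

    def all_inputs(self):
        """Collect the data required by this decision and everything under it."""
        data = list(self.inputs)
        for sub in self.sub_decisions:
            data.extend(sub.all_inputs())
        return data

# Break a complex decision into progressively simpler ones (step 3).
risk = Decision("Assess applicant risk", inputs=["credit history", "income"])
price = Decision("Price the loan",
                 metrics=["loan profitability", "default rate"],
                 inputs=["market rates"],
                 sub_decisions=[risk])

# Step 5: the model itself surfaces the data the analytics will need.
print(price.all_inputs())  # ['market rates', 'credit history', 'income']
```

Walking the model this way is what lets the infrastructure requirements fall out of the decision requirements rather than being guessed at up front.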
This decisions-first mindset ensures there is no white space between analytic success (a great model) and business success (improved results). By providing structure and transparency, a model of decision-making requirements promotes buy-in and shows exactly how Big Data analytics will improve your business. So don't just invest in Big Data infrastructure; focus it on improving specific, modeled decisions.
4. Combating the Coming Data Deluge: Creating real-time awareness in the age
of the Internet of Things by Ryan Kirk
In the world of IoT, the network will become indistinguishable from the physical products and devices themselves. This represents a unique opportunity for telecommunication-oriented businesses to participate in the blossoming marketplace. At the same time that we are learning to piece together an interconnected network of devices, we are also learning how to process a multitude of data in streaming fashion. Given the influx of networked devices and the growth of intelligent data services, it seems likely that the ability to transport data will become at least as vital as the ability to store and transform data.
Define the problem, the business domain, and discover our existing data
Every analysis has to begin with defining the problem at hand. Oftentimes a business owner may know the area of the business responsible for causing them headaches and lost sleep. However, they may not know what it is about that area of the business that is causing problems; they may not know how to go about testing for various problem sources. For this reason, the job of an analyst in the age of IoT still begins with formulating a problem statement and establishing initial hypotheses. This is the first stage of answering tough questions. Creating a problem statement requires understanding the various entities and processes involved in the area of interest. If no domain model exists, the analyst will need to create this model by conducting secondary research, by talking to domain experts, and by performing exploratory analysis. As is often the case, this exploratory process will reveal gaps in the existing data model.
We think like a scientist when we form hypotheses about the relationships between
concepts and then test them. This stage results in an understanding of the current
domain and business models. Armed with this knowledge and with a sampling
strategy, we can start to ensure that we are capturing the data necessary to test our
hypotheses.
Even the best methods of analysis will fail given an incomplete domain model or an incomplete data model. This becomes more complicated in the age of IoT because we will have many thousands of devices. Because of the number of devices, the number of interconnections, and the sampling rate, the volume of incoming data will be an order of magnitude larger in the IoT age than what we see even in the age of social networks. Because this stage may help inform our data model, it is important to define what data points we will collect and how often we will collect them. A collection strategy is not a trivial decision. The polling interval needs to be sustainable and comprehensive: polling needs to occur often enough to capture the typical variation in the devices that you are polling, yet the polling rate cannot exceed the rate at which we can process incoming data.
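That trade-off can be made concrete with a little arithmetic. A minimal sketch, where the fleet size, metric count, and pipeline throughput are all invented figures for illustration:

```python
def min_sustainable_interval(n_devices, metrics_per_device, max_throughput):
    """Shortest polling interval (in seconds) the pipeline can sustain.

    Each poll of each device emits `metrics_per_device` data points;
    `max_throughput` is how many points per second the downstream
    processing can absorb.
    """
    points_per_poll_cycle = n_devices * metrics_per_device
    return points_per_poll_cycle / max_throughput

# Hypothetical fleet: 50,000 devices reporting 20 metrics each,
# with a pipeline that absorbs 100,000 points per second.
interval = min_sustainable_interval(50_000, 20, 100_000)
print(interval)  # 10.0 -> polling faster than every 10 s would overrun the pipeline
```

The chosen interval then has to sit between this floor and the ceiling set by how quickly the monitored quantities actually vary.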
As the scale and complexity of data increase, it becomes more common for datasets to contain signals that we have not seen before. Many people and processes are free to add new devices to the network, and each device manufacturer is free to define custom metrics. Keeping track of all of the devices, figuring out what each of these metrics represents, and connecting them back to our business model becomes a non-trivial effort. Fortunately,
we can use our analytical methods to help us discover patterns within these metrics.
The field of Data Integration offers us tools we can use to automatically discover
connections between signals. We will increasingly rely upon these techniques to help
combat the complexity and scale of IoT data. As we have seen with other service models, it seems likely that Data Integration as a Service will become both a product and a point of future market differentiation.
Because there can be many thousands of these devices, we often think about monitoring in this context as a form of real-time alerting. As we continue to tie the topological model of devices back to our business model, we will gain a better understanding of both. Given a data collection strategy, a data discovery strategy, and a data model, we can begin to examine the expected behaviors of the IoT. If we know what we can expect from a device, then we will also know what is unexpected. We want to create intelligent alerts that understand that anomalies within one stream of data may look different than anomalies in another stream. We can do this by understanding the historic values and by using forecasting algorithms to predict the expected future values in the near term. This gives our business the real-time situational awareness necessary to make informed decisions amidst the context of a global network of massively interconnected devices.
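As a minimal stand-in for the forecasting algorithms just described, a rolling band of mean ± z standard deviations can flag values that stray from recent history. The window size, threshold, and latency series below are invented for illustration; a real deployment would use a proper time-series model:

```python
import statistics

def anomaly_flags(series, window=5, z=3.0):
    """Flag points lying outside a rolling band of mean +/- z * stdev."""
    flags = []
    for i, value in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)  # not enough history to form an expectation
            continue
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        band = z * stdev if stdev > 0 else 1e-9
        flags.append(abs(value - mean) > band)
    return flags

latency = [10, 11, 10, 12, 11, 10, 55, 11, 10]  # one obvious spike
print(anomaly_flags(latency))  # only the 55 is flagged
```

Per-stream windows and thresholds are what let the same alerting logic respect the fact that "anomalous" looks different in different streams.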
As mentioned above, the first step to determining root cause is to model each device. While we can predict a range of future expected values by examining a single signal, we often struggle to determine whether future values will be at unexpectedly high or low levels. By definition, a forecast built upon expected values will not predict anomalies. Instead, we express the rate at which we want to be surprised through the use of a confidence interval. However, there is a more intelligent approach we can use: build an attribution model that helps us tie the cause of an alert in one signal to a cause in one of many downstream signals. This is only possible when we begin to understand the relationships between various signals.
Oftentimes a collection of signals relate to each other hierarchically or conceptually. For example, we see hierarchical relationships between clusters of compute devices and the actual compute nodes themselves. We see conceptual similarities between the metrics for read latency and for write latency on a single device. We want to learn the relationships between these signals quantitatively and automatically. An approach capable of learning these relationships will examine the historic distance between various signals. For example, we could look at the correlation between read and write latency to discover that they are highly related. Doing this at a massive scale becomes more challenging; however, it is a solvable problem. A good example of this approach in practice is the launch of the Google Correlate engine. This technology allows users to find search terms that are similar based upon the frequencies in search usage patterns.
As with any approach, correlation has limitations as a tool for discovering relationships.
For this reason, we need to augment these beliefs using some type of meta-model
capable of helping to prune the false positive connections that a correlative approach is
prone to creating. We can perform this pruning using graph-based inference.
Once we know the relationships between devices, we will need to keep track of them
using some sort of meta-model. A generic and useful way to do this is through the use
of a probabilistic graphical model (PGM) that connects each device together based upon
this set of relations. The edges of the graph represent the probabilistic influence that one device conditionally has upon another; they can also represent a similar conditional influence between concepts. This graph is powerful because it allows us to
perform two very important forms of inference. First, it allows us to predict the
expected value on one device given the performance of other devices. We do this using maximum likelihood estimation (MLE). Second, we can classify the state of a device as normal or abnormal by connecting this MLE to a labeled training set. This maximum a posteriori (MAP) reasoning allows us to classify even when we have very little training data. Finally, this graph will allow us to ask questions such as: Which devices were affected by the anomalous behavior in Device A last night?
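A two-node toy version of this MAP inference fits in a few lines. All probabilities below are invented for illustration; in a real PGM spanning thousands of devices, the tables would be learned from data:

```python
# Two-node sketch: device A probabilistically influences downstream device B.
# P(B state | A state), a minimal conditional probability table.
cpt_b_given_a = {
    "normal":   {"normal": 0.95, "abnormal": 0.05},
    "abnormal": {"normal": 0.30, "abnormal": 0.70},
}
prior_a = {"normal": 0.9, "abnormal": 0.1}

def map_state_of_a(observed_b):
    """MAP estimate of device A's state given what we observed on B."""
    posterior = {a: prior_a[a] * cpt_b_given_a[a][observed_b]
                 for a in prior_a}
    return max(posterior, key=posterior.get)

print(map_state_of_a("abnormal"))  # 'abnormal'
```

Even with a strong prior toward "normal", the observed downstream anomaly is enough evidence to flip the classification, which is the point of MAP reasoning under scarce labels.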
A pattern-recognition algorithm would not scale well to handle all of the possible connections within an enterprise IoT deployment. However, since we already have a PGM that contains this information, we only need to learn the actions that humans would take in specific contexts and then tie these actions to the size of the change we need to see in a given signal. At this point we can create a supervised learning system that connects the set of possible actions to changes of various magnitudes. Fortunately, there are a host of modern, advanced pattern-recognition techniques we can use to accomplish this task.
Once we have a supervised model that understands the actions and outcomes, we can connect our model into an online system. This system will be capable of adapting in real time. It will also be able to self-heal. This is an idealized end state for an enterprise system of network-connected devices. We can use this type of technology to help us build amidst the complexity and scale present in the age of IoT.
What is the business impact of a predictive model? From a scientific perspective, a successful model is one that performs better than a naïve model. A naïve model is our control: one that behaves randomly but is still constrained to operate within our domain. From a business perspective, a successful model is one that we can show increases favorable outcomes and/or decreases unfavorable outcomes. Often we think of models in terms of their ability to boost incremental revenue, to boost profit, to reduce expenses, to reduce lead times, to increase efficiency, or to reduce the likelihood of defects occurring. How do we quantify the effect of our model on these outcomes?
The answer depends upon the receiver operating characteristic (ROC) curve. We can examine the area under the curve (AUC) and connect this to business outcomes. We figure out what the expected cost is for a false positive, for a false negative, for a true positive, and for a true negative. Sometimes we need to perform separate analyses to determine these. Once we have this unified costing function, we weight the AUC using it. Then we can optimize the predictive framework to create an ROC curve with the highest possible weighted AUC. We perform this comparison for three different models: the current business practice, a naïve model, and our proposed model. We compare against the current business practice to illustrate the effect size of using the model. We compare against the naïve model to determine whether it is our model that is yielding the results or whether it is simply a business process change that would yield the results.
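A minimal sketch of such a costing function, applied to two confusion matrices; the fraud-screening rates and unit costs are invented (a negative cost is a benefit):

```python
def expected_cost(rates, costs):
    """Expected cost per decision, given confusion-matrix rates and unit costs.

    rates: dict with fractions for 'tp', 'fp', 'tn', 'fn' (summing to 1.0).
    costs: dict with the business cost of each outcome (negative = benefit).
    """
    return sum(rates[k] * costs[k] for k in ("tp", "fp", "tn", "fn"))

# Hypothetical fraud screening: catching fraud saves money, while
# blocking a good customer and missing fraud both cost money.
costs = {"tp": -100.0, "fp": 20.0, "tn": 0.0, "fn": 150.0}

naive_model    = {"tp": 0.02, "fp": 0.18, "tn": 0.72, "fn": 0.08}
proposed_model = {"tp": 0.07, "fp": 0.05, "tn": 0.85, "fn": 0.03}

print(round(expected_cost(naive_model, costs), 2))     # 13.6
print(round(expected_cost(proposed_model, costs), 2))  # -1.5: the model pays for itself
```

Comparing the two expected costs is the single-number summary the three-way comparison (current practice, naïve, proposed) ultimately reduces to.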
Cost savings and revenue generation are tied to the function of a predictive model. If your model operates on a set of data responsible for 25% of the operating cost of a company, and it is able to provide a 10% increase in accuracy compared to a naïve model, then you may have saved the company as much as 2.5% of its total operating budget. If a company earns $10 of revenue for every $1 of profit, then the results of your model could have just increased profits by roughly 25%. However, this is just a guideline for understanding how to invest in these technologies. Since there are many factors in a system as complicated as a business, we want to confirm the actual savings or revenue.
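The guideline above works out as follows; all figures are assumed for illustration:

```python
# Back-of-the-envelope version of the guideline above (assumed figures).
revenue          = 1_000_000.0
profit_margin    = 0.10                 # "earns $10 of revenue for every $1 of profit"
operating_budget = revenue * (1 - profit_margin)

share_affected   = 0.25                 # data the model touches: 25% of operating cost
accuracy_gain    = 0.10                 # 10% improvement over the naive model

savings = operating_budget * share_affected * accuracy_gain
profit  = revenue * profit_margin

print(round(savings, 2))                # 22500.0, i.e. 2.5% of the operating budget
print(round(savings / profit, 3))       # 0.225 -> roughly a 25% lift in profit
```

The "roughly 25%" in the text is this 22.5% figure rounded up, which is why it is offered as a guideline rather than a promise.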
Finally, beware of "dragon kings": watch out for large, anomalous values that occur more often than expected. The definition of a complex system alludes to the presence of statistical anomalies that occur at a rate higher than we could predict using the standard assumptions of stable distributions. When in doubt, it is better to create a robust model capable of learning amidst never-before-seen circumstances. Favor the unsupervised learning approach unless you have good reason to use a supervised classifier.
Final words
The ability to predict is not a solution for every business problem. Big Data machine-learning solutions are in vogue at the moment, and in many cases they deserve the respect that industry gives them. However, gauge the expense of committing to developing and supporting the proper predictive models against the expected outcomes of those models. Use this type of analysis to justify decisions to pursue a course of action for your business.
As we think about the success of analytics over the past decades, we are reminded of
the common principles behind these successes. Even amidst uncertain and changing
circumstances, we can maintain a data-driven perspective that allows us to iteratively
refine our inference.
This section presented the reader with an argument that illustrates how the changing business landscape will inevitably lead to changes in the supporting analytics landscape. It illustrated how we can still have confidence in our research techniques and how we can still build powerful analytic models despite uncertainty. Finally, it showed that we can measure the success of an initiative by connecting that initiative to the expected change in business state as a result of model adoption.
5. Big Data, Small Models: The 4F Approach by Kajal Mukhopadhyay
What is Big Data? The definition includes 3Vs, and optionally a fourth V as defined by IBM in early 2015 [1]. In simple terms, these are volume, velocity, variety, and (optionally) veracity. Most businesses are concerned with volume, some face challenges with data velocity, and a few struggle with variety.
With Big Data comes the need to understand the business value of harnessing it, including but not limited to data processing, summarization, modeling, predictive analytics, and data visualization. Finding a uniform solution or platform to deliver Big Data analytics is a challenge for many organizations, let alone justifying the cost of building such solutions. I introduced a new way of operationalizing Big Data analytics, leveraging selected analytical methodologies along with modeling principles fitted to the present landscape [2]. The term I use is 4Fs, rhyming with 4Vs to make it easier to remember. These are: Focus on one KPI, Few controlling factors, Fast computation, and Forward-looking.

[1] The Four V's of Big Data, IBM Infographics. http://www.ibmbigdatahub.com/infographic/four-vs-big-data
1F. Focus on one KPI at a time: Analyzing too many goals, objectives, or KPIs within a single data model is difficult with Big Data. Try modeling one KPI at a time within a single analytic framework. If needed, try combining individual KPIs at a higher level for broader insights (newer statistical methods may be needed to achieve this). In most cases, a single-KPI model is far more effective than a complex multi-KPI model.
2F. Few controlling factors: In an ideal world, goals and objectives are influenced by numerous factors; in reality, there are only a few that have the most impact on the KPIs. Identify and use fewer controlling factors in your model. Often a set of two-by-two causal models is good enough to explain the KPI behavior.
3F. Fast computation: Fast computation is critical in a Big Data environment. Computation algorithms for any type of data analytics and visualization must be distributed (parallel processing), additive (Map/Reduce), and modular (multiple model join).
4F. Forward-looking: The efficacy of Big Data models relies on the predictive accuracy of the outcome variables or short-term forecasts. Analytics should focus on building models that can predict the next best outcome within a given set of objectives. Rather than focusing on explaining historical behaviors, they should concentrate on learning algorithmically and predicting the most likely outcome.
To apply the 4F principles to data modeling, I suggest three types of popular and widely used statistical methods, all of which are simple to execute and implement and provide fast answers.

[2] Marketing Analytics Conference, The DMA, Chicago, March 9-11, 2015. http://www.slideshare.net/KajalMukhopadhyayPhD/mac-presentation-kajal-mukhopadhyay-20140307
C1. The first is a class of descriptive methods that utilize frequency distributions, binning, classifications, and statistical tests. With Big Data, any measure of KPIs using these principles would have smaller standard errors. Any bias can be adjusted using simple A/B tests, differences with respect to benchmarks, trends, and the application of a universal control on the data systems under certain environments.
C2. The second is the class of models based on the principles of conditional probabilities and variations of Bayes classification. Typically one can store them as some form of recommendation tables. These models can be utilized to build fast inferences using indices, odds ratios, scores, and other measures of KPIs.
C3. Lastly, Bayesian networks and graphs constitute the class of models that can be used in a Big Data environment. Much of the artificial intelligence and machine learning algorithms in computational engineering are based on these types of causal models [3].
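A minimal sketch of the C2 idea: conditional rates stored as a precomputed table of indices (lift versus the base rate) for fast inference at serving time. All segments and figures below are invented:

```python
# Hypothetical conditional click rates by audience segment, plus the base rate.
p_click_given_segment = {"sports": 0.12, "news": 0.04, "fashion": 0.08}
p_click_overall = 0.06

def lift_index(segment):
    """Index (lift) of a segment's click rate versus the overall base rate."""
    return p_click_given_segment[segment] / p_click_overall

# Precompute a recommendation-style lookup table for fast inference.
index_table = {s: round(lift_index(s), 2) for s in p_click_given_segment}
print(index_table)  # {'sports': 2.0, 'news': 0.67, 'fashion': 1.33}
```

Because the table is additive and cheap to recompute, it fits the 3F requirement of distributed, Map/Reduce-friendly computation.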
For any practitioner of Big Data analysis, adopting a pragmatic approach to data modeling, such as 4F, is of tremendous value. All three classes of models can be combined to create new ways of analyzing humongous data sets while being efficient, fast, and statistically relevant.
[3] Causality: Models, Reasoning and Inference, Judea Pearl, 2nd Edition, Cambridge University Press, New York, 2009.
6. The Five Letters that Will Change the Data World: BYOBI by Tomasz Tunguz
This is the way business intelligence software used to work. On the left-hand side, engineers transform data from a handful of databases kept on premises into an internal data warehouse. The data is then modeled and optimized for answering the questions the business has today. Reports and visualizations are then built for easy access. Inevitably, those questions will evolve and new ones will crop up. But at design time, those questions are impossible to anticipate. Inevitably, the BI tool fails to live up to its potential and teams can't access the data they need.
What's worse, with the explosion of cloud and SaaS software use, the data
fragmentation problem has exploded. Now, company data isn't stored across a
collection of databases within the company's firewalls, but is scattered across thousands
of servers all over the world. A diaspora of data.
Nevertheless, teams within the company still need to access, process, and analyze that data irrespective of where it resides. Faced with this problem, he decided to dramatically transform his data architecture into BYOBI.
Fig 1.
Fig 2.
Two key innovations enable this tectonic shift in BI: faster databases and better data modeling. Redshift, BigQuery, and others are taking the data world by storm because their performance benchmarks are staggering over huge data sets. This enables next-generation BI tools to provide real-time responsiveness over all the data, not just a sample of data, as in the past.
Data modeling enables data scientists to encode the structure of, and relationships among, data sets into one compact file, and then enables everyone within an organization to ask and answer questions of the data in the correct way, without help. Emboldened by the newfound confidence of having one consistent source of truth, the organization becomes empowered to be more decisive and move faster.
With this BYOBI architecture, end users decide which tools are the best ones for their
use cases. IT's responsibility becomes data access enablement rather than data request
fulfillment. Consequently, the entire team is happier and more productive.
BYOBI will become the dominant data architecture over the next ten years. As this shift happens, newcomers like Looker, which provides novel data modeling and data exploration tools, will take significant share from the hegemons of legacy BI. For proof of this coming fragmentation in BI, look no further than Tableau's success. Now a $4B company serving more than 10,000 customers, Tableau proved end users could influence BI buying decisions. But this is just the beginning of the BYOBI trend. BYOBI will disrupt the $13B business intelligence world, creating massive share shift, giving rise to new, massive companies, and enabling teams all over to run their businesses more effectively.
One of the things I've loved most about crafting my career in online marketing is that
there is lots of feedback when it comes to campaigns. You can tell almost instantly what
people respond to and what falls flat.
When I started out in marketing in 1999, billboards, radio, television, print, and other difficult-to-track media ruled, but the prevailing advertising wisdom was, "Half the money I spend on advertising is wasted; the trouble is I don't know which half." (John Wanamaker) That seemed to satisfy most clients, who were easily convinced that spending more meant more return. As the web evolved and more and more people started getting their purchasing influence and information on it, we were able to pinpoint which half was wasted and which half worked.
However, this information only tells half of the story. Yes, people clicked on this ad and
not that ad and people shared this post and not that post...but why? That's still a big
gap in information.
"The trouble with data is that it asks as many questions as it answers."
My theory is that these questions are the ones that most marketers (and their clients)
dread, so the data is ignored (at best) or misinterpreted (at worst).
I was joking around with some associates of mine the other day about formulas that work. We made a game of listing off the sure-fire wins in social media marketing: "Let's see. Cats...that guarantees a win. Puppies are good, too. Oh...and people like drunkenness! Yes! And crass humor. We just need to make a funny drunken cat video and it'll totally go viral!" The problem with this theory, besides the obvious animal cruelty controversy it's sure to drum up, is that, well, funny drunken cats aren't really appropriate to every brand. Not to mention that people get tired of formulas pretty quickly. It's not cats or drunkenness or humor. If we want to reverse engineer success, we need to go beneath the surface. Instead of just blindly accepting that people love cat videos, we need to understand why they love cat videos.
There are no formulas, but there are lots of clues for every brand, and they rest in the questions behind its own data. Instead of just reading the numbers and making wild guesses about why something works and why something doesn't, the people who work with the data have to become Data Whisperers: a skill that requires looking at the quantitative AND the qualitative and looking between the connected dots.
Data Whispering is the ability to tell a story with the numbers. Data Whispering is uncovering the human side of the numbers. Data Whispering is understanding the right question to ask of the numbers. And to do that, you need to step back from your own bias as much as humanly possible and find the story the data wants to tell you (not what you want it to tell you).
Tools like Nexalogy don't only tell you what people are talking about, but how it's related to your product and how much weight it carries. And they're perfect for general market research as well. If you know a few data points about your potential audience, you can find out more data points about that audience. For example, if you are running a candidate for political office in a region, you can find out what people in that region really care about, helping that candidate form a political platform to suit their constituents' needs. If you are thinking about opening a restaurant, you can pinpoint gaps in the market or even figure out where you should be looking to locate your restaurant. For established brands, you can see how people see you versus your competitors. What terms do they use?
Unlike sentiment analysis, of which I'm not a huge fan because it, too, asks more questions than it answers, Nexalogy gives a neutral, unbiased view of why people are making the decisions they are making. Claude Theoret once demonstrated an analysis they had done for a bank, showing that people thought of it as fair, good, community-focused, friendly, and providing personal service, versus its bigger competitors, which people thought of as secure, prestigious, cold, and impersonal. According to Theoret, the bank used this information to focus on expanding its community programs and promoting its one-on-one personal touch, and saw great outcomes from the campaign.
I love data, but I'm not a fan of more of it. I'm an advocate for understanding how to read it and learning to tell a story from it. It's a skill that needs to be a regular part of any marketing curriculum. Knowing that people love that nerd kitty image is great; knowing why is even more valuable.
8. Beyond "Big Data": Introducing the EOI framework for analytics teams to
drive business impact by Michael Li
What I would like to talk about today is an analytics framework called Empower/Optimize/Innovate (EOI), which has been used by the Business Analytics team at LinkedIn to continuously drive business value by leveraging Big Data. Below, I'll explain the three categories of the framework in detail and with examples.
E stands for Empower: Empower business partners to have access to the data and insights they need, when they need them.
The most common practice in this category is performing ad-hoc analyses based on the questions asked by business partners: "How much money did we make in the last week/month/year?", "What are the key drivers for the big year-over-year drop in one of the key business performance metrics?", etc. This is probably how the majority of people understand what analytics is, and it's truly important for businesses, as it empowers decision makers to make data-driven decisions (or at least consider making them). Many analytics teams today spend most of their time in this category. Over time, analysts perform similar analyses multiple times and get more efficient and productive.
However, the issue is that analysts may also get bored by doing similar analyses over and over again. The key to solving this issue is to create leverage: to automate and simplify processes, such as data pipelines, data cleaning, and converting data into certain formats, through technology as much as possible, so that you spend more time focusing on the more exciting pieces: finding insights and making recommendations to business partners.
partners. A typical example from our Business analytics team is an internal analytics
web portal called Merlin, which we built to offer easy access to compelling insights
automatically via one-click searches for our sales teams to share with clients. Every day,
thousands of people from sales teams use the portal and get the
data/metrics/reports/graphs they need in a completely self-served way. Because of big
improvements to sales productivity and the large financial impact this has had on our
field sales business, this project was chosen as one of LinkedIns Top 10 most
transformative stories of 2011 by our executive team and received the Leverage
award from our Global Sales Organization.
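The "leverage" idea above, turning a recurring ad-hoc question into an automated, self-serve report, can be sketched in a few lines of Python. The column names, file layout, and metric below are hypothetical stand-ins for illustration, not LinkedIn's actual pipeline:

```python
import csv
from collections import defaultdict

def weekly_revenue(rows):
    """Answer the recurring ad-hoc question 'how much money did we make
    each week?' from raw transaction records, so the same manual pull
    is not repeated every Monday."""
    totals = defaultdict(float)
    for row in rows:
        # Data cleaning: skip malformed or refunded records.
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue
        if amount <= 0:
            continue
        totals[row["week"]] += amount
    # Convert into a standard, report-ready format.
    return [{"week": w, "revenue": round(t, 2)} for w, t in sorted(totals.items())]

def weekly_revenue_from_csv(path):
    """Same report, fed straight from a raw CSV export."""
    with open(path, newline="") as f:
        return weekly_revenue(csv.DictReader(f))
```

Once a question is encoded this way, answering it again costs nothing, which is exactly what frees analysts for insight-finding.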
I stands for Innovate: Innovate the way analytics can help our business grow by
leveraging both internal and external data
In Silicon Valley, "innovation" is a word that gets everyone excited. There are many ways
for analytics teams to be innovative, and we believe that the ultimate measure of an
analytics team's success in innovation is the business impact it has. When we evaluate
the potential of an innovation or venture project, we look at the business impact it
could have in the next 1-3 years, mostly in terms of incremental revenue/profit
or user engagement/page views. We'd also make sure that there are strong business use
cases that can leverage the outcome of the project, so we can validate the go-to-market
strategy for our analytics solutions immediately, not just innovate for the sake of
enjoying being innovative. One recent example from our team is the Account Interest
Score we built with our marketing team to prioritize the enterprise prospects that have
a higher likelihood of becoming customers. The key innovation here was being able to
measure the likelihood of conversion at the account level, combining weighted
individual-level scores with the influence of each decision maker in the B2B selling process.
It has been widely used by our field sales teams for acquisition since its creation, and it
has been driving higher conversion rates, which have resulted in higher sales revenue
and efficiency.
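As a rough illustration of the account-level idea (the actual LinkedIn model is not public), an account score can be built as an influence-weighted average of individual-level conversion scores. The fields, titles, and weights below are invented:

```python
def account_interest_score(members):
    """Combine individual-level conversion scores into one account-level
    score, weighting each person by their influence in the B2B buying
    decision (e.g. a VP counts more than an analyst)."""
    total_influence = sum(m["influence"] for m in members)
    if total_influence == 0:
        return 0.0
    weighted = sum(m["score"] * m["influence"] for m in members)
    return weighted / total_influence

acct = [
    {"title": "VP Sales", "score": 0.9, "influence": 5},
    {"title": "Manager",  "score": 0.4, "influence": 2},
    {"title": "Analyst",  "score": 0.1, "influence": 1},
]
# Influence-weighted likelihood that this account converts.
print(round(account_interest_score(acct), 3))  # 0.675
```

The design choice worth noting is that a high-influence skeptic drags the account score down far more than a low-influence enthusiast lifts it, which matches how B2B deals actually close.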
Since I have been advocating for this framework, I'm often asked: "What is the right
allocation of resources across the EOI framework?" The truth is, depending on the
evolution of the analytics team, you may need a different distribution of the analytics
resources you spend on E, O, and I; the key is to make sure you have at least a
meaningful investment in each category and a reasonable overall allocation that you
believe works best for the current stage of the business. In general, based on my
discussions with many of my analytics peers in the industry, I'd expect to see %E
> %O > %I more often than not, especially at companies that are growing at double digits
every year. Thank you for your time. I'd be happy to discuss how we can improve this
framework!
9. Can You Seize the Opportunity from Analytics If You are Cheap? by Lora
Cecere
Manufacturers want a definitive ROI before they commit to spend, and exploring the
potential of these new forms of analytics requires a focus on investment with an
unknown ROI. While the services sector has made the leap, the manufacturing industry
has not. The gap is growing....
The opportunities stretch before us. These include Hadoop, Cluster Analysis, Streaming
Data, Machine-to-Machine Flows, Pattern Recognition, Cognitive Learning, Sentiment
Analysis, Text Mining, Visualization, and the Internet of Things. However, I don't find
these concepts on the road maps of the manufacturers I work with. When I talk
about the use of these new concepts in the world of manufacturing, most executives
scratch their heads. Their focus remains on yesterday's investments: Enterprise
Resource Planning (ERP), reporting, and deterministic optimization. I believe
investment in these new analytical concepts is the foundation for digital business.
(now merged with Kraft). My plant manager, Dick Chalfant, and I had a running debate
on the investments in personal computers. At the time, no one in the plant had a
personal computer. I believed that we should begin to invest in employee productivity
through a focused investment in personal computing. Dick would argue with me. He
would rant and say, "Where will we reduce labor and headcount based on this
investment? What is the ROI?" I would smile. The arguments were heated. Our visions
were different. I believe that companies need to invest in technology to drive a
competitive advantage, and that not every project will have a well-defined ROI.
Today, I don't think that anyone I work with can imagine a world without PCs on the
desk, in our briefcases, and in our homes. It redefined work, and was foundational to
driving higher levels of productivity for manufacturers in the past decade. I can see it in
the numbers for revenue per employee (note the impact on consumer value chains for
the period of 2006-2013).
I strongly feel that our traditional views of applications (ERP, SCM, CRM, SRM) limit our
ability to be more open in testing and learning from new forms of analytics. While people
talk about big data concepts, current investments are tied up in maintenance funding for
legacy applications. My challenge to manufacturers is to free the organization to invest in
new forms of analytics by giving teams investment funding. I am not advocating financial
irresponsibility. Instead, I am advocating starting with seed money and a focused team
to learn the power of new forms of analytics. I would challenge them to self-fund the
future through small test-and-learn strategies.
So today, as I look out my window at Steamboat, I wonder if this post will fall quietly
like the snow from the boughs of the fir tree outside my window? Or will it stimulate
some organization to rethink the future? I hope the latter. If nothing else, I hope that
some organization somewhere will stop and think about analytics and the future and
ponder why financial and insurance industries are moving faster than manufacturing.
That is my hope... I would love to hear your thoughts.
sponge (a write-off) over the investments made for the project. CIOs were taken off the
case in favor of sales departments, which took a perverse pleasure in subscribing to
services like salesforce.com that require minimal intervention from IT experts.
Some would say that the problem is technical: that it's normal for an innovation to be
entrusted to the technicians first and that its dissemination within the organization
takes time. Another, more fatalistic, view is that it is a manifestation of a psychological
invariant. As for me, I see it as a sales problem.
In my view, the major reason that the first wave of implementation of a technological
innovation tends to break like a wave over the undertow is due to two equally
detrimental phenomena coming together:
In both cases, this results in too little consideration of the beneficiaries when putting
together the sale. Because for the sale of an innovative product to take place for the
benefit of everyone in the B2B environment, it is important that it be firmly grounded
on three main pillars:
the innovation is the object of desire for those who are its beneficiaries (principle
of desirability);
its operational implementation is guaranteed by technicians (principle of feasibility);
its economic viability is endorsed by the economic decision-makers (principle of viability).
By trying too hard to seize control of the innovation as soon as it is released on the
market, the technicians forget to create the conditions for desire to emerge on the part of
the beneficiaries. By neglecting to involve the business lines in the solution's design
phases, in the conceptual sense of the word, they forfeit the support of those who should
benefit most from it. As for the sales reps, they are content to address the technical
contacts and make "small-scale" sales, because they didn't seek the perspective of those
who, whether beneficiaries or representatives of top management, are likely to appreciate
both the desirability and the economic viability of the initiative.
This is exactly what is happening now in the oh-so-promising "big data" market. In a
recent McKinsey report entitled Getting Big Impact from Big Data, David Court
highlights the barriers impeding the take-up of "Big Data" within organizations. I count
six, each a symptom of the repeated "enthusiasm/disenchantment" cycle mentioned above:
The "data scientist" is the star player, yet they remain rare and often quite
expensive;
While the investments, particularly the maintenance costs of existing systems, are
high, "top management" is disappointed to see that these big data projects have not
yet resulted in a noticeable ROI;
Unfortunately, due to a skill-set disconnect and tool incompatibility between tech- and
business-oriented teams, data scientists often have no choice but to go along with the
tune of "give us the raw data, we'll extract the nuggets from it," thus alienating
themselves from the support of those who should be their biggest fans;
The "top management" is disappointed to see that, despite some initial outcomes
worthy of interest, the systems in place pale in comparison to the bombastic promises
associated with "big data";
Finally, the business-side beneficiaries are put off by the "black box" aspect associated
with the implementation of the first "big data" applications. It is difficult for them to
build strategies on results whose origin they don't understand. The result: they tend to
fall back on driving blindfolded, which has been more successful for them thus far.
When Dataiku decided to build Data Science Studio software, an advanced analytics
tool for the whole data team, they all agreed upon one undeniable truth: data teams are
diverse and often include data scientists and software engineers who share projects
with marketers and salespeople. That's why they decided to create a tool where analysts
can point, click, and build; developers and data scientists can code; and high-level data
consumers can visualize. This enables different skill sets to work together productively,
using the tools and languages they know best, to build end-to-end services that quickly
turn raw data into business-impacting predictions.
But once again, I'd like to get back to the sales approach. The trick in complex sales, we
have seen, is to harmonize the perspectives of three types of stakeholders - the
beneficiaries who express the desire, the technicians who ensure feasibility, and the
economic decision-makers who look at economic viability. Neglecting one of the panels
of this triptych results in a situation of failure:
With the creation of a "chimera" if you have omitted the technical dimension;
With the creation of a "dancer" if you have neglected the economic-viability panel;
Or worse, with the creation of a "false bargain" if you have simply obscured the
expression of desire in the mouths of the beneficiaries.
By putting the beneficiaries back at the center of the process - that is to say, in this case
by breaking the jealous and exclusive grip of "data scientists" on the "big data" subject,
organizations give themselves the means to minimize the risk of disappointment on one
hand, and on the other hand to maximize the chances of seeing their investment bear
heavy, tasty fruit.
Note: Gartner, the research firm, has created a pretty awesome tool for accounting for
the now well-known hot-cold phenomenon characterizing the emergence of innovative
technologies on the market and their adoption by organizations: the "hype cycle." It
places technological innovations on a curve describing the five main phases of
new-technology take-up, namely: the innovation trigger, the peak of inflated
expectations, the trough of disillusionment, the slope of enlightenment, and the plateau
of productivity.
Today, according to the Gartner analysts, "big data" is beginning its plunge into the
abyss of disillusionment.
11. The Big Hole in Big Data by Colin Shaw
That might sound impressive and, by volume, it certainly is quite impressive. The
problem with Big Data, though, is that it tells only one side of the story. The big hole in
big data is that there is no emotional data being collected. Over 50% of a Customer
Experience is about how a Customer feels, and yet most organizations don't have a clue
how their Customers feel. This is a big, big hole! Big data lacks emotion, feeling, or
justification for WHY Customers behave the way they do. While that data might be
useful in any number of ways, it will not help refine the Customer Experience beyond
potentially redesigning a website or asking a question in a slightly different way. This is
a big "hole," and it could leave businesses without a solid Customer Experience strategy
over time. To understand Customer emotions you need to understand Customer
behavior. You need to undertake specialized research, such as our Emotional Signature
research, and then design emotion into your Customer Experience.
The Problem with Actions: They Just Don't Tell the Whole Story
Relying solely on the information gathered by Big Data is like watching a group of
people from a distance. It's possible to see what they're doing while they interact with
each other and engage in conversations, but it's virtually impossible to understand why
they're holding those conversations, what they are feeling that drives their actions,
what emotion underpins those conversations, and, most importantly, how these will
determine the future behaviour of each individual and the group at large. If the
bystander were to walk toward the group and attempt to join their conversation, they'd
have no real way of working their message in amongst those already being heard.
They'd have no idea what the mood was, or where to start.
This is essentially how Big Data works. At best, it attempts to capture what people do
from an emotional distance. It sees their actions, but not the reason why they are doing
what they do. For what it's worth, actions do mean something, and they matter quite a
bit. A business with a high bounce rate on its landing pages, for instance, might
determine that Customer actions indicate a poor bit of marketing copy or a lousy call to
action. But in other environments, merely monitoring Customer actions doesn't mean
very much for the business's bottom line or its approach to the marketplace; for that,
you need insight into human behavior, which is key.
A Secondary Hole in the Big Data Picture: Customer Experiences and Behavior
It's pretty tough for a business to optimize its services and its approach to the
marketplace if it doesn't understand why a Customer does what they do and, given that
we are driven by feelings, what emotion lies behind a Customer's action. It is even
harder to make the right strategic moves without compensating for prior negative
experiences, behavioral patterns, and other factors that affect the experience:
everything from website clicks to in-person business interactions and even things like
takeout fast-food service.
A good example is the typical Customer service phone line. Upon dialing, Customers
are greeted with a variety of options, which ask them to press two for sales, three for
service, and other options that segment calls and send them to the right department.
Customers, though, tend to hate these systems. Since their introduction, they've become
a point of frustration and a comedic punch line. The "just press zero for an operator"
trick is well known. How does Big Data record this? How can it tell the levels of
frustration a Customer may be having? How can it tell what a Customer would prefer?
Big Data, of course, would not show this. It would not compensate for these learned
behaviors and prior experiences all on its own. All the data would show is that a certain
number of Customers pressed zero for some reason, at some point in the call. The
business would then be left to draw its own conclusions, and they may very well be
wrong.
There is more to the picture, though. Businesses need to work on fully defining their
Customer Experience from start to finish, both rationally and emotionally. One of the
best ways to learn about Customer emotions and plan for them is to decide which
emotion should most often be evoked by the company's marketing messages, websites,
Customer service professionals, and other key materials and points of contact.
Then, mapping your Customer Experience from an emotional perspective allows for
more robust testing of the experience via Big Data. Essentially, it allows for tests of
effectiveness and Customer satisfaction that provide illuminating insight into areas for
improvement and areas of success. Combined with Big Data, the future direction of a
business becomes much more obvious and far more meaningful.
Data is Just One Part of the Picture, and it Doesn't Speak for Itself
Businesses are fond of saying that the data "speaks for itself," but it actually doesn't do
that at all. Data speaks for a brief moment in time and it shows an action that was taken.
The speaking comes from Customer emotions, learned behaviors, and prior
experiences. Those things need to be more carefully monitored and analyzed so that
they become meaningful. When these things are combined with data, only then will the
data "speak," but certainly not on its own.
Companies looking to get the most out of their data services need to begin mapping the
Customer Experience, monitoring emotions, and gaining real insight into the minds and
thought processes of their Customers. It is in this area that the future of business truly
lies.
13. Big Data: Two Words That are Going to Change Everything by Ali Rabaie
are just starting to uncover the many ways that this information can be mined for
patterns used to predict, shape and react to events happening in the real world. This
will eventually move us from predetermined questions to a world of narratives and
exploration that help us answer questions we did not know we had.
Now, let's play with the Matryoshka doll. But hey, I have a quiz for you: can you fit the
large dolls inside the smaller dolls? Seriously? Let's see.
When we talk about the material/physical trace, all smaller dolls should physically fit
inside the larger dolls, but when smartphones were introduced, we were able to fit larger
devices into smaller ones. Indeed, the smartphone replaced the camcorder, Walkman,
calculator, tape recorder, GPS, and AM/FM clock radio, and encompassed them all in a
single, smaller device.
On the other hand, the data trace was only becoming larger. Smartphones today have
different sensors (motion/accelerometer, proximity, gyroscope), so our tiny doll is still
physically small but generates more and more data. And it's not just a wooden toy
anymore. It's a smart toy that knows a great deal about us and our behaviors.
Buckminster Fuller coined the term "ephemeralization," which means our ability to do
more with less until eventually we can do everything with nothing. When this
translates into being more efficient and accomplishing more work with less and less
material, our entire economies will change. Thus, as we improve the ways we utilize
materials, we will need less of them.
This is being proven now through big data, where jet engines are decreasing in size
because sensors generate huge amounts of data regarding an engine's performance in
order to improve its efficiency. In 1956, we needed a forklift to load a giant box onto a
plane to transport only 5 megabytes of data. Today, we can store several gigabytes of
data on tiny flash modules. Boeing also introduced one of the strongest and lightest
materials known to science, which is 99.99% air. In addition, the iPhone 6 is
120,000,000 times faster than the Apollo guidance computer that landed us on the moon!
The ephemeralization of technology using big data and the Internet of Things will
continue to move us from a material/physical trace to a data trace, gradually replacing
special-purpose devices. This transition will be accelerated by today's innovative
nanotechnologies.
To borrow an example from Viktor Mayer-Schönberger and Kenneth Cukier's book Big
Data: Walmart's analysts have trawled through the million-plus customer transactions
logged digitally by the chain every hour, and one of the many surprising micro-trends
they uncovered was that sales of Pop-Tarts spike just before a hurricane. Now,
whenever a storm is on the horizon, store managers put Pop-Tarts on display near the
entrance. The tweak worked: Walmart increased its profits, and while no one has come
up with a theory as to why inclement weather provokes a craving for that particular
breakfast snack, no one needs to. On a big enough scale, so the thinking goes, the
numbers speak for themselves. As businesses start mastering the basics of big data, the
field is already changing around them, growing and evolving rapidly. In the coming
years, businesses will be able to process data streams in real time and have them
power recommendation engines. Rather than having analysts make a business
decision based on past performance, the system will make automatic adjustments to the
way the business is run, on the fly. To take the Walmart example, this means a hurricane
warning would automatically trigger an increased order of Pop-Tart stocks, without
any need for human intervention. Machines will understand us via our traces and
provide us with contextual, personalized services. Programming these systems may
not be easy, but those who master them will have an edge.
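A toy sketch of such an automatic adjustment: a weather alert in the event stream changes a store's order quantity with no analyst in the loop. The products, baseline quantities, and uplift factors below are invented for illustration, not Walmart's actual numbers:

```python
# Baseline weekly order quantities and storm uplift multipliers
# (the multipliers stand in for what would be learned from history).
BASELINE_ORDER = {"pop_tarts": 40, "flashlights": 10}
STORM_UPLIFT = {"pop_tarts": 7.0, "flashlights": 3.5}

def next_order(event):
    """Return adjusted order quantities for an incoming event-stream item."""
    order = dict(BASELINE_ORDER)
    if event.get("type") == "hurricane_warning":
        # Automatic adjustment: scale up storm-sensitive SKUs on the fly.
        for sku, factor in STORM_UPLIFT.items():
            order[sku] = int(order[sku] * factor)
    return order

print(next_order({"type": "hurricane_warning"}))
# {'pop_tarts': 280, 'flashlights': 35}
```

The point is not the rule itself but where it lives: encoded in the system, it fires on every event without waiting for a human to read a report.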
Indeed, Big Data is going to change everything. Think about areas like ecology, sports,
agriculture, and so on. But the question remains: can we do everything with nothing?
Why not? In the future, we might not even need a smartphone or watch at all.
14. What Social Media Analytics and Data Can't Tell You by Beth Kanter
Jeremiah presented an overview of his study about the collaborative economy that
we're currently living in. The report and infographic are here. Colby shared results
from a social TV study that looked at various patterns between social media users
and their TV viewing habits.
I presented some initial findings from a study that I did with Vision Critical's data and
large insight communities in both the US and Canada, comparing the social media
activity levels of donors.
Good research starts with hypothesis generation. I queried colleagues Henry Timms
(Giving Tuesday), Steve MacLaughlin (Blackbaud), and others who work at nonprofits
with large-scale online fundraising campaigns that have a robust social media
component. I asked them what would be most useful to find out. Lots of theories came
up: Is slacktivism real? Donation triggers, donation channels, and more.
The overall hypothesis was: more social media activity equals more donations.
Active social media users have been labeled by nonprofits as "Charity Slacktivists." The
term refers to someone who does something for a charity online that requires minimal
personal effort, such as changing their Facebook status. It is a pejorative term
describing "feel-good" measures that have little or no practical effect other than to let
the person doing them take satisfaction from the feeling of having contributed. The
underlying assumption promoted by the term is that these low-cost efforts substitute
for more substantive actions like making a donation.
We found very little variation among donors based on their social media habits and activity
levels. 59% of the survey sample reported making a donation to a charity in the last
year, consistent with the 58% of the total population from the benchmarking study that
Vision Critical did. Facebook users who like fewer pages on Facebook may be slightly
more likely than the average to donate (72%), but once you factor in age, social media
users are no more or less likely to be donors.
But does that onramp of active social media users who do donate lead to a pot of gold
at the end of the rainbow? The survey data said: not really. The total average amount of
charitable giving for donors (per year) is very comparable between social media users
and the general population, and doesn't vary by usage; no matter how active or
inactive someone is on social media, they tend to give the same amount over the course
of the year.
So, does that mean nonprofits should give up on active social media users as a
fundraising target?
No. When we looked at who is moved to donate after encountering a charity via a social
media channel, there is a positive relationship between the level of social media use and
the propensity to go from social to donation. Active social media users are donating. The
chart shows that the more someone uses FB, the more likely they are to have made a
social-inspired donation, especially those who frequently update their FB status or like
a lot of pages; less so for those who simply have a large number of friends.
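The cross-tabulation behind a finding like this is straightforward. A minimal sketch (with made-up survey rows, not the Vision Critical data) groups respondents by activity level and computes the share who made a social-inspired donation:

```python
from collections import Counter

def donation_rate_by_activity(respondents):
    """Share of respondents who made a social-inspired donation,
    grouped by their social media activity level."""
    totals, donors = Counter(), Counter()
    for r in respondents:
        totals[r["activity"]] += 1
        donors[r["activity"]] += r["donated"]
    return {level: donors[level] / totals[level] for level in totals}

# Illustrative survey rows: 1 = made a social-inspired donation.
survey = [
    {"activity": "high", "donated": 1},
    {"activity": "high", "donated": 1},
    {"activity": "high", "donated": 0},
    {"activity": "low",  "donated": 1},
    {"activity": "low",  "donated": 0},
    {"activity": "low",  "donated": 0},
    {"activity": "low",  "donated": 0},
]
print(donation_rate_by_activity(survey))
```

Here the "high" group donates at a visibly higher rate than the "low" group, which is the shape of the relationship the chart describes.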
This finding is the gold in the study. These more active social media users are NEW
donors to the charity. This data looks at whether the donor was new to the charity or a
repeat donor, alongside their social media activity on Facebook, and you can see that
people who have a lot of friends, update frequently, and like a lot of pages tend to be
new donors to the charity. I don't know about you, but I don't hear nonprofits
complaining about having too many new donors. Social continues to be an undervalued
cultivation channel for new donors.
On the survey, we asked people if they had shared an appeal for donations to a charity.
Not surprisingly, the more often people post to FB, the more likely they are to post
something about the charity they supported with that donation. But what's really
important is that the more active they are on FB, the more likely it is that whatever
they've posted is a specific appeal for donations. It is a virtuous circle: if you get them
to donate, they will share and solicit people in their network.
There are some preliminary findings from this study. Stay tuned for a fuller release of
the data, along with an infographic.
These findings have the following implications, which aren't obvious when you rely on
conventional social media analytics, where those less-active social media users
disappear. There are significant differences between these groups, which nonprofits
need to recognize:
Most crucially: let go of the slacktivism theory. Your most active social
audience is valuable to you not just for their posts, but for their dollars.
Social media is a channel to acquire new donors, but be sure to use social to engage and
connect with them.
If you aren't urging your donors to post about you when they donate, you're missing a
huge opportunity. While the majority of more active social media users (those who post
at least once every other day) will go ahead and post anyhow, only a third of less-active
users promote the charities they support, even if they've made a social-media-inspired
donation.
15. Big Data, Great! Now What Do We Do With It? by Ken Kring
Why does the puzzle box lid help make people happier, faster, more productive?
Because the puzzle box lid allows them all to share the same picture: the same picture
of how what they are working on fits together. When people share the same image of
how it all fits together, it is much easier for them to work in unison toward a shared
goal.
Why don't we do this with Big Data in business? Why don't we put all of the data
together, into the flow of the business? So that people can see: given the flow of
business, where they fit, how they link up and how they affect the whole. Why haven't
we already been doing this for decades?
Part of the reason is that business schools teach business in functional silos: a
marketing silo, a finance silo, an operations silo. And while business silos are great for
the efficiencies they bring (you can't get those efficiencies any other way), business
does cut across silos to get actual work done. So, for goodness' sake, wouldn't it be
useful to show people how business flows across the silos to get reality-based work
done?
In the absence of knowing how Big Data fits together, and how it fits into the flow of
business, people resort to displaying data in dis-integrated ways: ways that at times
seem disjointed and confusing. Sometimes the data is shared as a list of key
performance indicators (KPIs), leaving everyone reading the list to determine on their
own how all the metrics fit together. Or the data is shared as a dashboard of gauges;
the dashboard often doesn't give you a sense of how all of these things fit together. Or
as a business scorecard.
The challenge with presenting data in this less-than-integrated way is that your team is
left to wonder:
Is this number supposed to be 5 times higher than this other number?
How do these two numbers relate to each other?
And how do those two numbers fit together with all the other numbers?
"How do our jobs fit together to best drive the business with these metrics?"
"Wait a minute, how are we supposed to prioritize all of this?!?!"
So then, how do we better leverage the data? How do we make our people happier,
faster, more productive? How do we build our "puzzle box lid"?
To intuitively derive the solution more easily, it is useful to think about the similarities
between all the business data it is possible to collect and Seurat's oil painting of a
Sunday afternoon in a park.
Seurat painted in dots. He painted in "data points." When you stand with your nose an
inch away from Seurat's painting, all you see is a bunch of dots. It is only when you take
a step back that you see the patterns the dots make: a pond, a woman with a parasol,
and, for some reason, a little monkey. This is what data is: a series of dots. However, if
you put the dots together and take a step back, you can see the business patterns the
dots make.
I discovered how to put the business dots into a useful pattern when working with a
CEO a number of years ago. During a conversation he said, "How do you think we
should best grow the company?" Having thought about this for a while, I quickly gave
him a number of ideas. He paused, thought about it, and then said, "What I really need
to do is figure out how to best invest in the company." When I quickly went to modify
my recommendations, it occurred to me that the question of "how to best grow" is very
different from "how to best invest."
How this framework was derived and its many uses, are subjects that take up more
room than will practically fit into this article. If you would like more details and
examples, please see the book "Business Strategy Mapping - The power of knowing
how it all fits together".
In the shortened version of what transpired, when I sat down to answer "how to best
invest in the company," it was necessary to first understand how business flows from
potential to profitability and the points in between. Profitability for both the customer
and the company: because if one of them doesn't "win," neither wins, because one will
walk away from the other.
This general flow is helpful. It allows us to see the overall flow of the business. But how
do we fit the data into the flow? How do we fit data into the flow so we can see the
patterns made by the data? The patterns that tell us incredibly useful things: things
about the flow of the business; where we are doing well and where the opportunities
are; how to prioritize putting effort into optimizing the flow; how to best invest in the
flow, in the business; and which people are going to need to be involved in which
pieces of which portions of the flow.
After working on this for years, the fundamental pattern below appeared. The pattern
of how a business, business units, products and projects, flow from potential to
profitability and the points in between. When you populate the flow below with your
objectives, the initiatives that support those objectives, and the metrics that come from
the initiatives, you get a wonderfully rich picture of how the business flows.
Years ago I was working with a data architect named Chris. Over lunch he told me . . .
Later he told me . . .
"Because we sit over the top of all of the data, we are the ones closest to it; all we needed to
do is apply a thin veneer of how the business flows over the top of it all. We now know
how it all fits together. We are now going from order takers to intellectual leaders. We
are now even being brought into projects earlier, because we know how the data fits
together."
People have said that this should be taught in business schools, to show students how
all of their classes fit together. It should actually be taught in high schools, so that high
school students have a better idea as to how objectives, marketing, finance, operations
and customers all fit together. It is not that hard. It fits on a restaurant placemat and is
suitable for coloring.
16. Decoding Buzzwords: Big Data, Predictive Analytics, Business Intelligence by
Cindy Gordon
One could argue that this is a sub-segment of the business intelligence market.
However, traditional approaches to BI tend to be very model-centric, with high
professional-service costs, versus cloud-based offerings that are plug-in, cost-effective,
and deliver rapid predictive insights in real time.
Predictive analytics uses artificial intelligence and machine learning algorithms on data
sets to predict outcomes. It does not guarantee an outcome; rather, it yields a statistical
probability.
Predictive analytics extracts information from existing data sets to determine patterns
and predict future outcomes and trends.
Applied to business, predictive models are used to analyze current data and historical
facts in order to better understand customers, products and partners and to identify
potential risks and opportunities for a company.
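To make the probability-not-guarantee point concrete, here is a minimal sketch of a predictive model scoring a hypothetical customer. The data, feature names, and the choice of logistic regression are illustrative assumptions, not taken from this chapter.

```python
# Minimal illustration: a predictive model yields P(outcome), not a guarantee.
from sklearn.linear_model import LogisticRegression

# Toy, invented history: [monthly_spend, support_tickets] -> churned (1) or not (0)
X = [[20, 5], [25, 4], [90, 0], [80, 1], [30, 6], [85, 0]]
y = [1, 1, 0, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# For a new customer the model returns a statistical probability of churn,
# never a certainty.
p_churn = model.predict_proba([[28, 5]])[0][1]
print(f"Estimated churn probability: {p_churn:.2f}")
```

The business then decides what probability threshold justifies action, which is a judgment call about risk tolerance, not something the model itself supplies.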
Big data is the term for a collection of data sets that are so large and complex that it
becomes difficult to process using on-hand database management tools or traditional
data processing applications. The challenges include: capture, curation, storage, search,
sharing, transfer, analysis and visualization.
With increasingly pervasive big data environments, companies must not only sense the
present, but also see the future and proactively shape it to their advantage.
Experts point to a 4300 per cent increase in annual data generation by 2020.
The market for predictive analytics software is estimated to be worth US$2 billion
today, and is expected to exceed US$3 billion in 2017 (Pauler Group, 2013).
Other market research reports forecast the market for predictive analytics software to
reach USD 6,546.4 million globally by 2019. The market growth is driven by increased
demand for customer intelligence and fraud and security intelligence software.
Cloud-hosted predictive analytics software is seen as an emerging market and is
expected to drive growth in the near future. Globally, the predictive analytics market
was valued at USD 2,087.3 million in 2012 and is forecast to grow at a 17.8% CAGR from
2013 to 2019 (Transparency Market Research, 2013).
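As a quick arithmetic check (assuming the 17.8% CAGR compounds from the 2012 base over seven years), the two quoted market figures are roughly consistent:

```python
# Compound the 2012 market value at the quoted CAGR through 2019.
start_value = 2087.3          # USD million, 2012 valuation
cagr = 0.178                  # 17.8% compound annual growth rate
years = 7                     # 2012 -> 2019

projected = start_value * (1 + cagr) ** years
print(f"Projected 2019 value: USD {projected:,.1f} million")
```

This lands within roughly half a percent of the USD 6,546.4 million figure quoted above; the small gap is rounding in the published CAGR.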
Summary
This blog post has defined business intelligence, predictive analytics and big data. A
simple way to remember these terms: business intelligence is simply about making
more informed business decisions by analyzing data. Predictive analytics is more
advanced intelligence, using advanced methods to predict and forecast future
outcomes, risks or scenarios.
Big data solutions consume volumes of data that are enormous in size, and help detect
very complex patterns that are very difficult to see without massive data stores being
analyzed. At the end of the day, this market is evolving, and segments of it, like
predictive analytics, or cloud predictive analytics (simply delivered in SaaS or cloud
models), are in rapid growth mode compared to traditional BI vendors, who are
scrambling to up their game now that the data challenge has stepped up a notch in the
industry.
17. The Case Against Quick Wins in Predictive Analytics Projects by Greta
Roberts
Using these predictions opens up discussions of economic discrimination, making HR
and executives nervous. They often decide to ignore their newfound ability to predict
performance; they don't implement the prediction, and the project doesn't advance the
case for more predictive projects.
Executives have seen little or no correlation between engagement and actual business
results at their own firm. Imagine trying to sell the VP of Sales on predicting
engagement of their sales reps. At the end of the day their employees aren't hired to be
engaged; they are hired to do their job and sell.
Rather than reducing the amount of data, we'd rather see you reduce the scope of the
prediction.
An example: Instead of doing a predictive analytics pilot project to predict flight risk for
all jobs in the Chicago office, maybe it would yield better results to keep the scope small
and targeted by predicting flight risk for a single role that has a lot of people in it.
Ask your data scientist for their guidance on how to frame your quick win project to
keep the project scope smaller, while giving the data scientist a reasonable amount of
data to optimize your chance for success.
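The scoping advice above can be pictured as a simple filtering step before any modeling: restrict the training set to one comparable population. The table, column names, and role below are hypothetical.

```python
import pandas as pd

# Hypothetical HR data for one office.
employees = pd.DataFrame({
    "role": ["sales_rep"] * 4 + ["engineer", "manager"],
    "tenure_years": [1, 2, 5, 7, 3, 10],
    "left_within_year": [1, 1, 0, 0, 0, 0],
})

# Narrow the prediction scope to a single high-headcount role,
# rather than modeling flight risk for every job at once.
scoped = employees[employees["role"] == "sales_rep"]
print(len(scoped), "records in the scoped training set")
```

With the scope narrowed this way, every record in the training set describes the same kind of job, which is what gives a small pilot model a fighting chance.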
For your predictive projects, quick isn't enough of a win. Instead, you want a quick,
implementable and exciting win that people care about.
The only way to get a quick, exciting win is to start with a project that predicts
something that either saves or makes money for your company. Find a project that
solves an existing business problem. Remember what predicting does for your
organization: accurate predictions support better decision making, and better decisions
produce better end results. End results are the only thing that will get people excited
and get the model implemented.
Think of banks that try to predict whether or not to give you a mortgage. They want
to do a better job of extending credit only to people who can pay their mortgages.
They're not doing this to predict who will be engaged as a customer.
All your predictive projects should be ones where you are saving or making money. Do
a project where you can demonstrate that your model worked and saved money on an
important measure. Often this is a line of business problem, not an HR problem.
Results are the only kind of win that will get business stakeholders excited and move
your efforts forward.
18. How Many of Us Think Big Data is Big BS? by Ali Syed
"You are thus right to note that one of the impetuses is that social as well as cultural,
economic and political consequences are not being attended to as the focus is primarily
on analytic and storage issues." Evelyn Ruppert, Editor Big Data and Society
At the same time, this data deluge is resulting in deep social, political and economic
consequences. What we are seeing is economies forming around data, and that, to me,
is the big change at a societal and even macroeconomic level. Data has become the new
raw material: an economic input almost on a par with capital and labor.
Organizations need data from multiple systems to make decisions, and they need it in
an easy-to-understand, consistent format to enable fast understanding and reaction.
They are now trying to capture every click, because storage is cheap. The customer
base is harder to define and constantly changing. While all this is happening, the
expectation is to be able to answer questions quickly. Everyone is saying that reports
don't satisfy the need any more.
The global economy has entered the age of volatility and uncertainty: a faster-paced
economic environment that shifts gears suddenly and unexpectedly. Product life cycles
are shorter and time to market is shorter. Ours is an instant-gratification society, one
which expects quick answers and more flexibility than ever. Consequently, the world of
business is always in the midst of a shift, required to deal with changing economic
and social realities.
The combination of dealing with the complexities of the volatile digital world, data
deluge, and the pressing need to stay competitive and relevant has sharpened focus on
using data science within organizations. At organizations in every industry, in every
part of the world, business leaders wonder whether they are getting true value from the
monolithic amounts of data they already have within and outside their organizations.
New technologies, sensors and devices are collecting more data than ever before, yet
many organizations are still looking for better ways to obtain value from their data.
The strategic ability to analyze, predict and generate meaningful and valuable insights
from data is becoming the topmost priority of information leaders, a.k.a. CIOs.
Organizations need to know what is happening now, what is likely to happen next, and
what actions should be taken to get the optimal results. Behind rising expectations for
deeper insights and performance is a flood of data that has created an entirely new set
of assets just waiting to be applied. Businesses want deeper insights into the choices,
buying behaviors and patterns of their customers. They desire an up-to-date
understanding of their operations, processes, functions and controls, and seek
information about the financial health of their entire value chain, as well as the
socio-economic and environmental consequences of both near-term and distant events.
"Every day I wake up and ask, how can I flow data better, manage data better, analyse data
better?" - Rollin Ford, CIO of Wal-Mart
Although business leaders have realized there's value in data, getting to that value has
remained a big challenge in most businesses. Friends in industry have cited many
challenges, and none can be discounted or minimized: executive sponsorship of data
science projects, combining disparate data sets, data quality and access, governance,
and analytic talent and culture all matter and need to be addressed in time. In my
discussions with business executives, I have repeatedly heard that data science
initiatives aligned to a specific organizational challenge make it easier to overcome a
wide range of obstacles.
Data promises so much to organizations that embrace it as an essential element of their
strategy. Above all, it gives them the insights they need to make faster, smarter and
more relevant decisions in a connected world where to understand and act in time
means survival. To derive value from data, organizations need an integrated insight
ecosystem of people, process, technology and governance to capture and organize a
wide variety of data types from different sources, and to be able to easily analyze it
within the context of all the data.
We are all convinced that data, as the fabric of the digital age, underpins everything we
do. It's part and parcel of our digital existence; there is no escape from it. What is
required is that we focus on converting big data into useful data. We now have the
tools and capabilities to ask questions, challenge the status quo and deliver meaningful
value using data. In my opinion, organizations and business leaders should focus more
on how to minimize the growing divide between those that realize the potential of
data and those with the skills to process, analyze and predict from it. It's not about
data; it's about people. The real innovation in big data is human innovation.
"The truth is, that we need more, not less, data interpretation to deal with the onslaught of
information that constitutes big data. The bottleneck in making sense of the world's most
intractable problems is not a lack of data, it is our inability to analyse and interpret it all." -
Christian Madsbjerg
19. So You Can Predict the Future. Big deal. Now Change It by Alex Cosmas
How can we isolate the effect of a drug treatment, a marketing campaign, or a policy
from observational data? We don't always have the opportunity to run a prospective
trial with randomized assignment, the gold standard for causal interpretation.
Sometimes all we have to work with is what's already happened. The Neyman-Rubin
causal model provides us a baseline framework for causal inference, but given the
unavoidable imperfections of experimental design and data collection, we must address
the prevalence of covariates and confounders. We can treat covariates through
statistical matching methods, like propensity score matching or Mahalanobis distance
computation. Furthermore, Judea Pearl's do-calculus enables the treatment of
confounders through statistical intervention.
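As a rough illustration of the matching idea (a toy sketch, not a full treatment of the Neyman-Rubin model), the code below estimates propensity scores with a logistic regression and pairs each treated unit with the control whose score is nearest. All covariates, treatment flags, and outcomes are invented for the example.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical observational data: covariates X (e.g., age, prior spend),
# treatment indicator t, observed outcome y.
X = [[25, 10], [30, 12], [28, 11], [55, 40], [60, 45], [58, 42]]
t = [1, 0, 1, 0, 1, 0]
y = [5.0, 3.0, 5.5, 8.0, 12.0, 9.0]

# Propensity score: estimated probability of receiving treatment given X.
scores = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Match each treated unit to the control with the nearest propensity score,
# then average the outcome differences across matched pairs.
controls = [k for k in range(len(t)) if t[k] == 0]
effects = []
for i in range(len(t)):
    if t[i] == 1:
        j = min(controls, key=lambda k: abs(scores[k] - scores[i]))
        effects.append(y[i] - y[j])

att = sum(effects) / len(effects)  # average treatment effect on the treated
print(f"Estimated ATT: {att:.2f}")
```

Real applications add overlap diagnostics and balance checks on the matched sample; this sketch only shows the mechanics of score-based matching.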
If we don't have mutual information, proper temporalization, and a single cause, we
know we can't approach causality. But, as we recall from standardized exams, the
process of elimination often leads to the correct answer.
We, as data scientists, will continue to be called upon to solve the world's toughest
problems. We cannot get away with relying upon machine learning to do machine
thinking. To earn our place in the science community, we must become stewards of the
scientific method, of theory formulation and scientific reasoning. In other words, we
must become truth-seekers, not simply fact-seekers.
This year, over 1.4 billion smartphones will be shipped, all packed with sensors
capable of collecting all kinds of data, not to mention the data the users create
themselves.
20. Best Practices in Lead Management and Use of Analytics by Marianne Seiler
Lead scoring problems often arise from: (1) failure to apply analytics
comprehensively in lead scoring; (2) failure to create a holistic lead analytic record;
or (3) failure to apply data thoughtfully throughout the lead scoring process.
First, to be effective, analytics needs to be integrated into each of the five key steps in
lead scoring. When analytics is not fully present, scoring of leads is based largely on
business rules and judgement. The interpretation of these rules can be applied very
loosely, resulting in Sales having little faith in the process. Many times, Sales
re-qualifies the lead when it is received from the lead nurturing process, costing the
company time and money.
Second, firms need to develop a lead analytic record as the basis for their scoring
activities. Lead analytic records provide a rich set of information on all aspects of a
lead, including the lead's:
1. Opportunity: lead source, date, campaign, product, line of business, etc.
2. Contact: person generating the lead, their role/responsibility, education, number
of years with their firm, etc.
3. Company: company the lead contact works for, history as a customer, industry,
financial status, size, geography, etc.
4. Account Relationship: years as a customer, share of wallet, product held,
profitability of the account, etc.
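One way to picture such a record is as a single structure joining the four aspects listed above. Every field name below is an illustrative assumption, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class LeadAnalyticRecord:
    # 1. Opportunity
    lead_source: str
    campaign: str
    product: str
    # 2. Contact
    contact_name: str
    contact_role: str
    # 3. Company
    company_name: str
    industry: str
    employee_count: int
    # 4. Account relationship
    years_as_customer: int
    share_of_wallet: float  # fraction of category spend, 0.0 to 1.0

lead = LeadAnalyticRecord(
    lead_source="webinar", campaign="Q3-launch", product="analytics-suite",
    contact_name="J. Doe", contact_role="VP Operations",
    company_name="Acme Corp", industry="manufacturing", employee_count=1200,
    years_as_customer=4, share_of_wallet=0.35,
)
print(lead.company_name, lead.share_of_wallet)
```

The value of a unified record like this is that every scoring model draws from the same fields, rather than each team assembling its own partial view of the lead.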
Finally, companies need to bring external and internal data to the modeling process in a
thoughtful manner. Data has associated costs in collection, cleansing, matching and
potentially licensing. Methodically considering data elements and adding them when
they bring the most value improves model quality and lead management analytics ROI.
Attention to these key challenges will not only improve lead scoring effectiveness but
also enhance the performance of lead nurturing, distribution and pipeline optimization.
21. Integrated Information Architecture for Real-time Analytics by Madan Gopal
Along came Big Data and its exhilarating promise of schema-on-read with massive,
scalable, parallel hardware architecture that would allow all types of data and
information to flow at light speed, to be accessed, analyzed and utilized for maximum
benefit at will. However, this illuminated ray of information and insight has to pass
through the gravity of existing data solar systems, data-silo black holes and other
extremely powerful constraining forces that most definitely bend the information flow,
and sometimes even stop it completely.
The Solution
The return of information technology as a profit center for an organization depends
critically on its ability to provide accurate, timely answers to critical real-time questions
from executives and decision makers. Beyond that, the future of the organization itself
depends on its ability to learn in real time, reasonably predict near-future trends,
and develop prescriptive solutions to serve its stakeholders and customers.
The integrated information architecture has to address and balance the triple constraints
of scope (what information is needed?), quality (how to ensure its accuracy?) and cost
(how to deliver it most efficiently and sustainably?). However, the scope for this
architecture really focuses on what all the different data sources are that must be linked
and streamlined to answer the questions being asked most accurately. That is the real
crux of integration architecture.
The solution involves both technology and business process discipline as shown in
Figure 1.
Existing transactional and analytical data stores must be combined with new NoSQL
databases and Hadoop platforms to allow old and new data to coexist.
The architecture must allow for gateway service and message brokers for data
interfaces and exchanges, data mapping and conversion, and storage. It must also
include web and mobile applications for data collection and unstructured data from the
web. All this data must be governed and managed using Master Data Management
and reference data standards, data governance and change management. Despite its
importance, most executives admit their firms do a miserable job at managing sales
opportunities.
22. The Practice of Data Science: Investigating Data Scientists, Their Skills and
Team Makeup by Bob E. Hayes
Getting insights from data is no simple task, often requiring data science experts with a
variety of different skills. Many pundits have offered their take on what it takes to be a
successful data scientist. Required skills include expertise in business, technology and
statistics. In an interesting study published by O'Reilly, researchers (Harlan D. Harris,
Sean Patrick Murphy and Marck Vaisman) surveyed several hundred practitioners,
asking them about their proficiency in 22 different data skills. Confirming the pundits'
list of skills, these researchers found that data skills fell into five broad areas: Business,
ML / Big Data, Math / OR, Programming and Statistics.
We invited data professionals from a variety of sources, including AnalyticsWeek
community members and social media (e.g., Twitter and LinkedIn), to complete a short
survey, asking them about their proficiency across different data skills, education, job
roles, team members, satisfaction with their work outcomes and more. We received 490
completed survey responses.
Figure 1. Proficiency in Data Science Skills
Most of the respondents were from North America (68%) and worked for B2B
companies (79%) with less than 1000 employees (53%) in the IT, Financial Services,
Education/Science, Consulting and Healthcare & Medicine industries (68%). Males
accounted for 75% of the sample. A majority of the respondents held 4-year (30%),
Masters (49%) or PhD (18%) degrees.
Data science is an umbrella term, under which different skills fall. We identified 25 data
skills that make up the field of data science. They fall into five broad areas: 1) Business,
2) Technology, 3) Programming, 4) Math & Modeling and 5) Statistics. Respondents
were asked to indicate their level of proficiency for each of the 25 different skills, using
a scale from 0 (Don't know) to (Expert).
Job Roles
Respondents were asked to indicate which of four options best described themselves
and the work they do (e.g., job role). Over half indicated their primary job role was a
Researcher, followed by Business Management, Creative and Developer (see Figure 2).
Data Scientists are not Created Equal
The results of the survey showed that data professionals tend to work together to solve
problems. Seventy-six percent of the respondents said they work with at least one other
person on projects that involve analytics.
To better understand how teams work together, we looked at how a data
professional's expertise impacts their teammates. We asked respondents how satisfied
they were with the outcomes of their analytics projects. Additionally, we asked data
professionals if their teammates were experts in any of the five data skill areas.
Results showed that Business Management professionals were more satisfied with the
outcome of their work when they had quantitative-minded experts on their team (e.g.,
Math & Modeling and Statistics) compared to when they did not (see Figure 5).
Additionally, Researchers were more satisfied with their work outcome when they were
paired with experts in Business and Math & Modeling. Developers were more satisfied
with their work outcomes when paired with an expert in Business. Creatives'
satisfaction with their work product was not impacted by the presence of other experts.
Solving problems with data requires expertise across different skill areas: 1) Business, 2)
Technology, 3) Programming, 4) Math & Modeling and 5) Statistics.
Different types of data professionals (as defined by their role) are proficient in different
areas. Not surprisingly, data professionals in Business Management roles are the most
proficient in business skills. Researchers are the most proficient in Math & Modeling
and Statistics skills. Developers are the most proficient in Technology and
Programming. The Creative types have some proficiency in all skill areas but are not
the best in any single one.
mix of data professionals. Recruiters need to know the skills of their current data
professionals to effectively market to and recruit data professionals who have the right
skills to fill specific roles. Knowing the data science capabilities of data professionals is a
good first step to help organizations improve the value of their data.
About the Study and the Data Skills Scoring System (#DS3)
The current study was undertaken to better clarify and understand the role of data
science in the business world. Toward that end, we developed the Data Skills Scoring
System (DS3) to capture important information about data professionals and their work
environments. For data professionals, the DS3 provides free feedback about their skill
set. For chief data/analytics officers, the DS3 can provide an aggregated view of the
strengths of their data science teams and identify skill gaps. For recruiters, the DS3 can
help improve how they market to and recruit talent.
23. The Memefication of Insights by Tom De Ruyck
which turn insights into concrete ideas, stronger brands and future-proof business
concepts to deliver better consumer experiences. The million-dollar question is: how do
we trigger these meaningful actions across the organization in order to create a positive
business impact? And how can the insight professional of tomorrow do this in an
efficient yet effective way?
For people to take action on a consumer insight, they first need to learn what the insight
is about. In traditional MR, only a limited group of people is involved in this knowledge
exchange, e.g. by participating in the debrief workshop or managing the research study
themselves. This limited group is then able to shape an insight platform by adding its
own thoughts, observations and/or ideas. By involving a wider group of employees, one
better understands the consumer and is able to make better consumer-relevant
decisions. Furthermore, the theory of open innovation teaches us that the one golden
idea can come from anywhere in the organization, not only marketing or innovation
(Whelan, 2011). To increase the impact, all employees across the organization need to
learn what the friction is in order to share related observations and ideas. For example,
by experiencing how consumers are using their product today, employees see what can
be improved. When such an insight is replicated by employees adding their own
observations and ideas, is shared with various people across the organization, and
triggers action, the insight is called a meme (Dawkins, 1989). An illustration of such a
meme is what we did at ATAG, a leading supplier of kitchen appliances. ATAG wanted
to move away from a product-driven strategy and introduce a consumer-driven
approach (cook-centered thinking). In order to make this shift, they needed to create
internal belief in their new strategy. We invited 400 internal stakeholders to discover
the consumer insights and experience for themselves how strong the emotion of
passion for cooking can be. The #welovecooks experience engaged over 170
employees, who contributed 125 observations, and resulted in 13 potential internal
projects identified by the crowd. The new strategy was shared among employees and
turned into the #welovecooks meme.
To turn an insight into a meme, insight professionals need to move away from the
traditional research model and shift on 3 levels to establish the Memefication of
Insights:
1. From reporting to involving (#experience): While 92% of insight professionals
believe their research generates insights worth sharing with colleagues, only 65%
extensively share them with their organization. Furthermore, only 1 in 5
researchers organizes interactive workshops to discuss results (Schillewaert et al,
2014). All too often, MR takes an individualistic approach where executives
need to identify their own actions when reading research reports. However, to
trigger meaningful actions, insight professionals need to bring insights to life
through interaction. Therefore, we have identified 4 building blocks for
marketing insights: harvesting, seeding, activating and collaborating. Through
harvesting, we collect insights from internal stakeholders which are already
known. Secondly, seeding enables insight managers to spread insights via key
ambassadors in a relevant way through the organization. Activating triggers
stakeholders to not only discover but also interact with insights. Finally,
collaborating connects stakeholders to work together and turn insights into
actions and new future projects. At Unilever R&D the combination of these
building blocks led to 640 involved employees out of the 1000 invitees, which
triggered more conversations about their consumers on the work floor, measured
as an increase from 12% to 55%.
2. From teams to the organization (#reach): In traditional MR, consumer stories
and insights are often discovered and owned by the MR department. However,
in order to trigger meaningful actions, the insight needs to be co-owned by all
employees. First of all, we want to extend the MR reach from executives to
management, to enable higher management to take long-term decisions with a
consumer context in mind. Secondly, we involve the front-line employees, who
are in almost daily contact with consumers, to shape their consumer feeling and
ultimately improve their performance. Finally, involving all other employees that
have a rather indirect relationship with the consumer creates a better
understanding of the consumer context of the business, making them more
motivated as employees in general. The extension of MR reach calls for a
layered approach, like the one we used for the Belgian bus company De Lijn,
where we involved their whole organization with consumer insights about their
Gen Y passengers. We seeded the top 10 insights during an internal conference
with 200 top managers, we organized a speed date for executives to meet their
consumers, and finally we activated all stakeholders to play the Gen Y passenger
quiz to interact with the key insights.
3. From projects to habit creation (#structural): For most employees, working with
consumer insights is not a routine. If you wish to trigger meaningful actions and
enable employees to turn the insight into a meme, it is of great importance that
consumer-relevant inspirations are integrated into their daily jobs. By identifying
employees' motivations and behaviours, we can better trigger when and how
to use consumer insights on a regular basis. If we learn to shift towards habits,
we will be more successful in triggering meaningful actions and increasing the
impact of consumer insights on the business. For Unilever R&D, we immersed
1000 employees with their consumer in 6 weeks' time by testing their
consumer knowledge through mini-quizzes and organizing collaboration
sessions to close their knowledge gaps. As a result of integrating these consumer
insight routines, we not only improved their gut feeling but also shaped their
consumer feeling, with a relative increase of 81% (De Ruyck et al, 2012).
24. The Esoteric Side of Data Analytics by Kiran Garimella
Data analysis, in its most naive sense, is the equivalent of good technique.
However, when the audience is expecting to hear classical music, rapping for them, no
matter how good, won't do the trick!
Seasoned data science professionals know that the most important task of data analytics
is asking the right question. The most time-consuming task is that of data gathering,
cleaning, and preparing. It's only then that the data scientist gets to apply classical
algorithms to glean insights that pertain to the question.
Unfortunately, the life cycle of data analytics doesn't stop there. As a data scientist, you
did the analysis, diagnosed the problem, figured out the treatment, and wrote out the
prescription. If you think that's the end of it, think again.
You still have to communicate with the patient. You have to provide insights into the
nature of the solution. You, or at least someone, has to make sure that the patient
takes the medications. Someone has to monitor the patient and do follow-ups.
A good doctor prescribes the medication and tailors the communication based on the
age and knowledge of the patient. Kids get the medicine in a syrupy base and small
doses. The doctor explains the cause of the problem and the side-effects of medications
in detail to the adult patient. The doctor uses medical jargon if the patient is also a
doctor.
As a data scientist, you too have to tailor your communication based on the role and
function of your audience. Non-technical senior managers aren't interested in p-values
and correlation coefficients. They want to know what those statistics mean and how the
values affect their decisions.
You should help them choose among alternatives and make a decision, rather than just
throw a lot of conclusions at them. You also have to make sure they understand the
level of confidence in your analysis (which is different from giving them confidence
intervals).
The scientifically valid analytical results may not be what the customer actually wants.
For example, you might come up with k=12 clusters as the optimal result of running the
k-means clustering algorithm on product satisfaction surveys. However, the marketing
manager says there's no budget to run 12 separate marketing programs, and could you
please give them only 3 clusters?
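The tension between the statistically preferred k and the operationally affordable k can be shown directly. The data below is random and purely illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(120, 4))  # hypothetical survey features, 120 respondents

# The analysis might favor a fine-grained segmentation...
km12 = KMeans(n_clusters=12, n_init=10, random_state=0).fit(data)
# ...while the marketing budget only supports three programs.
km3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

print(len(set(km12.labels_)), "vs", len(set(km3.labels_)), "segments")
```

Nothing in the 3-cluster fit is "wrong"; it simply answers a coarser question, which is exactly the constraint the data scientist has to surface and explain.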
The CFO says that funding marketing programs for two of the clusters is not feasible
because they happen to be focused on Europe where, due to the much more stringent
privacy laws, the regulatory process to obtain permissions to run new programs is cost-
prohibitive.
So, in all data analysis, alignment with the business drivers and constraints becomes
critical. This is not to say that the science of data analysis should be driven by the
business, because that would be a corruption of technique, a bit like demanding that a
survey be worded in such a way that the responses would be biased in favor of an
established position.
Rather, the data scientist has to explain the constraints and limitations of the results.
More than that, the data scientist has to explain the constraints and limitations of the
technique itself.
A truly outstanding data scientist will be able to present several options and explain
their pros, cons, and risks, and also make recommendations on courses of action.
Going one step further, the data scientist must then keep a close watch on the actions
(for example, initiatives that improve the quality of the product and thereby seek to
increase the satisfaction ratings) and determine if the actions are generating the desired
results or not.
In many instances, the business people are best served when the data scientist tells a
story using the data. One good example is the story of the Facebook IPO
(http://www.nytimes.com/interactive/2012/05/17/business/dealbook/how-the-facebook-
offering-compares.html?_r=0). Another, arguably more powerful, example is the now
classic TED Talk by Hans Rosling on the health and wealth of nations, with more
than 10 million views on YouTube (seriously: 10+ million views of a video on
statistics!). Check it out:
https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen
Let questions that are meaningful from a business perspective drive the analysis.
Business can include scientific (non-business) topics as well. The point is that
the questions that are designed to tease out insights need to drive the data
analysis. This principle is quite well known to seasoned data scientists, but I
include it here as a cautionary guideline for newbies.
Align the message with the capabilities of the audience. Data science comes with
all types of caveats and deep assumptions about technique (such as assumption
of normality, to give one obvious example; or the meaning of confidence
intervals, to give an example of a more commonly misinterpreted concept). It is
up to the scientist to guide the consumer of the results through these murky
waters.
Tell a story and connect with the audience.
If you thought that this was all there is to the hidden side of analytics, you'd be sadly
mistaken. We are just getting started.
It turns out that we humans are very poor at reasoning with probability, statistics, and
convoluted chains of logic. We are subject to numerous cognitive biases, and even
well-trained statisticians and researchers succumb just as readily to these biases as
the layperson.
"Judgment under uncertainty: Heuristics and biases" by Kahneman, Slovic, and Tversky
(http://www.amazon.com/dp/0521284147) is a fascinating book that systematically
describes years of research in human psychology. The authors give examples of cleverly
constructed experiments that expose these innate biases.
We rely on heuristics, mental shortcuts, to speed up decision-making.
Psychologists classify these heuristics into various categories such as
representativeness, availability, anchoring and adjustment, and so on. Wikipedia lists
over 170 biases of all kinds.
The availability heuristic includes biases due to imaginability, illusory correlation, and
so on.
Here are a few examples of biases to demonstrate how juicy (and, taken in the right
spirit, entertaining) this topic can be:
Researchers put their subjects into two groups in separate rooms. One group is told to
estimate the product 1 x 2 x 3 x 4 and so on up to 10 (without doing the actual
calculation). The other group is told to estimate the product 10 x 9 x 8 x 7 and so on
down to 1 (again, without actually calculating or using any device).
It turns out that the first group systematically underestimates the product while the
second group systematically overestimates the product. This is of course driven by the
magnitude of the sub-product formed by quick mental calculation of the first few
factors. This is an anchoring bias.
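The anchoring effect above can be made concrete with a few lines of arithmetic, showing the partial products each group computes in its first few seconds:

```python
# The sub-products a subject computes mentally act as anchors for the estimate.
import math

anchor_low = math.prod([1, 2, 3, 4])     # ascending group's first steps: 24
anchor_high = math.prod([10, 9, 8, 7])   # descending group's first steps: 5040
true_value = math.prod(range(1, 11))     # the actual product: 3,628,800

# The descending group's anchor is over 200x larger than the ascending group's,
# pulling its estimates systematically higher.
print(anchor_low, anchor_high, true_value)
```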
Here's one example from mathematics: Is the set of transcendental numbers finite or
infinite? Examples of transcendental numbers are π and e (the base of the natural
logarithm).
If you were a lay math buff, you might be able to provide those two examples of
transcendental numbers. Trained mathematicians might be able to provide a few more.
From the paucity of examples of transcendental numbers, you'd be tempted to conclude
that the number of transcendental numbers is finite.
Boy, would you be wrong! Not only is the set of transcendental numbers infinite, there
are uncountably infinitely many of them. You couldn't even count them as you would
the natural numbers 1, 2, 3, 4, etc., which, as you know, are countably infinite. In
other words, the infinity of transcendental numbers is bigger than the infinity of
natural numbers.
Concluding that there are only a finite number of transcendental numbers because you
could only recall a few examples of them is a bias due to the retrievability of
instances.
As if these biases were not enough, we humans also fall prey to several logical fallacies.
These range from the usually colorful fallacy of argumentum ad hominem (attack the
person instead of the argument) to the broken window fallacy (the favorite of those who
believe that wars stimulate the economy).
Most of these logical fallacies have profound-sounding Latin names. If you manage to
memorize these, you can browbeat your opponents in debate regardless of the merit of
your argument by spouting these Latin names, in which case you'd be committing the
fallacy of argumentum ad verecundiam (argument from authority or shaming the other
person with display of erudition).
The main lessons that Kahneman et al. draw from their landmark research are the
following:
You might argue that these biases can't be all that critical, since the human race hasn't
done too badly so far. However, the increasing complexity of our world (and it can only
get more complex), the enormous quantity of data (which can only get bigger), and the
pressure to digest all that data and complexity to make decisions and act ever faster
all invest these errors with greater power to do damage through compounding.
So, the esotericism in the art of data analysis lies in the need to build awareness and
to train first data scientists, and then users, on the limitations of analytical models,
biases in judgmental heuristics, and tendencies toward logical fallacies.
What are some immediate actions that a data scientist can take?
- Be more diligent in applying the right analytical models
- Present caveats, assumptions, and limitations along with the results and cool
visualizations
- Give the consumers of the analysis choices of action or interpretation
- Explain the risks
- Interpret the results in business-speak; know your audience and tell a story
In order to mature as a discipline, data science must evolve to include in its formidable
arsenal of tools those techniques that protect us from our own biases and tendencies to
fallacies. These protective techniques can come from guided analysis, special tools that
watch out for such errors, good analytic governance mechanisms, as well as the
inclusion of the study of these biases and fallacies in a data science curriculum.
SECTION C: FUTURE OF ANALYTICS
While it is important for us to understand how analytics are used today, it is also
important for us to know where the data analytics world is headed. The contributing
authors provide amazing perspectives on what the future of data and analytics entails.
To stay on the leading edge, we need to have our own version of what our future of
data analytics looks like. This section is a great start to see what things we should be
thinking about for the future of analytics and how we can plan to get there.
The best way to approach this section is to read each article in isolation. Before reading
each article, we suggest that you first read the author's biography. Knowing the
author's background can give you the right context to understand their perspective.
1. Full-Stack Analytics: The Next Wave of Opportunity in Big Data by Chip Hazard
Over the past few years, billions of dollars of venture capital funding have flowed into
Big Data infrastructure companies that help organizations store, manage, and analyze
unprecedented levels of data. The recipients of this capital include Hadoop vendors
such as Cloudera, Hortonworks, and MapR; NoSQL database providers such as
MongoDB (a Flybridge portfolio company where I sit on the board), DataStax, and
Couchbase; and BI tool, SQL-on-Hadoop, and analytical framework vendors such as
Pentaho, Jaspersoft, Datameer, and Hadapt. Further, large incumbent vendors such
as Oracle, IBM, SAP, HP, EMC, Tibco, and Informatica are plowing significant R&D and
M&A resources into Big Data infrastructure. The private companies are attracting
capital and the larger companies are dedicating resources to this market because the
overall market is both large ($18B in spending in 2013, by one estimate) and
growing quickly (to $45B by 2016, a CAGR of 35%, by the same estimate), as shown in
the chart below:
While significant investment and revenue dollars are flowing into the Big Data
infrastructure market today, on a forward-looking basis we believe the winners in these
markets have largely been identified and well capitalized, and that opportunities for
new companies looking to take advantage of these Big Data trends lie elsewhere:
specifically, in what we at Flybridge call Full-Stack Analytics companies. A Full-Stack
Analytics company can be defined as follows:
1. They marry all the advances and innovation developing in the infrastructure
layer from the vendors noted above to
2. assemble a team with unique domain insights into this market and how data can
drive differentiated decisions, with the requisite combination of technical skills to
develop and
3. derive unique and actionable insights from the data to solve real business
problems in a way that
4. benefits from significant data "network effects," such that the quality of their
insights and solutions improves in a non-linear fashion over time as they amass more
data and insights.
Two points from the above criteria that are especially worth calling out are the concepts
of actionable insights and data network effects. On the former, one of the recurring
themes we hear from CIOs and Line of Business heads at large companies is that they
are drowning in data but suffering from a paucity of insights that change the decisions
they make. As a result, it is critical to boil the data down into something that can be
acted upon in a reasonable time frame to help companies generate more revenue,
serve their customers better, or operate more efficiently. On the latter, one of the most
important opportunities for Full-Stack Analytics companies is to use machine learning
techniques (an area my partner, Jeff Bussgang, has written about) to develop a set of
insights that improve over time as more data is analyzed across more customers: in
effect, learning the business context with greater data exposure to drive better insights
and, therefore, better decisions. This provides not only an increasingly more
compelling solution but also allows the company to develop competitive barriers that
get harder to surmount over time. In other words, this approach creates a network
effect where the more data you ingest, the more learning ensues, which leads to better
decisions and opportunities to ingest yet even more data.
If your company can follow this recipe for success, you will find your future as a Full-
Stack Analytics provider to be very bright!
One of the earliest references to data analysis was by John Tukey in 1961, who, while
writing on "The Future of Data Analysis" (Source: The Annals of Mathematical
Statistics, 1961), made the following observation:
For a long time I have thought I was a statistician, interested in inferences from the
particular to the general. But as I have watched mathematical statistics evolve, I have
had cause to wonder and to doubt.
All in all, I have come to feel that my central interest is in data analysis, which I take to
include, among other things: procedures for analyzing data, techniques for interpreting
the results of such procedures, ways of planning the gathering of data to make its
analysis easier, more precise or more accurate, and all the machinery and results of
(mathematical) statistics which apply to analyzing data.
Having defined data analysis, John Tukey in the same paper lays out the future of
data analysis and the rationale for why data analysis should be considered a scientific
discipline, based on observations and experiments, and not a mathematical discipline.
As the use of computers to store, manage, and analyze data grew during the second half
of the twentieth century, the field of data analysis flourished at the intersection of
statistics and computer science.
In its early days, data analysis focused more on describing, summarizing, and
synthesizing the data. It explained what happened (descriptive analytics) and why it
happened (diagnostic analytics). For example, synthesizing the sales data for the past
month, comparing it with the year-to-date actuals or the same month in the previous
year and showing some summary statistics would be considered descriptive analytics.
Making further inferences on the variability of the sales data, where the sales went up
or down and why would be diagnostic analytics. A number of exploratory statistical
techniques and root-cause analysis techniques are employed in these two types of
analytics.
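The descriptive step above can be sketched in a few lines of pandas; the figures, regions, and column names are invented for demonstration.

```python
# Illustrative sketch of descriptive analytics on monthly sales data.
import pandas as pd

sales = pd.DataFrame({
    "month": ["2015-06", "2015-07", "2015-06", "2015-07"],
    "region": ["EU", "EU", "US", "US"],
    "revenue": [120.0, 95.0, 200.0, 240.0],
})

# Descriptive: summarize what happened, month over month, by region.
summary = sales.pivot_table(index="region", columns="month", values="revenue")
summary["change"] = summary["2015-07"] - summary["2015-06"]
print(summary)

# Diagnostic analytics would go further and ask *why* EU fell while US rose,
# e.g. by joining promotion, pricing, or channel data and decomposing the change.
```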
Over time, we have moved from understanding the past to proactively predicting the
future and taking the right actions. Predictive analytics and prescriptive analytics are
some of the terms most commonly used in the literature to describe these forms of
analytics.
Predictive analytics has a fairly broad definition in the press but has a specific meaning
in academic circles. Classical predictive analytics focuses on building predictive models
where a subset of the available data is used to build a model using statistical techniques
(usually some form of regression analysis: linear, logistic, etc.) that is then
tested for its accuracy with the holdout sample. Once a model with sufficient
accuracy is developed, it can be used to predict future outcomes. For example,
predicting the future sales of the product and the propensities for different customer
segments to buy the product would be predictive analytics.
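A minimal sketch of this classical workflow: fit a model on one subset of the data and test its accuracy on the "holdout" sample. The data here is synthetic; in practice the features would be real customer attributes.

```python
# Classical predictive workflow: train on one subset, validate on a holdout.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))  # e.g. four customer attributes
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_hold, y_hold)               # accuracy on the holdout
propensity = model.predict_proba(X_hold[:1])[0, 1]   # one customer's propensity
print(f"holdout accuracy: {accuracy:.2f}")
```

Only once the holdout accuracy is deemed sufficient would the model's propensity scores be used to predict future outcomes.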
Prescriptive analytics goes beyond predictive analytics by evaluating multiple decision
paths into the future and automating the recommendation or action. For example, by
evaluating customer propensity to buy, desired sales targets, and response to
discounted offers, the system can automate the marketing campaign and make
targeted offers to customers.
This would fall under the banner of prescriptive analytics.
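A hedged sketch of that prescriptive step: combine each customer's predicted propensity with business economics to automate the offer decision. The propensities, margins, and uplift factor below are illustrative assumptions, not real campaign data.

```python
# Prescriptive sketch: pick the action with the highest expected profit per customer.

customers = {"A": 0.08, "B": 0.35, "C": 0.90}  # model-predicted propensity to buy

FULL_MARGIN = 40.0      # profit if the customer buys at full price
DISCOUNT_MARGIN = 25.0  # profit if the customer buys with the discounted offer
UPLIFT = 2.0            # assumed multiplier a discount applies to propensity

def best_action(p):
    # expected profit of each action; the discount boosts propensity but cuts margin
    expected = {
        "no offer": p * FULL_MARGIN,
        "discount": min(1.0, p * UPLIFT) * DISCOUNT_MARGIN,
    }
    return max(expected, key=expected.get)

actions = {name: best_action(p) for name, p in customers.items()}
print(actions)  # hesitant customers get the offer; the near-certain buyer does not
```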
In the future, we will see more types of analytics emerge and co-exist with the ones
that we have described.
Real-time Analytics: Gathering real-time data, combining that with static data,
recommending or executing actions autonomously all fall under this category of real-
time analytics. For example, analyzing the click-stream behavior of an online customer,
comparing it with their historical interaction, and making tailored online offers while
the customers are still browsing will fall under real-time analytics.
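The pattern above can be sketched as a small event loop: stream click events in, combine each with static historical data, and decide on a tailored offer while the session is still live. The event fields, profile data, and the 2x threshold are invented for illustration.

```python
# Toy real-time sketch: per-event decisioning against a static historical profile.

history = {"u1": {"avg_pages_per_visit": 3.0}}  # static, precomputed profile

def on_click(event, session_pages):
    user = event["user"]
    session_pages[user] = session_pages.get(user, 0) + 1
    baseline = history.get(user, {}).get("avg_pages_per_visit", float("inf"))
    # unusually deep browsing relative to history -> make a tailored offer now
    if session_pages[user] > 2 * baseline:
        return f"offer for {user} on {event['page']}"
    return None

pages = {}
stream = [{"user": "u1", "page": p}
          for p in ["home", "p1", "p2", "p3", "p4", "p5", "p6"]]
offers = [o for e in stream if (o := on_click(e, pages))]
print(offers)  # fires once the session exceeds twice the historical depth
```

In production this logic would sit behind a stream processor rather than a list, but the shape of the decision is the same.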
So far, we have explored the different types of analytics that are being developed today
and will be developed in the future. Now we shift our focus to the advances in the
analytic techniques that will make these possible.
The availability of multi-core GPU (graphics processing unit) machines has accelerated
the advances in machine learning. A particular type of machine learning called deep
learning exploits multi-layer networks of threshold units, each of which computes some
simple parameterized function of its inputs (Source: Machine Learning: Trends,
Perspectives, and Prospects. July 17, 2015, Science, Vol 349, Issue 6245). In the past five
years, these deep learning systems have been successfully used in a number of image,
voice, and text learning situations with good levels of accuracy. In the future, these
techniques will be further refined and, as computing power and memory capacity
continue to increase, will find broader applications.
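The quoted description, a multi-layer network of units each computing a simple parameterized function of its inputs, can be illustrated with a minimal NumPy forward pass. Sizes and weights here are arbitrary and random, not learned.

```python
# Minimal forward pass through a two-layer network of simple parameterized units.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # each unit: a weighted sum of its inputs passed through a threshold-like
    # nonlinearity (here ReLU)
    return np.maximum(0.0, W @ x + b)

x = rng.normal(size=4)                         # input features
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)  # hidden layer parameters
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)  # output layer parameters

logits = W2 @ layer(x, W1, b1) + b2
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: outputs sum to 1
print(probs.round(3))
```

Training (adjusting W1, b1, W2, b2 against data) is what deep learning systems add on top of this basic structure.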
In between simple and random systems are complex systems that consist of several
things that interact with each other in meaningful ways that change their future path.
For example, a collection of consumers watching advertisements, talking to others and
using products can influence other consumers, companies and the economy as a whole.
Complexity science rejects the notion of independence and actively models the
interactions of entities that make up the system.
Complexity science identifies seven core traits of entities and how they relate to each
other: 1) information processing, 2) non-linear relationships, 3) emergence, 4) evolution,
5) self-organization, 6) robustness, and 7) operating at the edge of chaos. Unlike a
random system, the entities in a complex system process information and make
decisions. These information processing units influence each other, which results in
positive or negative feedback leading to non-linear relationships. As a result, properties
emerge from the interaction of the entities that did not originally characterize the
individual entities. For example, when a new product comes on the market, consumers
may purchase it not just because of its intrinsic value but also because of its real or
perceived influence on others. Moreover, the interactions between entities in a complex
system are not static; they evolve over time. They are capable of self-organizing and
lack a central controlling entity. These conditions lead to more adaptive behavior. Such
systems are often at the edge of chaos but are not quite chaotic or entirely random.
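The adoption dynamic described above can be sketched as a toy agent-based simulation: consumers adopt a product partly on intrinsic appeal and partly on how many peers have already adopted, producing the non-linear feedback that complexity science models. All parameters below are illustrative.

```python
# Toy agent-based model: adoption probability rises with peers' adoption (feedback).
import random

random.seed(1)
N, INTRINSIC, SOCIAL, STEPS = 100, 0.02, 0.10, 30
adopted = [False] * N

for _ in range(STEPS):
    share = sum(adopted) / N  # current adoption level feeds back into behavior
    for i in range(N):
        if not adopted[i]:
            adopted[i] = random.random() < INTRINSIC + SOCIAL * share

print(f"adoption after {STEPS} steps: {sum(adopted)}/{N}")
```

Because each agent's decision depends on the aggregate state, uptake is non-linear: exactly the property that independence-based models miss.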
The types of analytics and the analytical techniques that we outlined will be applied in a
number of diverse applications and will fundamentally result in a new wave of
automation. Since the advent of computers in the 1950s, the era of personal
computers in the '80s, and mobile computing in the 2000s, we have been automating
business processes across all areas of an organization: from marketing, to distribution,
to operations, to back-office finance and supply-chain functions. However, in most
cases the emphasis has been on automating mundane tasks while still leaving the
human in the loop to make the decisions. We are just at the beginning of the next
wave of assistance, automation, and enhancement of human decision-making.
a stage where the machines will be able to make better, faster, and smarter decisions:
decisions that we will be unable to make on our own. For example, AI systems today
can analyze publicly available documents, perform semantic clustering to identify
sectors with intense competition and white spaces for growth opportunities. VCs can
then use this information to selectively look for disruptive sectors and companies. As
the amount of data doubles every few years, such decision making will become
inevitable and become the primary source of competitive advantage.
3. Computational Knowledge and the Future of Pure Mathematics by Stephen
Wolfram
Twenty-four years later, the vast majority of the world's pure mathematicians do in fact
use Mathematica in one way or another. But there's nevertheless a substantial core of
pure mathematics that still gets done pretty much the same way it's been done for
centuries: by hand, on paper.
Ever since the 1990 ICM I've been wondering how one could successfully inject
technology into this. And I'm excited to say that I think I've recently begun to figure it
out. There are plenty of details that I don't yet know. And to make what I'm imagining
real will require the support and involvement of a substantial set of the world's pure
mathematicians. But if it's done, I think the results will be spectacular, and will surely
change the face of pure mathematics at least as much as Mathematica (and, for a
younger generation, Wolfram|Alpha) has changed the face of calculational
mathematics, and potentially usher in a new golden age for pure mathematics.
Workflow of pure math: The whole story is quite complicated. But for me one
important starting point is the difference in the typical workflows for calculational
mathematics and pure mathematics. Calculational mathematics tends to involve setting
up calculational questions and then working through them to get results, just like in
typical interactive Mathematica sessions. But pure mathematics tends to involve taking
mathematical objects, results, or structures, coming up with statements about them, and
then giving proofs to show why those statements are true.
How can we usefully insert technology into this workflow? Here's one simple way.
Think about Wolfram|Alpha. If you enter 2+2, Wolfram|Alpha, like Mathematica,
will compute 4. But if you enter "new york" (or, for that matter, "2.3363636" or "cos(x)
log(x)"), there's no single answer for it to compute. Instead, what it does is
generate a report that gives you a whole sequence of interesting facts about what you
entered.
And this kind of thing fits right into the workflow for pure mathematics. You enter
some mathematical object, result, or structure, and then the system tries to tell you
interesting things about it, just like some extremely wise mathematical colleague
might. You can guide the system if you want to, by telling it what kinds of things you
want to know about, or even by giving it a candidate statement that might be true. But
the workflow is always the Wolfram|Alpha-like "what can you tell me about that?"
rather than the Mathematica-like "what's the answer to that?"
Wolfram|Alpha already does quite a lot of this kind of thing with mathematical objects.
Enter a number, or a mathematical expression, or a graph, or a probability distribution,
or whatever, and Wolfram|Alpha will use often-quite-sophisticated methods to try to
tell you a collection of interesting things about it.
But to really be useful in pure mathematics, there's something else that's needed. In
addition to being able to deal with concrete mathematical objects, one also has to be
able to deal with abstract mathematical structures.
Countless pure mathematical papers start with things like, "Let F be a field with such-
and-such properties." We need to be able to enter something like this, then have our
system automatically give us interesting facts and theorems about F, in effect creating a
whole automatically generated paper that tells the story of F.
to represent geometries, or equations, or stochastic processes, or quantifiers. But what's
not built in are representations of pure mathematical concepts like bijections or abstract
semigroups or pullbacks.
Mathematica Pura: Over the years, plenty of mathematicians have implemented specific
cases. But could we systematically extend the Wolfram Language to cover the whole
range of pure mathematics, and make a kind of Mathematica Pura? The answer is
unquestionably yes. It'll be fascinating to do, but it'll take lots of difficult language
design.
I've been doing language design now for 35 years, and it's the hardest intellectual
activity I know. It requires a curious mixture of clear thinking, aesthetics, and pragmatic
judgement. And it involves always seeking the deepest possible understanding and
trying to do the broadest unification, to come up in the end with the cleanest and
most obvious primitives to represent things.
One might think that somehow mathematical notation would already have solved the
whole problem. But there's actually only a quite small set of constructs and concepts
that can be represented with any degree of standardization in mathematical notation,
and indeed many of these are already in the Wolfram Language.
So how should one go further? The first step is to understand what the appropriate
primitives are. The whole Wolfram Language today has about 5000 built-in functions
together with many millions of built-in standardized entities. My guess is that to
broadly support pure mathematics there would need to be something less than a
thousand other well-designed functions that in effect define frameworks, together with
maybe a few tens of thousands of new entities or their analogs.
Take something like function spaces. Maybe there'll be a FunctionSpace function to
represent a function space. Then there'll be various operations on function spaces, like
PushForward or MetrizableQ. Then there'll be lots of named function spaces, like
CInfinity, with various kinds of parameterizations.
One wants to have short notations for some of the most common structural or
connective elements. But one needs the right number: not too few, like in LISP, nor too
many, like in APL. Then one wants to have function names made of ordinary words,
arranged so that if one's given something written in the language one can effectively
just read the words to know at least roughly what's going on in it.
Computers & humans: But in the modern Wolfram Language world there's also free-
form natural language. And the crucial point is that by using this, one can leverage all
the various convenient (but sloppy) notations that actual mathematicians use and find
familiar. In the right context, one can enter "L2" for "Lebesgue square integrable", and
the natural language system will take care of disambiguating it and inserting the
canonical symbolic underlying form.
Ultimately every named construct or concept in pure mathematics needs to have a place
in our symbolic language. Most of the 13,000+ entries in MathWorld. Material from the
5600 or so entries in the MSC2010 classification scheme. All the many things that
mathematicians in any given field would readily recognize when told their names.
But, OK, so let's say we manage to create a precise symbolic language that captures the
concepts and constructs of pure mathematics. What can we do with it?
One thing is to use it Wolfram|Alpha style: you give free-form input, which is then
interpreted into the language, and then computations are done, and a report is
generated.
When I write programs in the Wolfram Language, I pretty much think directly in the
language. I'm not coming up with a description in English of what I'm trying to do and
then translating it into the Wolfram Language. I'm forming my thoughts from the
beginning in the Wolfram Language, and making use of its structure to help me define
those thoughts.
If we can develop a sufficiently good symbolic language for pure mathematics, then it'll
provide something for pure mathematicians to think in too. And the great thing is that
if you can describe what you're thinking in a precise symbolic language, there's never
any ambiguity about what anything means: there's a precise definition that you can just
go to the documentation for the language to find.
And once pure math is represented in a precise symbolic language, it becomes in effect
something on which computation can be done. Proofs can be generated or checked.
Searches for theorems can be done. Connections can automatically be made. Chains of
prerequisites can automatically be found.
But, OK, so let's say we have the raw computational substrate we need for pure
mathematics. How can we use this to actually implement a Wolfram|Alpha-like
workflow where we enter descriptions of things, and then in effect automatically get
mathematical wisdom about them?
There are two seemingly different directions one can go. The first is to imagine
abstractly enumerating possible theorems about what has been entered, and then using
heuristics to decide which of them are interesting. The second is to start from
computable versions of the millions of theorems that have actually been published in
the literature of mathematics, and then figure out how to connect these to whatever has
been entered.
Each of these directions in effect reflects a slightly different view of what doing
mathematics is about. And there's quite a bit to say about each direction.
Math by enumeration: Let's start with theorem enumeration. In the simplest case, one
can imagine starting from an axiom system and then just enumerating true theorems
based on that system. There are two basic ways to do this. The first is to enumerate
possible statements, and then to use (implicit or explicit) theorem-proving technology
to try to determine which of them are true. And the second is to enumerate possible
proofs, in effect treeing out possible ways the axioms can be applied to get theorems.
It's easy to do either of these things for something like Boolean algebra. And the result
is that one gets a sequence of true theorems. But if a human looks at them, many of
them seem trivial or uninteresting. So then the question is how to know which of the
possible theorems should actually be considered "interesting" enough to be included in
a report that's generated.
My first assumption was that there would be no automatic approach to this, and that
"interestingness" would inevitably depend on the historical development of the
relevant area of mathematics. But when I was working on A New Kind of Science, I did
a simple experiment for the case of Boolean algebra.
There are 14 theorems of Boolean algebra that are usually considered interesting
enough to be given names in textbooks. I took all possible theorems and listed them in
order of complexity (number of variables, number of operators, etc). And the surprising
thing I found is that the set of named theorems corresponds almost exactly to the set of
theorems that can't be proved just from ones that precede them in the list. In other
words, the theorems which have been given names are in a sense exactly the minimal
statements of new information about Boolean algebra.
Boolean algebra is of course a very simple case. And in the kind of enumeration I just
described, once one's got the theorems corresponding to all the axioms, one would
conclude that there aren't any more interesting theorems to find, which for many
mathematical theories would be quite silly. But I think this example is a good indication
of how one can start to use automated heuristics to figure out which theorems are
worth reporting on, and which are, for example, just uninteresting embellishments.
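The first enumeration strategy can be sketched in Python: generate candidate statements about a tiny Boolean-algebra fragment and test their truth exhaustively over all assignments, which is feasible here because the domain is finite. Deciding which true statements are *interesting* is the separate, harder step discussed above.

```python
# Enumerate candidate Boolean equivalences; exhaustive truth tables stand in
# for theorem proving in this finite setting.
from itertools import product

OPS = {"and": lambda a, b: a and b, "or": lambda a, b: a or b}

def exprs(depth):
    # enumerate expression trees over the variables p and q, up to a small depth
    if depth == 0:
        yield from ("p", "q")
        return
    for left in exprs(depth - 1):
        yield ("not", left)
        for op in OPS:
            for right in exprs(depth - 1):
                yield (op, left, right)

def ev(e, env):
    if isinstance(e, str):
        return env[e]
    if e[0] == "not":
        return not ev(e[1], env)
    return OPS[e[0]](ev(e[1], env), ev(e[2], env))

def theorem(lhs, rhs):
    # lhs == rhs under every truth assignment: a "proved" equivalence
    return all(ev(lhs, {"p": p, "q": q}) == ev(rhs, {"p": p, "q": q})
               for p, q in product([False, True], repeat=2))

candidates = list(exprs(1))
truths = [(l, r) for l in candidates for r in candidates if theorem(l, r)]
assert theorem(("and", "p", "q"), ("and", "q", "p"))  # commutativity shows up
print(f"{len(truths)} true equivalences among {len(candidates)**2} candidate pairs")
```

Even in this toy fragment, most true statements are trivial restatements of each other, which is exactly why the filtering heuristics matter.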
So in principle one can imagine having a system that takes input and generates
interesting theorems about it. Notice that while in a standard Mathematica-like
calculational workflow one would be taking input and computing an answer from it,
here one's just finding interesting things to say about it.
The character of the input is different too. In the calculational case, one's typically
dealing with an operation to be performed. In the Wolfram|Alpha-like pure
mathematical case, one's typically just giving a description of something. In some cases
that description will be explicit. A specific number. A particular equation. A specific
graph. But more often it will be implicit. It will be a set of constraints. One will say (to
use the example from above), "Let F be a field," and then one will give constraints that
the field must satisfy.
In a sense an axiom system is a way of giving constraints too: it doesn't say that such-
and-such an operator is Nand; it just says that the operator must satisfy certain
constraints. And even for something like standard Peano arithmetic, we know from
Gödel's Theorem that we can never ultimately resolve the constraints: we can never nail
down that the thing we denote by "+" in the axioms is the particular operation of
ordinary integer addition. Of course, we can still prove plenty of theorems about "+",
and those are what we choose from for our report.
One day I'm sure doing this will be an important part of pure mathematical work. But
as of now it will seem quite alien to most pure mathematicians, because they are not
used to "disembodied" theorems; they are used to theorems that occur in papers,
written by actual mathematicians.
And this brings us to the second approach to the automatic generation of mathematical
wisdom: start from the historical corpus of actual mathematical papers, and then make
connections to whatever specific input is given. So one is able to say, for example, "The
following theorem from paper X applies in such-and-such a way to the input you have
given," and so on.
Curating the math corpus

So how big is the historical corpus of mathematics? There've probably been about 3
million mathematical papers published altogether, or about 100 million pages, growing
at a rate of about 2 million pages per year. And in all of these papers, perhaps 5 million
distinct theorems have been formally stated.
So what can be done with these? First, of course, there's simple search and retrieval.
Often the words in the papers will make for better search targets than the more
notational material in the actual theorems. But with the kind of linguistic-
understanding technology for math that we have in Wolfram|Alpha, it should not be
too difficult to build what's needed to do good statistical retrieval on the corpus of
mathematical papers.
But can one go further? One might think about tagging the source documents to
improve retrieval. But my guess is that most kinds of static tagging won't be worth the
trouble; just as one's seen for the web in general, it'll be much easier and better to make
the search system more sophisticated and content-aware than to add tags document by
document.
So let's imagine we curate all the theorems from the literature of mathematics, and get
them in computable form. What would we do then? We could certainly build a
Wolfram|Alpha-like system that would be quite spectacular, and very useful in
practice for doing lots of pure mathematics.
And what this suggests is a kind of combination of the two basic approaches we've
discussed: in effect one takes the complete corpus of published mathematics, views it
as defining a giant 5-million-axiom formal system, and then follows the kind of
automated theorem-enumeration procedure we discussed to find interesting things
to say.
Math: science or art?

So, OK, let's say we build a wonderful system along these lines. Is it actually solving a
core problem in doing pure mathematics, or is it missing the point?
I think it depends on what one sees the nature of the pure mathematical enterprise as
being. Is it science, or is it art? If it's science, then being able to make more theorems
faster is surely good. But if it's art, that's really not the point. If doing pure mathematics
is like creating a painting, automation is going to be largely counterproductive,
because the core of the activity is in a sense a form of human expression.
This is not unrelated to the role of proof. To some mathematicians, what matters is just
the theorem: knowing what's true. The proof is essentially backup to ensure one isn't
making a mistake. But to other mathematicians, proof is a core part of the content of the
mathematics. For them, it's the story that brings mathematical concepts to light, and
communicates them.
Consider, for example, a proof of a simple theorem found by an automated theorem
prover. It has 343 steps, and in ordinary-size type would be perhaps 40 pages long. And
to me as a human, it's completely incomprehensible. One might have thought it would
help that the theorem prover broke the proof into 81 lemmas. But try as I might, I
couldn't really find a way to turn this automated proof into something I or other people
could understand. It's nice that the proof exists, but the actual proof itself doesn't tell
me anything.
Proof as story

And the problem, I think, is that there's no conceptual story around the elements of the
proof. Even if the lemmas are chosen structurally as good waypoints in the proof, there
are no cognitive connections, and no history, around these lemmas. They're just
disembodied, and apparently disconnected, facts.
So how can we do better? If we generate lots of similar proofs, then maybe we'll start
seeing similar lemmas a lot, and through being familiar they will seem more
meaningful and comprehensible. And there are probably some visualizations that could
help us quickly get a good picture of the overall structure of the proof. And of course, if
we manage to curate all known theorems in the mathematics literature, then we can
potentially connect automatically generated lemmas to those theorems.
It's not immediately clear how often that will be possible; and indeed, in existing
examples of computer-assisted proofs, like those for the Four Color Theorem, the
Kepler Conjecture, or the simplest universal Turing machine, my impression is that the
computer-generated lemmas that appear rarely correspond to known theorems from
the literature.
But despite all this, I know at least one example showing that with enough effort, one
can generate proofs that tell stories people can understand: the step-by-step
solutions system in Wolfram|Alpha Pro. Millions of times a day, students and others
compute things like integrals with Wolfram|Alpha, then ask to see the steps.
It's notable that actually computing the integral is much easier than figuring out good
steps to show; in fact, it takes some fairly elaborate algorithms and heuristics to
generate steps that successfully communicate to a human how the integral can be done.
But the example of step-by-step in Wolfram|Alpha suggests that it's at least conceivable
that with enough effort, it would be possible to generate proofs that are readable as
stories, perhaps even selected to be as short and simple as possible (proofs from
"The Book," as Erdős would say).
In an effort to ensure rigor and precision, many papers tend to be written in a very
formal way that cannot successfully represent the underlying ideas and motivations in
the mind of the author, with the result that some of the most important ideas in
mathematics are transmitted through an essentially oral tradition.
It would certainly help the progress of pure mathematics if there were better ways to
communicate its content. And perhaps having a precise symbolic language for pure
mathematics would make it easier to express concretely some of those important points
that are currently left unwritten. But one thing is for sure: having such a language
would make it possible to take a theorem from anywhere and, like a typical
Wolfram Language code fragment, immediately be able to plug it in anywhere else,
and use it.
But back to the question of whether automation in pure mathematics can ultimately
make sense. I consider it fairly clear that a Wolfram|Alpha-like pure math assistant
would be useful to human mathematicians. I also consider it fairly clear that having a
good, precise, symbolic language, a kind of Mathematica Pura that's a well-designed
follow-on to standard mathematical notation, would be immensely helpful in
formulating, checking, and communicating math.
Automated discovery

But what about a computer just going off and doing math by itself? Obviously the
computer can enumerate theorems, and even use heuristics to select ones that might be
considered interesting to human mathematicians. And if we curate the literature of
mathematics, we can do extensive empirical metamathematics and start trying to
recognize theorems with particular characteristics, perhaps by applying graph-theoretic
criteria to the network of theorems to see what counts as a surprising or a powerful
theorem. There's also nothing particularly difficult, as WolframTones shows, about
having the computer apply aesthetic criteria deduced from studying human choices.
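One simple graph-theoretic criterion of the kind just mentioned can be sketched in a few lines. In this toy model (the theorem names and dependency edges are invented), each theorem points to the earlier results its proof uses, and a PageRank-style power iteration scores a theorem as "powerful" when many later results transitively lean on it.

```python
# Toy "network of theorems": edges point from a theorem to the earlier
# theorems its proof depends on. All names and edges are hypothetical.
depends_on = {
    "T1": [],            # a foundational, axiom-level fact
    "T2": ["T1"],
    "T3": ["T1"],
    "T4": ["T2", "T3"],
    "T5": ["T4"],
    "T6": ["T4"],
}

def power_scores(graph, damping=0.85, iters=50):
    """PageRank-style scores over a dependency graph: mass flows from
    each theorem to the results it depends on."""
    nodes = list(graph)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, deps in graph.items():
            if deps:
                share = damping * score[v] / len(deps)
                for d in deps:
                    new[d] += share
            else:  # dangling node: spread its mass evenly
                for u in nodes:
                    new[u] += damping * score[v] / n
        score = new
    return score

scores = power_scores(depends_on)
most_powerful = max(scores, key=scores.get)
print(most_powerful)  # T1: everything ultimately rests on it
```

On a curated corpus the same iteration, run over millions of nodes, would be one candidate measure of which theorems the rest of mathematics leans on most heavily.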
But I think the real question is whether the computer can build up new conceptual
frameworks and structures, in effect new mathematical theories. Certainly some
theorems found by enumeration will be surprising and indicative of something
fundamentally new. And it will surely be impressive when a computer can take a large
collection of theorems, whether generated or from the literature, and discover
correlations among them that indicate some new unifying principle. But I would expect
that in time the computer will be able not only to identify new structures, but also to
name them, and start building stories about them. Of course, it is for humans to decide
whether they care about where the computer is going, but the basic character of what it
does will, I suspect, be largely indistinguishable from many forms of human pure
mathematics.
All of this is still fairly far in the future, but there's already a great way to discover
math-like things today, one that's not practiced nearly as much as it should be:
experimental mathematics. The term has slightly different meanings to different people.
For me it's about going out and studying what mathematical systems do by running
experiments on them. And so, for example, if we want to find out about some class of
cellular automata, or nonlinear PDEs, or number sequences, or whatever, we just
enumerate possible cases and then run them and see what they do.
There's a lot to discover like this. And certainly it's a rich way to generate observations
and hypotheses that can be explored using the traditional methodologies of pure
mathematics. But the real thrust of what can be done does not fit into what pure
mathematicians typically think of as math. It's about exploring the flora and fauna,
and principles, of the universe of possible systems, not about building up math-like
structures that can be studied and explained using theorems and proofs. Which is
why, to quote the title of my book, I think one should best consider this a new kind of
science, rather than something connected to existing mathematics.
Undecidability is particularly obvious when one's out in the computational universe of
possible programs, but it's also present in programs that represent typical mathematical
systems. So why isn't undecidability more of a problem for typical pure mathematics?
The answer is that pure mathematics implicitly tends to select what it studies so as to
avoid undecidability. In a sense this seems to be a reflection of history: pure
mathematics follows what it has historically been successful in doing, and in that way
ends up navigating around undecidability, and producing the millions of theorems that
make up the corpus of existing pure mathematics.
OK, so those are some issues and directions. But where are we, in practice, in bringing
computational knowledge to pure mathematics?
Getting it done

There's certainly a long history of related efforts. The works of Peano and Whitehead
and Russell from a century ago. Hilbert's program. The development of set theory and
category theory. And by the 1960s, the first computer systems, such as Automath, for
representing proof structures. Then from the 1970s, systems like Mizar that attempted
to provide practical computer frameworks for presenting proofs. And in recent times,
increasingly popular proof assistants based on systems like Coq and HOL.
One feature of essentially all these efforts is that they were conceived as defining a kind
of low-level language for mathematics. Like most of today's computer languages,
they include a modest number of primitives, then imagine that essentially any actual
content must be built externally, by individual users or in libraries.
But the new idea in the Wolfram Language is to have a knowledge-based language, in
which as much actual knowledge as possible is carefully designed into the language
itself. And I think that just like in general computing, the idea of a knowledge-based
language is going to be crucial for injecting computation into pure mathematics in the
most effective and broadly useful way.
The emphasis of the math in Mathematica and the Wolfram Language today is on
practical, calculational math. And by now it certainly covers essentially all the math
that has survived from the 19th century and before. But what about more recent math?
Historically, math itself went through a transition about a century ago. Just around the
time modernism swept through areas like the arts, math had its own version: it started
to consider systems that emerged purely from its own formalism, without regard for
obvious connection to the outside world.
And this is the kind of math, through developments like Bourbaki and beyond, that
came to dominate pure mathematics in the 20th century. And inevitably, a lot of this
math is about defining abstract structures to study. In simple cases, it seems like one
might represent these structures using some hierarchy of types. But the types need to be
parametrized, and quite quickly one ends up with a whole algebra or calculus of
types, and it's just as well that in the Wolfram Language one can use general symbolic
expressions, with arbitrary heads, rather than just simple type descriptions.
As I mentioned early in this blog post, it's going to take all sorts of new built-in
functions to capture the frameworks needed to represent modern pure mathematics,
together with lots of entity-like objects. And it'll certainly take years of careful design to
make a broad system for pure mathematics that's really clean and usable. But there's
nothing fundamentally difficult about having symbolic constructs that represent
differentiability or moduli spaces or whatever. It's just language design, like designing
ways to represent 3D images or remote computation processes or unique external entity
references.
So what about curating theorems from the literature? Through Wolfram|Alpha and the
Wolfram Language, not to mention the Wolfram Functions Site and the Wolfram
Connected Devices Project, we've now had plenty of experience with the process of
curation, and with making potentially complex things computable.
The eCF example

But to get a concrete sense of what's involved in curating mathematical theorems, we
did a pilot project over the last couple of years through the Wolfram Foundation,
supported by the Sloan Foundation. For this project we picked a very specific and
well-defined area of mathematics: research on continued fractions. Continued fractions
have been studied continually since antiquity, but were at their most popular between
about 1780 and 1910. In all there are around 7000 books and papers about them,
running to about 150,000 pages.
We chose about 2000 documents, then set about extracting theorems and other
mathematical information from them. The result was about 600 theorems, 1500 basic
formulas, and about 10,000 derived formulas. The formulas were directly in computable
form, and were in effect immediately able to join the 300,000+ formulas on the Wolfram
Functions Site, which are all now included in Wolfram|Alpha. But with the theorems,
our first step was just to treat them as entities in themselves, with properties such as
where they were first published, who discovered them, etc. And even at this level, we
were able to insert some nice functionality into Wolfram|Alpha.
But we also started trying to actually encode the content of the theorems in computable
form. It took introducing some new constructs like LebesgueMeasure, ConvergenceSet
and LyapunovExponent. But there was no fundamental problem in creating precise
symbolic representations of the theorems. And just from these representations, it
became possible to do such computations directly in Wolfram|Alpha.
An interesting feature of the continued fraction project (dubbed eCF) was how the
process of curation actually led to the discovery of some new mathematics. Having
done curation on 50+ papers about the Rogers-Ramanujan continued fraction, it became
clear that there were missing cases that could now be computed. And the result was the
filling of a gap left by Ramanujan for 100 years.
There's always a tradeoff between curating knowledge and creating it afresh. And so,
for example, in the Wolfram Functions Site, there was a core of relations between
functions that came from reference books and the literature. But it was vastly more
efficient to generate other relations than to scour the literature to find them.
But if the goal is curation, then what would it take to curate the complete literature of
mathematics? In the eCF project, it took about 3 hours of mathematician time to encode
each theorem in computable form. But all this work was done by hand, and in a larger-
scale project, I am certain that an increasing fraction of it could be done automatically,
not least using extensions of our Wolfram|Alpha natural language understanding
system.
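Scaling the pilot's 3-hours-per-theorem figure to the estimated 5 million theorems in the literature gives a rough sense of why that automation matters; the 2,000-hour work year below is my own assumption, not a number from the project.

```python
# Rough back-of-the-envelope scaling of the eCF pilot numbers.
theorems_in_literature = 5_000_000   # distinct theorems, per the estimate above
hours_per_theorem = 3                # observed in the eCF pilot
hours_per_year = 2_000               # assumed full-time work year

total_hours = theorems_in_literature * hours_per_theorem
mathematician_years = total_hours / hours_per_year
print(total_hours, mathematician_years)  # 15000000 7500.0
```

Fifteen million hours, or several thousand mathematician-years of fully manual work, which is why automating even a fraction of the encoding changes the feasibility of the whole project.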
Of course, there are all sorts of practical issues. Newer papers are predominantly in
TeX, so it's not too difficult to pull out theorems with all their mathematical notation.
But older papers need to be scanned, which requires math OCR, which has yet to be
properly developed.
Then there are issues like whether theorems stated in papers are actually valid. And
even whether theorems that were considered valid, say, 100 years ago are still
considered valid today. For example, for continued fractions, there are lots of pre-1950
theorems that were successfully proved in their time, but which ignore branch cuts, and
so wouldn't be considered correct today.
And in the end, of course, it requires lots of actual, skilled mathematicians to guide the
curation process, and to encode theorems. But in a sense this kind of mobilization of
mathematicians is not completely unfamiliar; it's something like what was needed
when Zentralblatt was started in 1931, or Mathematical Reviews in 1941. (As a curious
footnote, the founding editor of both these publications was Otto Neugebauer, who
worked just down the hall from me at the Institute for Advanced Study in the early
1980s, but who I had no idea was involved in anything other than decoding Babylonian
mathematics until I was doing research for this blog post.)
When it comes to actually constructing a system for encoding pure mathematics, there's
an interesting example: Theorema, started by Bruno Buchberger in 1995, and recently
updated to version 2. Theorema is written in the Wolfram Language, and provides both
a document-based environment for representing mathematical statements and proofs,
and actual computation capabilities for automated theorem proving and so on.
No doubt it'll be an element of what's ultimately built. But the whole project is
necessarily quite large, perhaps the world's first example of big math. So can the
project get done in the world today? A crucial part is that we now have the technical
capability to design the language and build the infrastructure that's needed. But beyond
that, the project also needs a strong commitment from the world's mathematics
community, as well as lots of contributions from individual mathematicians from
every possible field. And realistically it's not a project that can be justified on
commercial grounds, so the likely $100+ million that it will need will have to come
from non-commercial sources.
But it's a great and important project, one that promises to be pivotal for pure
mathematics. In almost every field there are golden ages when dramatic progress is
made. And more often than not, such golden ages are initiated by new methodology
and the arrival of new technology. And this is exactly what I think will happen in pure
mathematics. If we can mobilize the effort to curate known mathematics and build the
system to use and generate computational knowledge around it, then we will not only
succeed in preserving and spreading the great heritage of pure mathematics, but we
will also thrust pure mathematics into a period of dramatic growth.
Large projects like this rely on strong leadership. And I stand ready to do my part, and
to contribute the core technology that is needed. What it takes now is commitment from
the worldwide mathematics community. We have the opportunity to make the second
decade of the 21st century really count in the multi-millennium history of pure
mathematics. Let's actually make it happen!
4. Consumers are on the Verge of Understanding Big Data: Are You? by Douglas
Rushkoff
Douglas Rushkoff
Author, Throwing Rocks at the Google Bus

Douglas Rushkoff is author of fifteen bestselling books on media, technology, and
culture, including Program or Be Programmed, Present Shock, and, most recently,
Throwing Rocks at the Google Bus. He made the PBS Frontline documentaries
Generation Like, Merchants of Cool, and The Persuaders, wrote the graphic novels
ADD and Testament, and originated concepts from "viral media" to "social currency."
He's Professor of Media Theory and Digital Economics at CUNY/Queens, and lectures
around the world about media, society, and change. He won the Marshall McLuhan
Award for his book Coercion, and the Neil Postman Award for Career Achievement in
Public Intellectual Activity.

The NSA scandal has put the possibilities of big data back into the public imagination
as forcefully as those images of Tom Cruise waving his hands to manipulate the
interface in Minority Report. Only now, instead of science fiction exaggerating reality,
the reality of our technological capability still outpaces most consumers' ability to
conceive it.

Not for long.

For now, most regular people are still concerned about surveillance on the actual
things they are doing. People don't want the government knowing what they've said to
friends over the phone about how they take deductions on their tax forms. They don't
want Tiffany's receipts for gifts to a mistress showing up in divorce court. And they
don't want anyone knowing about the time they streamed that fetish video because
they happened to feel curious, or particularly lonely one night.
So when both the NSA and corporations assure consumers that "no one is listening to
your conversations" and "no one is reading your email," at least we know our content is
supposedly private.
But as anyone working with big data knows, the content of our phone calls and emails
means nothing in comparison with the metadata around it. The time you make a phone
call, the duration, the location from which you initiated it, the places you went while
you talked, and so on, all mean a whole lot more to the computers attempting to
understand who we are and what we are about to do next.
The more data points a statistician has about you, the more he has to compare with all
the other people out there. Hundreds of millions of people, each with tens of thousands
of data points. Nobody cares what any particular data point says about you, only what
they all say about you in comparison with everyone else.
It's the very same process used by direct mail marketers since the days when they kept
index card files of everyone on their mailing list. They've always known who owns a
house or a car, who is having a baby, who has a loan, and other basic information from
which to draw conclusions about likely targets for particular mailings. But with the aid
of computers and, now, the data trails we all leave behind everywhere we go, that
process has become far more complex and predictive.
For example, it may turn out that people who have cars with two doors, own cats, open
a weather app between 10am and noon, and have gone on a travel site in the last fifteen
minutes are 70% likely to purchase a pair of skis in the next three months. Does anyone
know or care why that data set is true? No. But it's extremely valuable to the companies
selling skis who are trying to do an efficient ad spend.
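The segment-level probability described above is, at bottom, just counting. The sketch below (with invented records and attributes) estimates P(purchase | attributes) by filtering to the matching segment and taking the share of buyers; real systems do the same over millions of rows and thousands of attributes.

```python
# Minimal sketch of the segment-probability idea described above.
# The records and the 'bought_skis' labels are invented for illustration.
records = [
    {"two_door_car": True,  "owns_cat": True,  "checked_travel_site": True,  "bought_skis": True},
    {"two_door_car": True,  "owns_cat": True,  "checked_travel_site": True,  "bought_skis": True},
    {"two_door_car": True,  "owns_cat": True,  "checked_travel_site": True,  "bought_skis": False},
    {"two_door_car": True,  "owns_cat": False, "checked_travel_site": True,  "bought_skis": False},
    {"two_door_car": False, "owns_cat": True,  "checked_travel_site": False, "bought_skis": False},
]

def segment_probability(rows, **conditions):
    """Estimate P(bought_skis | conditions) by simple counting."""
    segment = [r for r in rows
               if all(r[k] == v for k, v in conditions.items())]
    if not segment:
        return None  # no data for this segment
    return sum(r["bought_skis"] for r in segment) / len(segment)

p = segment_probability(records, two_door_car=True, owns_cat=True,
                        checked_travel_site=True)
print(p)  # 2 of the 3 matching records bought skis
```

Note that nothing here explains why the segment buys skis; the estimate is purely a comparison of this segment against its own purchase history, which is exactly the point the chapter makes.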
The same sorts of data can be used to predict the probability of almost anything, from
whether a person is going to change political parties to whether a person is going to
change sexual orientation. It has nothing to do with what they say in their emails about
politics or sex, and everything to do with the seemingly innocuous data that has
nothing to do with anything. Big data has been shown capable of predicting when a
person is about to get the flu based on their changes in messaging frequency, spelling
auto-corrections, and movement tracked by GPS.
People outside the technology and marketing industries don't fully grasp this yet. They
know someone might be recording their calls, and they know that things they've
written about in their emails mysteriously show up in display ads on the websites they
visit. But the conversation is beginning, and the truth is winding its way through public
consciousness.
As people come to understand how companies and government use the data they leave
behind, they will become less willing to participate in everything from loyalty programs
(whose sole purpose was to vacuum data) to turning over a zip code at the cash
register. They will come to understand that it's not the zip code or the panties they're
buying that they should be protecting themselves from, but the process through which
their futures are predicted and then actively encouraged by big data.
See, there's the rub. Big data doesn't tell us what a person might do. It tells us what a
person will likely do, based on the past behavior of other people. Big data is a very
complicated rearview mirror, through which analysts forecast likely outcomes.
Marketers then promote those outcomes by advertising diet products to people who
might be persuaded to go on a diet, or cookies to people who might be persuaded to
become obese.
What it doesn't take into account is novelty. New outcomes are never predicted by big
data. It can't do that, because it can't see anything that hasn't already happened. It can
see the likelihood of another shoe bomber, but a mall bomber? Hasn't happened yet, so
it can't be predicted to happen again.
As a result, companies depending on big data must necessarily reduce the spontaneity
of their customers. You need to make your consumers less lifelike and unique, in order
for them to conform to some pattern that's already happened that you want to exploit.
That's not a great relationship to have with your customers: hoping that they get less
interesting.
Worse, governments depending on big data for security are trapped in the past,
spending massive resources attending to last year's ideas, and becoming even more
vulnerable to next year's innovations.
The speed at which various constituencies figure all this out may be the biggest
predictor of our future. Even bigger than big data.
5. The Frightening Perils and Amazing Benefits of Big Data by Vivek Wadhwa
Over the centuries, we gathered data on things such as climate, demographics, and
business and government transactions. Our farmers kept track of the weather so that
they would know when to grow their crops; we had land records so that we could own
property; and we developed phone books so that we could find people. About 15 years
ago we started creating Web pages on the Internet. Interested parties started collecting
data about what news we read, where we shopped, what sites we surfed, what music
we listened to, what movies we watched, and where we traveled to. With the advent of
LinkedIn, MySpace, Facebook, Twitter and many other social-media tools, we began to
volunteer private information about our work history and social and business contacts,
and about what we like: our food, entertainment, even our sexual preferences and
spiritual values.
Today, data are accumulating at exponentially increasing rates. There are more than 100
hours of video uploaded to YouTube every minute, and even more video is being
collected worldwide through the surveillance cameras that you see everywhere.
Mobile-phone apps are keeping track of our every movement: everywhere we go; how
fast we move; what time we wake. Soon, devices that we wear or that are built into our
smartphones will monitor our body's functioning; our sequenced DNA will reveal the
software recipe for our physical body.
The NSA has been mining our phone metadata and occasionally listening in; marketers
are correlating information about our gender, age, education, location, and
socioeconomic status and using this to sell more to us; and politicians are fine-tuning
their campaigns.
This is baby stuff compared to what lies ahead. The available tools for analyzing data
are still crude; there are very few good data scientists; and companies such as Google
still haven't figured out what the best data to analyze is. This will surely change rapidly
as artificial-intelligence technologies evolve and computers become more powerful and
connected. We will be able to analyze all the data we have collected from the beginning
of time, as if we were entering a data time machine.
We will be revisiting crime cases from the past, re-auditing tax returns, tracking down
corruption, and learning who the real heroes and villains were. An artificially intelligent
cybercop, scanning all the camera data that were gathered, as well as phone records,
e-mails, bank-account and credit-card data, and medical data on everyone in a city or a
country, will instantly solve a crime better than Sherlock Holmes could. Our
grandchildren will know of the sins we committed; Junior may wonder why grandpa
was unfaithful to grandma.
What is scary is that we will lose our privacy, opening the door to new types of crime
and fraud. Governments and employers will gain more control over us, and
corporations will reap greater profits from the information that we innocently handed
over to them. More data and more computing will mean more money and power. Look
at the advantage that bankers on Wall Street have already gained with high-frequency
trading, and how they are skimming billions of dollars from our financial system.
We surely need stronger laws and technology protections. And we need to be aware of
the perils. We must also realize that, with our misdeeds, there will be nowhere to hide,
not even in our past.
Consider what becomes possible if we correlate information about a person's genome,
lifestyle habits, and location with their medical history and the medications they take.
We could understand the true effectiveness of drugs and their side effects. This would
change the way drugs are tested and prescribed. And then, when genome data become
available for hundreds of millions of people, we could discover the links between
disease and DNA and prescribe personalized medications, tailored to an individual's
DNA. We are talking about a revolution in health and medicine.
In schools, classes are usually so large that the teacher does not get to know the student,
particularly the child's other classes, habits, and development through the years.
What if a digital tutor could keep track of a child's progress and learn his or her likes
and dislikes, teaching-style preferences, and intellectual strengths and weaknesses?
Using data gathered by digital learning devices, test scores, attendance, and habits, the
teacher could be informed of which students to focus on, what to emphasize, and how
best to teach an individual child. This could change the education system itself.
Combine the data that are available on a person's shopping habits with knowledge of
their social preferences, health, and location. We could have shopping assistants and
personal designers creating new products, including clothing that is 3D-printed or
custom-manufactured for the individual. An artificial-intelligence-based digital
assistant could anticipate what a person wants to wear or to eat and have it ready for
them.
All of these scenarios will become possible, as will thousands of other applications of
data in agriculture, manufacturing, transportation, and other fields. The only question
is how fast we will get there, and what new nightmares we will create.
It is a very hard thing to convince professionals that they actually know nothing about
a skill they pride themselves on having. Yet study after study has shown law firms
re-hiring barristers who are proven losers. Patients select their doctors from online
reviews, not procedure outcomes; unions fight tooth and nail to prevent teachers from
being measured. George Bernard Shaw once famously quipped, "All professions are
conspiracies against the laity." In area after area there are huge opportunities for
perception/reality arbitrage. It turns out that this is something machine intelligence is
really good at.
Data is here all around us. Courts collect legal data, hospitals, insurers and exam
boards, the same. The issue is that it exists in disparate and unusable forms. Once
enough of it has been collected though, big data can tease surprising analysis from the
jumble. The lawyer on the billboard hasn't won a case in years; there's a vein doctor
that's surprisingly good at plastic surgery procedures, a nutritionist with remarkably
few cancer patients, an overlooked teacher in an inner-city school turning around kids'
lives. We can reap these benefits when we admit our failings and let the machines take
over in these areas. Because transparency is a good thing. Transparency works. Big data
can spot abusive homeowners' associations by correlating property records with
litigation data, catch the beginnings of Ponzi schemes by comparing the same with public
financial filings, and fine-tune education by comparing textbook choices with exam results.
Big data is here, and it's just getting started.
We need machines that can think, because there are some areas where we just can't.
This brings efficiencies, aligns incentives, and raises standards of living. When the things we
fail at become bot work, it frees humanity to do the things we, as a species, can be proud
of. We can focus on innovation, creativity, and the quest to explore and make things better.
And we'll do it standing on the shoulders of machines.
6. Is Little Data the Next Big Data? by Jonah Berger
Jonah Berger is a marketing professor at the Wharton School at the University of
Pennsylvania and author of the recent New York Times and Wall Street Journal
bestseller Contagious: Why Things Catch On. Dr. Berger has spent over 15 years
studying how social influence works and how it drives products and ideas to catch on.
He's published dozens of articles in top-tier academic journals, consulted for a variety
of Fortune 500 companies, and popular outlets like the New York Times and Harvard
Business Review often cover his work.

Three weeks ago, at approximately 10:32 pm, Nike made me hit my dog in the face.
Now, before the ASPCA starts calling: it was a complete mistake (she snuck up behind
me while I was furiously swinging my arm in circles). But the fact that it happened at
all sheds light on how the recent quest for personal quantification has changed our
lives, both for better and worse.
The era of Big Data is upon us. From Target mining shopper data to figure out who is
getting pregnant to Google using online search to predict incidence of the flu,
companies and organizations are using troves of information to spot trends, combat
crime, and prevent disease. Online and offline actions are being tracked, aggregated,
and analyzed at dizzying rates.
The power of Big Data was on display, once again, at this week's TechCrunch Disrupt
conference, a biannual confab of some of the biggest players in Silicon Valley. Big data
believer and Salesforce.com CEO Marc Benioff was there, as were plenty of venture
capitalists betting on a future in businesses driven by data.
But while Big Data gets all the attention, innovative technologies have also enabled
Little Data to flourish: personal quantification, the measurement, tracking, and
analysis of the minutiae of our everyday lives. How many calories we consumed for
breakfast, how many we burned on our last run, and how long we spend using various
applications on our computer.
In some ways Little Data is a boon. We can lose weight by realizing we tend to splurge
on Thursdays. We can be more efficient at work by realizing we dilly-dally more than
we thought on Facebook. With data firmly in hand we work to optimize every aspect of
our behavior.
But this measurement also has some insidious aspects that we often ignore. We forget
that what we track determines where we focus and what we are motivated to improve.
Why do people obsess over LinkedIn connections or Twitter followers? SAT scores,
golf handicaps, or even gas mileage? Because they are observable metrics that are easy
to compare. Someone who has more LinkedIn connections must have more expertise.
Someone with more Twitter followers must be more influential. So people use these
metrics as a yardstick, an easy way to assess whether they are doing well.
But just because a metric is easy to capture doesn't mean it's the right metric to
use. More followers don't actually equal more influence. More connections don't
necessarily mean more expertise. They may just mean someone spends a lot of time on
the site.
It's like the old adage about the drunk searching for his keys. One night a policeman
sees a drunk scouring the ground around a streetlight, so he asks the drunk what he is
looking for. The drunk says, "I lost my keys," and the policeman, wanting to be helpful,
joins in the search. After a few fruitless minutes combing the area, the policeman asks
the drunk, "Are you sure you dropped them here?" "Not sure," the drunk says. "I have
no idea where I dropped them." "Then why are we searching under the streetlight?"
asks the policeman. "Because that is where the light is," the drunk replies.
And this brings us back to Nike, my dog, and swinging my arm around in circles like a
windmill. Nike's new FuelBand helps people track how much exercise they get on a
daily basis. It records how many steps you take, calories you burn, and even "Fuel," a
measure of overall exertion, all on a handy tracker you wear around your wrist. You
set a daily Fuel goal, and if you get there, the wristband gives you a mini-celebration.
The device is meant to encourage exercise, but like many examples of Little Data, it
focuses attention on a particular aspect of behavior, shining a light that determines
where people devote their effort.
As a result, Fuel becomes the end rather than the means. I was brushing my teeth,
about to go to bed, when I noticed that I had 3,900 Fuel points. My goal for the day was
4,000. I was so close! I couldn't go to bed without reaching my goal.
But it was 10:28 pm. Too late for a run or any exercise really. So I started doing a couple
of half-hearted jumping jacks. Then I realized that arm movement burned the most
Fuel. So I started swinging my arm around in huge circles, just as the dog decided
to walk up and take a closer look at what in the world I was doing with my arm.
Is Nike's goal to get people to swing their arms in circles? Unlikely. But by tracking a
measure that values such behavior, that is what it encourages. Searching under the
streetlight.
Measurement is great. Without it we don't know where we are, how we're doing, or
how to improve. But we need to be careful about how we use it. Because without
realizing it, measurement determines rewards and motivation. It determines what
people care about, what they work to achieve, and whether they cheat to get there.
Tracking student test scores helps measure achievement, but it also encourages teachers
to teach to the test.
So before you obsess over a particular metric, make sure it's the right metric to obsess
over. It might just be time to find a new streetlight.
Has Little Data and the quest for quantification gone too far? Is Big Data a boon or a bust?
7. Future of Data Analytics and Beyond... by Deeksha Joshi
1. Analytics as an App
This is one of my favorites. As analytics goes prime time, there is also a great
deal of pressure to make it available in ready-to-use form. This will
eventually give rise to analytics delivered as an app. Think of an app that crunches your
garbled data and finds the hidden relationships, or an app that takes your vitals,
compares them with world data (benchmarking), and suggests implications and
recommendations. Appification of analytics is not yet in its prime, but it is a future that
is unavoidable and bound to happen for the betterment of mankind.
from raw bits of dumb big data to crunched smart data. It is not far off when the
pressure of deploying such use cases will push analytics to become an embedded
phenomenon living in the last mile of the analog world.
If anyone is following the big data evolution, it's pretty much following the trajectory of
the IT revolution of the late '90s and early 2000s. Analytics will grow the way IT did:
its use will become modularized, and some parts of analytics will evolve into overhead
that can be outsourced.
analytics. This will reduce the cost of experimentation and increase the quality of
analytics outcomes, making it almost impossible for businesses not to adopt this
strategy in the future.
7. Machine and Deep Learning Find Their Way into Prime-Time Analytics
Currently machine learning and deep learning are tools for the learned and the expert, but
as we wander into the areas of non-hypothesis-based learning, machine and deep
learning tools and capabilities will hit prime time, helping everyday users make use of
these capabilities to find predictive insights and decipher patterns.
tools is unavoidable. Automation of BI and analytics capabilities will be forced by the
talent gap and will help reduce its risks.
There are some risks of increasing automation and analytics that merit a mention here,
notably the increasing security-related impacts that have the potential to disrupt some
of the progress made. On an upbeat note, as the world unleashes its latest analytics tools
and capabilities, the world will be nothing but a better, more data-driven place
for all of us to live in and decipher, one bit at a time.
in row format and pulled them into memory for the CPU to work on. There were various
ideas about how to do that, but then E.F. Codd ("Ted" to Michael Stonebraker) came up
with relational calculus and showed that you could abstract the query/analyze/update
cycle. Michael Stonebraker wrote Ingres, and later Postgres, to implement Ted's ideas. At
first it was doubted that these ideas would perform well, especially against the hand-crafted
data access methods like ISAM and B-trees that were used at the time. But that was a
bet against a compiler, and betting against the compiler is a losing bet. Oh, and by the
way, Michael thinks that the NoSQL people are also betting against the compiler, and
will lose for the same reasons.
But then data got bigger and CPUs got faster, and now the CPU is starved for data, so
the row-at-a-time model doesn't make sense. So he moved on to columnar databases,
writing C-Store, which became Vertica and is owned by HP. He didn't invent the
columnar database; that honor goes to MonetDB, formerly Monet, a research DB that
is still quite good and which I recommend if you are on a budget or need to do research
in databases. Now you can cram a whole column of numbers into memory, and the
CPU can add them up lickety-split and give you an answer without having to pull in all
the other bits of information in the rows. So with all this goodness, where is the problem?
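The row-versus-column trade-off is easy to sketch. This toy Python snippet (purely illustrative; real row stores and column stores like Vertica are far more sophisticated) shows the layout difference when all you want is the sum of one field:

```python
import numpy as np

# Toy "table" of 1,000 rows with three fields each.
# Row layout: to sum one field the CPU must walk every row,
# dragging the unrelated fields through memory with it.
rows = [(i, i * 2.0, "padding") for i in range(1000)]
row_sum = sum(r[1] for r in rows)

# Columnar layout: the same field stored contiguously, so the
# whole column streams through the CPU and sums lickety-split.
col = np.array([r[1] for r in rows])
col_sum = col.sum()

assert row_sum == col_sum == 999000.0
```

Both layouts give the same answer, of course; the difference is how much irrelevant data the CPU has to touch along the way.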
There are several problems, or maybe it's one problem with several faces. First there is a
language problem, or at least a human-understanding problem, that needs to be
addressed in SQL. People like tables; in particular they like spreadsheets, with all sorts
of complicated equations between various parts of the spreadsheet. SQL addresses
"simple" relationships within a table, but many people in general, and data scientists in
particular, want complicated relationships between elements, like a difference-of-Gaussians
analysis conducted between different elements of a pixel-table "spreadsheet." As
researchers we might be interested in feature extraction on a picture, where the
intersections of the rows and columns are pixels. We might then take the entire frame
(table) and make it fuzzier by applying a Gaussian function between neighboring cells,
and then do it again to create a third frame, and so on, stacking one behind the other.
Then we want to analyze how these stacked frames are related. The process is called a
Difference of Gaussians. What is the SQL query for that?
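For readers who want to see the operation SQL struggles with, here is a minimal sketch of a Difference of Gaussians in plain NumPy (a hand-rolled separable blur, purely illustrative, not any particular product's implementation):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian weights, normalized to sum to 1."""
    radius = 3 * int(np.ceil(sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(frame, sigma):
    """Separable Gaussian blur: convolve every row, then every column."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, frame)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

# A 32x32 "pixel table" with one bright pixel in the middle.
frame = np.zeros((32, 32))
frame[16, 16] = 1.0

# Difference of Gaussians: blur at two scales and subtract. Every
# output cell depends on a whole neighborhood of input cells --
# exactly the cross-cell relationship SQL has no vocabulary for.
dog = blur(frame, sigma=1.0) - blur(frame, sigma=2.0)
```

Ten lines of array code; try expressing the same neighborhood arithmetic as joins and aggregates over a `(row, col, value)` table and the language failure becomes obvious.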
So data scientists use a matrix called a data-frame to work their magic. A data-frame is a
lot like a spreadsheet, except that it can be pushed to much higher dimensionality, with
relationships defined between frames ("spreadsheets," in simple terms). While a
spreadsheet is a two-dimensional object, a data-frame is an n-dimensional object. In it
you can specify, or ascertain, the relationships between spreadsheets (a three-dimensional
analysis), between workbooks (a four-dimensional analysis), within an OLAP "cube,"
and so on. Before you go accusing me of creating a corner case of no use to business,
using a Difference of Gaussians analysis is a great way to identify feature vectors that
let us do object recognition in photos, handwriting recognition on checks
to detect fraud and forgery, or optical character recognition to convert paper
records. But SQL is not designed to handle these more sophisticated relationships.
In other words, there is a failure of language. Here is Michael Stonebraker in his own
words:
Historically analysis tools have focused on business users, and have provided easy-to-
use interfaces for submitting SQL aggregates to data warehouses. Such business
intelligence (BI) tools are not useful to scientists, who universally want much more
complex analyses, whether it be outlier detection, curve fitting, analysis of variance
[emphasis added] , predictive models or network analysis. Such complex analytics is
defined on arrays in linear algebra, and requires a new generation of client-side tools
and server side tools in DBMSs.
We could, and of course we do, write specialized queries like map-reduce jobs and
recursive for-loops to get things done, but wouldn't it be much nicer to write the whole
analysis as a single high-level expression?
In other words, wouldn't it be nice if the language took care of all that nasty writing of
for-loops, took what we write in a high-level language, and sorted it all out
underneath? This is what R and Pandas do. They allow you to create a vector function
like DifferenceOfGaussian and apply it to a data-frame. In a Pandas data-frame you can
define your own function f(x) that is applied in a vector fashion, like so:
import pandas
def f(x): return x.iloc[0] + x.iloc[1]   # your own row-wise function
df = pandas.read_csv("myspreadsheet.csv")
result = df.apply(f, axis=1)             # applied in a vector fashion across the frame
Of course f(x) can be much more complicated than in the example above. This
application of a function is possible because a data-frame is a matrix; it allows
more intricate patterns among data-frame elements and therefore unlocks the business
opportunities residing in our data repositories. And as a bonus, we don't have to
keep going to the SQL standards meeting to argue that our particular function needs to
be included in the language, a process which slows the pace of innovation.
Michael Stonebraker's SciDB supports three languages: AQL/AFL (array SQL), R, and
Python. SciDB is open source but closely controlled. If you can't get your hands
on it, there are other efforts that come close: projects that add data-frames to columnar
DBs, Hadoop, SQL, and NoSQL databases. Recently Apache announced the addition of
data-frames to Spark, which supports Scala, Java, and Python; and there is Blaze, which
supports data-frames and is somewhat compatible with Python's Pandas. Does anyone
see a pattern here? Python is the common thread. Hmm....
The next big obstacle is the sheer volume of data, and SciDB aims to address it in a
similar fashion to Spark and Blaze: by chopping a data-frame into chunks for distributed
processing, allowing manipulations bigger than any one computer's main
memory. Here the products, for now, differ. Not for long, I believe, because
any good idea will be put into the compiler and query optimizer, and soon all these
products will converge. For instance, you can chunk a data-frame into separate pieces for
cases that are fairly independent, or chunk them so that they overlap for others.
When we are smoothing data between pixels we want the overlapping type of
chunking, but when we just want the sum we want non-overlapping chunks. There are
already hybrid chunkers that optimize by type (i.e., overlapping or not), much like
current SQL compilers can decide whether a B-tree or ISAM storage index will result in
faster execution times. In this future world we data scientists will be free from writing
long, torturous map-reduce functions to keep memory requirements within bounds;
we will leave that to the compiler. Whew, won't that be nice.
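The overlapping-versus-non-overlapping distinction can be sketched in a few lines of Python (a toy chunker, not SciDB's or Spark's actual partitioning logic):

```python
import numpy as np

def chunks(col, size, overlap=0):
    """Split a column into chunks of `size` elements; overlap > 0
    duplicates edge elements so neighborhood operations (like
    smoothing between pixels) can run independently per chunk."""
    step = size - overlap
    return [col[i:i + size] for i in range(0, len(col) - overlap, step)]

col = np.arange(10)

# Non-overlapping chunks are all a plain sum needs ...
parts = chunks(col, 5)
assert sum(p.sum() for p in parts) == col.sum()

# ... while smoothing between neighbors wants overlapping chunks,
# each carrying a two-element "halo" from its neighbor.
halo = chunks(col, 6, overlap=2)
assert [h.tolist() for h in halo] == [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9]]
```

A hybrid chunker, as described above, would simply pick `overlap` per operation, the same way a query optimizer picks an index.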
Well I hope you keep an eye on this space because it is going to get interesting.
9. A Path Forward for Big Data by Jules Polonetsky
Today, the Future of Privacy Forum is
releasing two papers that we hope
will help frame the big data
conversation moving forward and
promote better understanding of how
big data can shape our lives. These
papers provide a practical guide for
how benefits can be assessed in the
future, but they also show how data
is already being used in the
present. FPF Co-Chairman
Christopher Wolf will discuss key
points from these papers at the
Federal Trade Commission public
workshop entitled "Big Data: A Tool for Inclusion or Exclusion?" in Washington on
Monday, September 15.
We are also releasing a White Paper which is based on comments that will be presented
at the FTC Workshop by Peter Swire, Nancy J. & Lawrence P. Huang Professor of Law
and Ethics, Georgia Institute of Technology. Swire, also Senior Fellow at FPF, draws
lessons from fair lending law that are relevant for online marketing related to protected
classes.
The world of big data is messy and challenging. The very term "big data" means
different things in different contexts. Any successful approach to the challenge of
big data must recognize that data can be used in a variety of different ways. Some of
these uses are clearly beneficial, some are clearly problematic, and some are uses
that some believe beneficial and others believe harmful. Some uses have no real
impact on individuals at all. We hope these documents can offer new ways to look at
big data in order to ensure that it is only being used for good.
Big Data: A Benefit and Risk Analysis
Privacy professionals have become experts at evaluating risk, but moving forward with
big data will require rigorous analysis of project benefits to go along with traditional
privacy risk assessments. We believe companies or researchers need tools that can help
evaluate the case for the benefits of significant new data uses. Big Data: A Benefit and
Risk Analysis is intended to help companies assess the raw value of new uses of big
data. Particularly as data projects involve the use of health information or location data,
more detailed benefit analyses that clearly identify the beneficiaries of a data project, its
size and scope, and that take into account the probability of success and evolving
community standards are needed. We hope this guide will be a helpful tool to ensure
that projects go through a process of careful consideration.
Identifying both benefits and risks is a concept grounded in existing law. For example,
the Federal Trade Commission weighs the benefits to consumers when evaluating
whether business practices are unfair or not. Similarly, the European Article 29 Data
Protection Working Party has applied a balancing test to evaluate legitimacy of data
processing under the European Data Protection Directive. Big data promises to be a
challenging balancing act.
Even as big data uses are examined for evidence of facilitating unfair and unlawful
discrimination, data can help to fight discrimination. It is already being used in myriad
ways to protect and to empower vulnerable groups in society. In partnership with the
Anti-Defamation League, FPF prepared a report that looked at how businesses,
governments, and civil society organizations are leveraging data to provide access to
job markets, to uncover discriminatory practices, and to develop new tools to improve
education and provide public assistance. Big Data: A Tool for Fighting Discrimination
and Empowering Groups explains that although big data can introduce hidden biases
into information, it can also help dispel existing biases that impair access to good jobs,
good education, and opportunity.
Lessons from Fair Lending Law for Fair Marketing and Big Data
Where discrimination presents a real threat, big data need not necessarily lead us to a
new frontier. Existing laws, including the Equal Credit Opportunity Act and other fair
lending laws, provide a number of protections that are relevant when big data is used
for online marketing related to lending, housing, and employment. In comments to be
presented at the FTC public workshop, Professor Peter Swire will discuss his work in
progress entitled Lessons from Fair Lending Law for Fair Marketing and Big Data.
Swire explains that fair lending laws already provide guidance as to how to approach
discrimination that allegedly has an illegitimate, disparate impact on protected classes.
Data actually plays an important role in being able to assess whether a disparate impact
exists! Once a disparate impact is shown, the burden shifts to creditors to show their
actions have a legitimate business need and that no less reasonable alternative exists.
Fair lending enforcement has encouraged the development of rigorous compliance
mechanisms, self-testing procedures, and a range of proactive measures by creditors.
***
There is no question that big data will require hard choices, but there are plenty of
avenues for obtaining the benefits of big data while avoiding or minimizing any
risks. We hope the following documents can help shift the conversation to a more
nuanced and balanced analysis of the challenges at hand.
10. Democratization of Data Analytics: Why, Who, and What by Kirk Borne
Enterprise analytics is most successful when it doesn't emulate enterprise IT, which
usually maintains centralized control and governance of IT assets. Data assets are now
becoming the fundamental fuel of business, for competitive intelligence, data-driven
decisions, innovation, and improved outcomes. The data assets and analytics strategy
should be governed by the C-suite (Chief Data Officer and/or Chief Analytics Officer),
but the analytics activities must spread and be shared across all business groups:
production, services, marketing, sales, finance, risk management, IT, HR, and asset
management. Decentralizing the data analytics activities thus relieves the enterprise IT
department of the burden of learning all of the new data-sciencey things (which would
detract from its primary IT activities).
Can a new tool placed in untrained hands lead to problems? Yes, that is possible,
perhaps even likely. At best, it leads to results that are not understandable,
interpretable, or meaningful; and at worst, it leads to totally incorrect results (for
example, overfitting a model can give the false impression of high accuracy when in fact
it is precisely the opposite; or not knowing that certain analytics methods require
particular data types, or that some data transformations can lead to useless results and
wasted effort). An employee without a formal analytics education but with good data
literacy, numeracy, curiosity, and problem-solving skills will probably be okay, but
only after some mentoring, tutorials, and training. Overall, our posture should be one
of cautious skepticism -- "out-of-the-box" analytics solutions will probably work in most
situations where the end-user is gaining on-the-job experience and training in analytics
best practices.
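The overfitting trap mentioned above is easy to demonstrate. This small NumPy sketch (purely illustrative) fits a degree-9 polynomial to ten noisy points; the training error looks wonderful, while the error on held-out points tells the real story:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training points from a known curve, and a clean test set.
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, 10)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test)

# A degree-9 polynomial has enough knobs to thread every training
# point, so its training error looks spectacular ...
coeffs = np.polyfit(x_train, y_train, 9)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

# ... while the error on held-out points exposes the overfit.
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
assert train_err < test_err
```

This is exactly the "false impression of high accuracy" an untrained user can stumble into when an out-of-the-box tool reports only in-sample fit.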
It is important to remember that analytics (data science) functions are not IT functions.
Enterprise analytics tools should not impose significant new burdens on traditional IT
departments, other than the usual software installation and licensing activities.
Analytics is a front-office top-of-the-organization client-facing activity in most
organizations. We don't ask the IT department to handle Human Resources functions,
even though HR uses IT resources. We don't ask the IT department to handle company
finances and investments, even though the CFO and finance folks use IT resources.
Similarly, the IT department should not be involved in analytics functions, other than
enabling and maintaining the systems, networks, and environments that are being used.
(Note that this is not meant to say that individual IT staff members should not become
part of the analytics revolution and participate in the data activities in the organization,
but that function is not part of the IT department's charter.)
What does democratization of data analytics mean for B2B vendors? For software
vendors (or any vendor), it is all about the value proposition -- what do they have to
offer that some other vendor doesn't have? The old marketing joke says: "Faster, Better,
Cheaper -- choose any two!" Vendors can sell you the fastest product, or the best
product, or the cheapest product, or (preferably) the "Goldilocks" product ==> that's the
one that is "just right" for you: the one with the right value, the right ROI, the right
number of features,... So, the job of vendors (specifically marketing and sales people)
becomes harder, since their value proposition may change from client to client, even
though their product is not changing. Some vendors will emphasize open source,
others will focus on APIs, others will tell you about "Analytics-as-a-Service", or tell you
about "Data Scientist in a Box". Some of those pitches will fall on deaf ears, and some
will resonate strongly, depending on the enterprise that they are proposing to. "Know
thy customer" will become even more critical, which (by the way) is a problem that can
itself be addressed through data analytics within the vendor's own sales, marketing,
business intelligence, and customer relationship management systems!
Accompanying the data revolution and the digital revolution is the API revolution,
including microservices and containerization, which represent an architectural style in
which large complex applications are broken down (modularized) into a collection of
independent, loosely coupled services. Those services are then bundled and shipped to
the compute engine (e.g., a cloud analytics platform) in containers, such as the hot
new Docker microservices framework. This trend toward more "as-a-service" and API
offerings will only get stronger, especially with the advent of the Internet of Things
(IOT) -- everyone will want to get into that action. The IOT will achieve greatest value,
effectiveness, and efficiency when some of the data analytics functions (e.g., anomaly
alerting, novelty discovery, trend detection) are moved to the edge of the network and
embedded into the sensor. Similarly, the new movement toward greater data
productization doesn't end at the production line, but when critical analytics and
discovery functions are embedded within the products and processes themselves.
applications away from advanced "specialist-in-the-loop" tools and moving toward
enterprise easy-to-use "anyperson-in-the-loop" tools. The business analytics value
proposition in the big data era is thus moving toward its manifest destiny: enterprises
will distribute, empower, and automate high-level analytics capabilities across the
whole organization to persons, processes, and products.
SECTION D: CASE STUDIES & USE CASES
One of the major problems in data science is understanding where to apply it. Many
leaders struggle to see the possibilities and the areas where data analytics can
provide the most value. The following case studies are meant to shine a light on some of
the applications illuminated by the art of data analytics. Many influencers
and leaders have shared interesting use cases covering where they are using data analytics
and how they are getting value from it. The contributions are designed to showcase
their interesting work.
The best way to approach this section is to read each article in isolation. Before reading
each article, we suggest that you first read the author's biography. Knowing the
author's background can give you the right context for understanding their perspective.
1. Artificial Intelligence Will Make Smart Lawyers Smarter (and dumb ones
redundant) by Maximiliano Marzetti
Of all professions, the legal one is perhaps the most averse to technological change. Word
processors replacing typewriters was the last big innovation. Artificial intelligence (AI)
in legal practice is almost unheard of.
Richard Susskind specializes in forecasting the future of the legal profession.
He is certainly not a prophet, but a lawyer who dared to think outside the box. He has
authored a series of books on the subject: The Future of Law [1], Transforming the Law [2],
The End of Lawyers? [3], Tomorrow's Lawyers [4], and others.
Mr. Susskind warns that the provision of legal services will change radically in the coming
decades. He identifies three drivers of change:
a) A more-for-less attitude from clients (especially after the latest global financial crisis);
b) The liberalisation of the legal profession (lawyers will soon be competing against
non-lawyers for legal work); and
c) Technology (e.g., IT, AI, cloud computing, big data).
In the meantime, in the early 21st century, lawyers still provide legal services as if they
were in the 20th, if not the 19th century.
AI has a similar potential and is to be feared only by those unwilling to embrace progress.
AI will free humanity from repetitive, tedious tasks; it will allow us to devote the little
time we are granted on this planet to what makes us truly human: to create, to
innovate, to help each other.
Radical innovations bring about tremendous social change. New technologies create
winners and losers. Those threatened by a new technology will not only resent it,
but will try to block its introduction or dissemination.
Some will try to slow the pace of change by erecting legal barriers. If history tells us
anything, it is that no law has ever managed to deter the tsunami of creative
destruction. At best, legal barriers can slow change for a while. The
Ottoman Empire delayed the introduction of the printing press for more than two
centuries, and then tried to control it.
In the end innovation triumphs, and the societies that resist change the longest
suffer the most. The Ottoman Empire collapsed after the end of WWI.
AI will not be the doom of lawyers. Good lawyers will remain; they will adapt, evolve,
and provide services with more added value. Mediocre and uncreative ones may soon
find themselves jobless.
Richard Susskind suggests new legal professions will be needed. He explicitly mentions
the legal risk analyst, the legal knowledge engineer, the contract architect, the legal
project manager, and legal hybrids (professionals combining legal skills with economic,
scientific, or statistical knowledge).
Future lawyers will use technology to break free from repetitive, time-consuming, low-
quality work. For instance, lawyers devote most of their time manually searching for
information or drafting similar documents. These tasks will soon be taken over by AI.
AI can become a legal realist's dream. Coupled with big data it can help predict the
outcome of future judicial decisions by analyzing previous decisions of a given judge or
Court.
AI can also turn scattered data into meaningful and useful information. Statistics,
predictive analytics and risk management will replace educated guesses. Lawyers will
still be there to give advice, put big data in perspective and design efficient legal
strategies.
AI will not only empower lawyers but also their clients. Selecting a lawyer to represent
one's interests will cease to be an act of faith. Economist Philip Nelson called professional
services credence goods, because the client cannot assess the quality of the service either
before or after it has been rendered. [6] AI and data analytics already allow a
prospective client to choose a trial lawyer according to her win rate. Selecting a lawyer
should be based on factual performance, not on biased perception.
AI will certainly take over routine tasks ordinarily carried out by paralegals, cheaper
off-shore legal counsel, or expensive, uncreative in-shore ones. Complex legal research,
litigation data intelligence, counsel selection, discovery, and even basic contract drafting
will be taken over by AI.
The future lawyer will have to develop a set of skills that would make her irreplaceable
by AI: leadership, creativity, empathy, relationship building, conflict resolution, etc. The
bottom line: opportunities for entrepreneurial and creative lawyers are endless.
4. Conclusion
To sum up, both supply and demand forces will push to incorporate AI into the
lawyer's toolkit. AI will help a lawyer deliver better, more precise and tailored legal
solutions to her clients. Those lawyers that reject AI may soon be out of the market,
replaced by the very same technology they rejected.
Antiquated lawyers gone, creative lawyers empowered, new legal careers better suited
to cater to clients' needs, more information and competitiveness in the legal market:
the future of the legal profession with AI looks promising to me!
Notes
1. Susskind, R.E., The future of law: facing the challenges of information technology. 1996,
Oxford University Press.
2. Susskind, R.E., Transforming the law: essays on technology, justice, and the legal
marketplace. 2001, Oxford University Press.
3. Susskind, R.E., The end of lawyers? Rethinking the nature of legal services. 2008,
Oxford University Press.
2. Cyber security Solutions Based on Big Data by Steven King
from general rules, e.g., if A = B and B = C, then A = C, regardless of what A or B contain.
Deductive reasoning tracks from a general rule to a specific conclusion. If the original
assertions are true, then the conclusion must be true. A fundamental weakness of
deductive reasoning is that it is often tautological (e.g., "malware contains malicious code"
is always true) and it is unaffected by contextual inputs, e.g., to earn a master's degree,
a student must have 32 credits; Tim has 40 credits, so Tim will earn a master's degree,
except when he decides not to.
In security analytics, A only equals B most of the time and sometimes it can equal D, so
A cannot always equal C. Therefore, using deductive reasoning as a basis for detection
analytics is a flawed way to try to predict the future. You are theoretically guaranteed
to be wrong at least once.
In general, common signature-based systems such as IDS/IPS and endpoint security are
deductive in nature.
Inductive Reasoning
Where analytics engines are based on inductive reasoning, the resulting analytics
resemble probability theory. Even if all of the premises in a statement are true,
inductive reasoning allows the conclusion to be false. Here's an example: "Harold is
a grandfather. Harold is bald. Therefore, all grandfathers are bald." The conclusion does
not follow logically from the statements.
This is a better approach than deductive reasoning for projecting the future, but it is
obviously imperfect and can produce even more widely varying results.
Inductive reasoning heuristics are frequently used by contemporary IDS/IPS systems to
generalize the probability of malicious behaviors based on limited input (e.g., known
signatures). This also works a high percentage of the time. But not always.
So, a deductive argument claims that if its premises are true, its conclusion must be true
- absolutely. An inductive argument claims that if its premises are true, its conclusion is
probably true - probably.
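The contrast can be made concrete in a few lines of code. The sketch below is purely illustrative (the signature set, feature names and observation history are all invented): the deductive check returns an absolute verdict from a fixed rule, while the inductive check returns a probability learned from past observations, which new evidence could overturn.

```python
# Hypothetical sketch contrasting deductive and inductive detection.
# The signature set, feature names, and history below are invented.

KNOWN_BAD_SIGNATURES = {"e99a18c4", "ab56b4d9"}  # fixed deductive rule set

def deductive_verdict(file_hash):
    """If the hash matches a signature, the file IS malicious -- absolutely."""
    return file_hash in KNOWN_BAD_SIGNATURES

def inductive_score(feature, history):
    """Estimate P(malicious | feature) from past labeled observations.
    The conclusion is only *probably* true; new evidence can overturn it."""
    labels = [label for feat, label in history if feat == feature]
    if not labels:
        return 0.0  # no prior experience with this feature
    return sum(labels) / len(labels)

history = [("packed_binary", True), ("packed_binary", True),
           ("packed_binary", False), ("plain_text", False)]

print(deductive_verdict("e99a18c4"))              # True: the rule allows no doubt
print(inductive_score("packed_binary", history))  # ~0.67: probably malicious
```

The deductive function can never flag anything outside its rule set, which is exactly the brittleness described above.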
Bayesian Probability
In most Bayesian-based security analytics, when a result is 3 standard deviations from
normal, the system declares it an anomaly. The goal of Bayesian reasoning is to
identify a normal pattern of behavior by observing subtle fluctuations in
activity within the enterprise infrastructure over a period of time, establishing a corpus
of prior events. The result is a baseline that is used as a benchmark
against which all subsequent network activity and/or behaviors will be measured.
Unfortunately, this baselining is flawed and can lead to extraordinary outcomes, none of
which will result in properly identified threats. There are three significant problems
with this approach:
1. If the network and/or the systems being baselined are already infected before the
baseline is created, then the baseline establishes a false premise.
2. If an insider is already active on the network, then that insider's actions will appear
nominal and become part of the normal baseline.
3. Today's network infrastructure and user behavior are increasingly dynamic, variable and
diverse, involving many different devices and protocols, access methods and entry
points, essentially making a baseline assessment impossible without a network
lockdown.
Analytics engines that use baselining as their premise for Bayesian reasoning are prone
to extreme volumes of false positives, are cumbersome and difficult to tune and
administer, require lots of human attention and frequently miss malicious invasions. In
short, they don't work very well.
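For illustration, the three-sigma baselining mechanism described above can be reduced to a few lines (the traffic figures are invented; a real system would baseline many signals over weeks):

```python
import statistics

def build_baseline(observations):
    """Establish the 'normal' corpus of prior events: mean and std deviation."""
    return statistics.mean(observations), statistics.stdev(observations)

def is_anomaly(value, mean, stdev, k=3):
    """Declare an anomaly when a value is more than k sigma from the baseline."""
    return abs(value - mean) > k * stdev

# Invented daily traffic counts. Note the flaw discussed above: if an
# attacker is already active during baselining, their activity silently
# becomes part of 'normal'.
prior_traffic = [100, 102, 98, 101, 99, 100, 103, 97]
mean, stdev = build_baseline(prior_traffic)

print(is_anomaly(100, mean, stdev))  # False: inside the baseline
print(is_anomaly(250, mean, stdev))  # True: more than 3 sigma from normal
```

The sketch makes the first two problems visible: whatever is in `prior_traffic` defines "normal", poisoned or not.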
Abductive Reasoning
While inductive reasoning requires that the evidence that might shed light on the
subject be fairly complete, whether positive or negative, abductive reasoning is
characterized by an incomplete set of observations, either in the evidence, or in the
explanation, or both, yet leading to the likeliest possible conclusion.
A patient may be unconscious or fail to report every symptom, for example, resulting in
incomplete evidence, or a doctor may arrive at a diagnosis that fails to explain several of
the symptoms. Still, he must reach the best diagnosis he can. Probabilistic abductive
reasoning is a form of abductive validation, and is used extensively and very
successfully in areas where conclusions about possible hypotheses need to be derived,
such as for making diagnoses from medical tests, working through the judicial process
or predicting the presence of malware.
Given that Bayesian analytics begins with an impossible premise (at least 30 days of
baselining), let's throw this approach out of our analysis so we can concentrate on what
might actually work.
So, a valid deductive argument is one that logically guarantees the truth of its
conclusion if the premises that are presented are true. This is the form of logic that is
traditionally taught in mathematics courses and manifested in logic proofs:

A is B.
B is C.
A is, deductively, C.

This form of logic is self-contained, and any argument that uses deduction is
one that cannot offer any new findings in the conclusions; the findings are presented in
the premises that hold the argument to begin with. That is, A, B, and C all exist (only) in
the premises that were presented. In security, this hardly ever happens anymore, so
deductive reasoning is not very useful.
An inductive argument is one that offers sound evidence that something might be true,
based on structured experience. This is the form of logic traditionally associated with
scientific inquiry:
Subsequent experiences may prove this wrong, and thus an inductive argument is one
where the premises do not guarantee the truth of their conclusions. Like deduction,
induction cannot offer any "new findings" beyond those contained within the logic of the
argument.

I've done something like A before, but the circumstances weren't exactly the same.
I've seen something like B before, but the circumstances weren't exactly the same.

Unlike deduction or induction, abductive logic allows for the creation of new
knowledge and insight: C is introduced as a best guess for why B is occurring, yet C is
not part of the original set of premises. In the context of data security, we are actually
looking for C and should be excited when we find it.
Since we are now dealing with extraordinarily smart cyber-criminals who are going to
great lengths to disguise their attacks, we need a much looser, yet more intelligent,
discovery-enabled form of analytics to uncover the patterns and force us to look for
corroborating evidence that our hypothesis is correct.
Soaking Wet
An example from everyday life that mimics the behaviors of today's malware might be
a man walking into a restaurant soaking wet. Based on that single observation, we may
reasonably conclude that it is raining outside. But in order to be certain of that, we need
to check outside and see if in fact it is raining. And we discover that indeed it is.
However, that fact alone may not have resulted in the soaking wet man. He may also
have fallen into a lake or into a gutter, so we look to see if there is a lake or a gutter
nearby. And we discover that neither of those exists, but there is a fountain in front of the
restaurant. So, we have effectively reduced multiple Cs down to one probable C.
Now we need to examine the soaking wet man and the fountain more closely, etc.
Abduction acts as inference or intuition, and is directly aided and assisted by contextual
data. The abduction itself can be driven by any contextual patterns that act as an
argument from best explanation. So, we welcome a false conclusion even when based
on true premises, because it is the only way we can sort through all of the
masquerading paths. Better to be assaulted with a handful of false negatives than
millions of false positives.
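The winnowing in this story can be sketched as filtering candidate explanations against contextual evidence. The hypotheses, priors and evidence flags below are invented to mirror the anecdote:

```python
# Hypothetical sketch of abductive winnowing. Each hypothesis is a possible
# explanation ('C') for the observation; evidence flags are contextual checks.
# All values here are invented to mirror the soaking-wet anecdote.

def abduce(hypotheses, evidence):
    """Keep explanations the evidence does not rule out, most plausible first."""
    survivors = [h for h in hypotheses
                 if all(evidence.get(req, False) for req in h["requires"])]
    return sorted(survivors, key=lambda h: h["prior"], reverse=True)

hypotheses = [
    {"cause": "rain",     "requires": ["raining_outside"], "prior": 0.6},
    {"cause": "lake",     "requires": ["lake_nearby"],     "prior": 0.2},
    {"cause": "gutter",   "requires": ["gutter_nearby"],   "prior": 0.1},
    {"cause": "fountain", "requires": ["fountain_nearby"], "prior": 0.3},
]

# Contextual checks: no rain, no lake, no gutter -- but there is a fountain.
evidence = {"raining_outside": False, "lake_nearby": False,
            "gutter_nearby": False, "fountain_nearby": True}

best = abduce(hypotheses, evidence)
print([h["cause"] for h in best])  # ['fountain'] -- multiple Cs reduced to one
```

Note that the best surviving explanation is still only a hypothesis to examine more closely, exactly as in the story.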
There is only one engine on the security market right now that uses abductive reasoning
as the basis for its predictive analytics. By combining its reasoning methods with
contextual data from both inside the threat landscape and from external sources, it has
proven to do a much better job of detecting today's signature-free advanced
persistent threats than any of the other approaches have so far.
Data is growing faster than ever before and by the year 2020,
about 1.7 megabytes of new information will be created every
second for every human being on the planet.
3. Personalization of Big Data Analytics: Personal Genome Sequencing by Peter
B. Nichol
Peter B. Nichol
CIO | Healthcare Business & Technology Executive
LinkedIn Contact

Peter Nichol is a healthcare business and technology executive recognized for digital
innovation by CIO100, Computerworld, MIT Sloan and PMI. Peter has an impressive
background leading information technology organizations, working with business
leaders defining strategic visions, building innovative digital capabilities through the
modernization of existing products and services, and leading digital transformational
change initiatives across the enterprise. Peter's specialties include digitizing complex
platforms, leading large-scale program management, driving application of emerging
technologies to enhance business systems, improving and innovating new consumer
experiences, driving better patient outcomes, leading implementation of software
products, and setting up, organizing and managing large teams to provide outstanding
results. Peter is a healthcare expert at PA Consulting Group. Prior to joining PA, he was
a CIO and has held leadership positions in various healthcare organizations.

"The greatest single generator of wealth over the last 40 years, in the digital revolution,
has been a transition from ABC's to 1's and 0's," says author Juan Enríquez. What will
be the single biggest driver of change tomorrow?

Enríquez believes the single biggest driver of change, of growth, of industries in the
future is the transition to writing in life code. The field known as bioinformatics will be
the technological transformation for big data. Today, technology is merely playing on
the fringe of its real potential.

What if we could program a cell to make stuff we want? To store data we want stored?

Big data analytics, when applied to personal genome sequencing, the combination of
digital code and life code, represents the greatest social driver of change since the
invention of the computer: the programmable cell (Enriquez, 2007).
Behavior analytics, biometrics, distributive processing, Hadoop, Hana, HBase, and Hive
have all risen in frequency in big data discussions. Big data is a term describing the
large volume of data, both structured and unstructured. It also implies that
conventional databases can't handle the processing and analytics needed to ensure the
data is retrievable and usable information recorded on a persistent medium. In short,
this data is recorded, stored, processed, and used to make better decisions. Due to the
magnitude of the data, standard processing is often ineffective.
Typically, big data is used for decisions around business operations, supply chain
management, vendor metrics or executive dashboards. When the data involved is
enormous, big data can be applied for better business outcomes. Here are three
examples:
1. UPS uses sensor data and big-data analytics on delivery vehicles to monitor speed,
miles per gallon, number of stops and engine health. The sensors collect over 200 data
points daily for each vehicle in a fleet of 80,000.
2. Macy's adjusts prices in practically real time for its 73 million items on sale before
every sale. Using SAS, it reduced analytics costs by $500k.
of data. The goal is to improve flight and fuel efficiency and use predictive analytics to
fly safer planes leveraging predictive maintenance.
Each is helpful in its own right, but they are several degrees away from personally
affecting your health. We're simply not committed to understanding how to leverage
these processes, tools and techniques in our businesses. Do you know of big data
stories like King's Hawaiian or BC Hydro? Why not? The answer is that they are not
personalized. So let's make it personal and explore big data analytics applied to
personal genome sequencing. This is about how big data can impact your personal
health and your lifespan. Interested?
The Human Genome Project officially started in 1990, coordinated by the U.S.
Department of Energy (DOE) and the National Institutes of Health (NIH, an agency of
the United States Department of Health and Human Services), as a 13-year initiative.
The goal of the project was to determine the sequence of the chemical base pairs which
make up human DNA, and to identify and map all of the genes of the human genome
from both a physical and functional standpoint. The initial lofty goal wasn't fully
realized, but they came very close when the project closed in 2003. This project still
remains the world's largest collaborative biological project; at a cost of $3 billion, the
project sequenced 90% of the human genome (only euchromatic regions, specifically).
Shortly after this project completed, personal genome sequencing gained rapid interest
globally. Personal genomics is the branch of genomics concerned with the sequencing
and analysis of the genome of an individual. Direct-to-consumer genome sequencing
has been slow to gain adoption, despite a number of firms providing this service
globally: Australia (Lumigenix), Belgium (Gentle Labs), China (IDNA.com, Mygen23),
Finland (Geenitesti), Ireland (Geneplanet), and the UK (Genotek), among others.
genomes doesn't exist to provide deep analytics. Similarly, the large database of
diseases doesn't exist to map genomes against. Both databases, the genome and the
disease, are required in order to draw correlations to probable individual health
outcomes. There are about 30,000 diseases known to man, with about a third having
effective treatments. The Centers for Disease Control and Prevention cited that about
7,000 of those diseases are rare, with about 20 new rare diseases being identified every
month. Connecting diseases and genetic variations has proven to be incredibly elusive
and complicated despite the positive advancements made under the Human Genome
Project.
Everyone has access to the latest advancements in medicine. All diseases are screenable.
Why aren't these two statements true today? By applying big data and big data
analytics, these two statements could be true. Steve Jobs believed. "Steve Jobs, co-
founder of Apple Inc., was one of the first 20 people in the world to have his DNA
sequenced, for which he paid $100,000. He also had the DNA of his cancer sequenced,
in the hope it would provide information about more appropriate treatments for him
and for other people with the same cancer (yourgenome.org, 2015)." Similar to his
journey with Apple Computer, Jobs was ahead of his time with genomics. Let's venture
on.
Writing and Rewriting DNA
Author Juan Enríquez is a futurist and a visionary in the space of bioinformatics with
some intriguing ideas. The global pace of data generation is staggering and will
continue along its exponential growth curve to 1.8 zettabytes (a zettabyte is a trillion
gigabytes; that's a 1 with 21 zeros trailing behind it). Where will all this data and
information reside? In bigger and bigger computers? No. The answer lies in
bioinformatics: in programmable bacteria cells. Bacteria are already being designed to
clean toxic waste and emit light. These cells are programmable like a computer chip,
changing how the things we want are made and removing the boundaries of where
they are made: Exxon is attempting to program algae to generate gasoline, BP is
working to extract gas from coal, and Novartis is rapid-prototyping vaccines (Bonnet,
Subsoontorn, Endy, 2012). What's next? Creating a cell to generate energy or produce
plastics? It's a fascinating space to explore. Think about the ideal storage eco-system.
Juan Enríquez elaborates that an ounce of DNA can theoretically hold 300,000 terabytes
of data and survive intact for more than 1,000,000 years.
Anything you can store in a computer you can store in bacteria -- Juan Enríquez
This software makes its own hardware and operates on a nano scale. The future of big
data is a world where computers can be designed to float on a speck of dust yet be as
powerful as a laptop today, where life is so efficient it can copy itself (reproduce) and
make billions of copies. The global benefits of bioinformatics applied to healthcare
outcomes will be incredible. Understanding the human genome and personal genome
sequencing are the keys to unlocking this mystery (Enriquez, 2007).
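The "digital code to life code" idea is easy to see in miniature: each of DNA's four bases can carry two bits, so any byte stream can in principle be transcribed into a nucleotide sequence. Here is a toy sketch (the base-to-bits assignment is one arbitrary convention invented for this example; real DNA-storage schemes add error correction and avoid problematic base runs):

```python
# Toy sketch: two bits per nucleotide (A=00, C=01, G=10, T=11).
# The assignment is an arbitrary convention chosen for this example;
# real DNA-storage encodings add redundancy and error correction.

BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def encode(data: bytes) -> str:
    """Transcribe a byte stream into a nucleotide string (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Recover the original bytes from a nucleotide string."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"Hi")
print(strand)                   # CAGACGGC
print(decode(strand) == b"Hi")  # True: the round trip is lossless
```

At two bits per base, the density claim above becomes plausible: the information lives in the molecule itself, not in any surrounding hardware.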
Each individual has a unique genome, and mapping the human genome involves
sequencing multiple variations of each gene. Genes are stretches of DNA that code a
protein for a specific bodily function, and DNA contains all the genetic material that
determines traits such as hair color and eye color.
What does my DNA sequencing say about my future health? Can you predict life
expectancy based on my sequenced DNA? These are important questions; however,
the Human Genome Project won't provide this information. Before we can ascertain
this information, your personal DNA needs to be sequenced.
There are dozens of companies that provide personal genome sequencing in the US,
including: Sequencing.com (software applications to analyze the data based on their
patent-pending Real-Time Personalization technology), The Genographic Project
(National Geographic Society and IBM collecting DNA samples to map historical human
migration patterns, helping to create the direct-to-consumer (DTC) genetic testing
industry), The Personal Genome Project (PGP, a long-term, large-cohort study based at
Harvard Medical School which aims to sequence and publicize genomes and medical
records), and the list goes on: SNPedia, deCODEme.com, Navigenics, Pathway
Genomics, 23andMe and others. They all provide personal sequencing of your DNA.
Harvard's Personal Genome Project
The Personal Genome Project (PGP) is a long-term Harvard study which aims to
sequence and publicize the complete genomes and medical records of 100,000
volunteers, in order to enable research into personal genomics and personalized
medicine. After spending way too much time reading journals and research findings
published over the last five years, one gets quite curious. How could knowing
my personal genome sequence improve my health outcomes? Are my days numbered
due to a potential future disease? These and many more questions prompted a deeper
exploration into the Personal Genome Project consent form. While the 24-page form is
amazingly well written, it does proactively disclose several disturbing risks. Allow me
to share a few of the more interesting risks of volunteering to participate:
1. Non-anonymous: your name and personal information are identifiable and available to
the global public; read: no confidentiality
2. Cell lines created from your DNA may be genetically modified and/or mixed with
nonhuman cells in animals
3. Consent enables the ability to make synthetic DNA and plant it at a crime scene or
implicate you and/or a family member in a crime
5. Whether legal or not, it may affect the ability of you or a family member to obtain or
maintain employment, insurance or financial services
After reading the risks, it doesn't take long to grasp why adoption hasn't been prolific
over the last decade.
Is it worth it to have your personal genome sequenced when the volumes of data
required to provide deep analysis don't exist today? John D. Halamka, MD, MS, Chief
Information Officer of the Beth Israel Deaconess Medical Center and Chief Information
Officer and Dean for Technology at Harvard Medical School, weighs in on this
question. Dr. Halamka, in a November 2015 interview with athenahealth, said that,
based on his personal genome sequencing, he will die of prostate cancer. Dr. Halamka
was also one of the first 10 people to have a personal genome sequence completed,
through Harvard's Personal Genome Project. He mentions that the recommended
prostate testing frequency for men is every 4-5 years for the population. He argued that
for the general population that's fine, but for himself, because of his genome, he would
be wise to check yearly. This information wouldn't have been available without
personal genome sequencing.
Genome sequencing combines a sample (your personal genome sequence) with
reference material (a database of previously sequenced genomes) to produce
aggregated information that is specific to an individual's health.
He also provided an intriguing example: when his wife was diagnosed with breast
cancer, her personal genome was mapped. Her genome was compared to the 10,000
other genomes available at the time, and from this information they determined the
best course of treatment based on her genes, given favorable outcomes in the
population samples.
Housing population genomes and disease inventories will consume huge amounts of
data. Data available today is already changing patient outcomes. Population genome
data and global disease inventories will accelerate amazing advancements in the
identification and treatment of disease.
Bioinformatics is the future of big data. As it becomes easier to write and rewrite in life
code, every business on earth will be changed.
The places that read and write life code are going to become the centers of the global
economic system; the greatest single database that humans have ever built. -- Juan
Enríquez
Bioinformatics will refine big data, and society will eventually reach a tipping point
when personal health self-service hits the mainstream and patients become the 'CEO of
their personal health.' When will conventional storage be obsolete? How will
information security change when the coding is biological? As the population ages,
new open-source business models will develop, spurring community
development. Communities that are not just involved but committed! Passionate
communities that have blood in the game, because they are fighting for their lives or
those of loved family members.
Is it worth it to have your personal genome sequenced? Yes, it's your life -- it's worth it.
References
bigthink.com. (2010). Learning to Speak Life Code | Big Think. Retrieved November
23, 2015, from http://bigthink.com/videos/learning-to-speak-life-code
Bioengineers create rewritable digital data storage in DNA | KurzweilAI. (n.d.).
Retrieved November 23, 2015, from http://www.kurzweilai.net/bioengineers-create-
rewritable-digital-data-storage-in-dna
Bonnet, J., Subsoontorn, P., Endy, D., Rewritable digital data storage in live cells via
engineered control of recombination directionality, Proceedings of the National
Academy of Sciences, 2012 DOI: 10.1073/pnas.1202344109
Enriquez, J. (2007). Juan Enriquez: The life code that will reshape the future | TED Talk.
Retrieved November 23, 2015, from
https://www.ted.com/talks/juan_enriquez_on_genomics_and_our_future/transcript?lan
guage=en
Ross, A. (2015). Genome sequencing for just $1000 (online image). Retrieved November
22, 2015, from http://www.geeksnack.com/2015/10/05/genome-sequencing-for-just-1000/
Snyder, M., Du, J., & Gerstein, M. (2010). Personal genome sequencing: current
approaches and challenges. Retrieved November 22, 2015, from
http://genesdev.cshlp.org/content/24/5/423.full
What follows is an overview of what these critical elements entail and steps to
implementing a successful IoT solution that leverages them fully.
The IoT Data Journey: From Data Collection & Analytics to Visualization & Control
Data is fluid and tends to be misunderstood in its raw form. The real challenge of the
IoT is that you have too many faucets running at the same time with different kinds
of fluid. At the collection point, dealing with data complexity and variation is extremely
critical. Without addressing that complexity early, it's impossible to achieve the end
business result you're after.
Let's consider, for example, a typical commercial building and the data journey in that
environment. You are likely to come across different sub-systems from different
manufacturers, e.g., HVAC, elevators, security, power. The first step is to try to
normalize data from all these sub-systems through a common data model and then
focus on the data that is relevant to the problem you are trying to solve.
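That normalization step might look something like the following sketch, where readings from hypothetical HVAC and power sub-systems, each with its own field names and units, are mapped into one common record shape (the schema and field names are invented for illustration):

```python
# Hypothetical sketch: mapping vendor-specific readings into a common data
# model. The field names, units, and target schema are invented.

def normalize_hvac(raw):
    """An HVAC vendor reporting zone temperature in Fahrenheit as 'tempF'."""
    return {"subsystem": "hvac", "point": raw["zone"],
            "metric": "temperature_c", "value": (raw["tempF"] - 32) * 5 / 9}

def normalize_power(raw):
    """A power meter reporting load in kilowatts as 'kW'."""
    return {"subsystem": "power", "point": raw["meter_id"],
            "metric": "power_w", "value": raw["kW"] * 1000}

readings = [
    normalize_hvac({"zone": "lobby", "tempF": 68.0}),
    normalize_power({"meter_id": "main", "kW": 12.5}),
]

# Downstream analytics can now treat every record uniformly.
for r in readings:
    print(r["subsystem"], r["point"], r["metric"], r["value"])
```

Once every sub-system emits the same record shape, the analytics engine downstream only has to reason about one schema.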
In effective IoT platforms, after normalization, the data is fed into an analytics engine
that adds intelligence to how the data should be interpreted. The analytics engine is
built out of rules based on specific domain expertise and feeds into a dashboard that
visualizes the information necessary to take action. However, visualization in the
absence of action is not of much help. Therefore, remediation is an important piece of
the overall solution. Typically, in IoT use cases, alarms indicate that an action needs to
be taken. But somebody needs to press a button somewhere to make that action
happen. The best IoT platforms are designed to close that loop. They allow you not
only to take manual actions but also help to automatically (or semi-automatically)
remediate when an alarm is generated, in as close to real time as possible.
Although the value of analytics/visualization is huge in IoT, there are several barriers
that you need to understand and overcome while developing your solution.
Data acquisition is expensive
Domain expertise
To get the most from IoT, organizations must have team members with domain
expertise that are dedicated to solving problems and delivering on specific IoT goals.
Energy Officer is a relatively new title in many companies, but having such a person
ensures someone is focused on driving energy savings with your IoT solution.
ROI, while real, is slow to materialize. I've seen this again and again when working
with IoT customers. In buildings, some customers have only seen significant benefit
when their IoT solution has extended to multiple sites. ROI depends on your business,
and it is something you should be prepared to be patient about.
As IoT gains momentum, startups and established companies are entering the IoT
arena with new platform, analytics and visualization technologies. While having
more options for products and services can be good, it can also be confusing and can
make it very difficult to select the right technology needed to build a strong IoT
analytics and visualization solution.
Selecting and Developing a Robust IoT Analytics & Visualization Solution
Below are a few tips to consider as you design your IoT solution. There are probably
several other considerations, but for this post I will outline those that I have seen
implemented over the past few years:
This is a hard one and takes multiple iterations before getting it right. Try to identify the
data you need and ensure that there is accuracy in the data collected. Additionally, the
data needs to be reliable and high performing. Most of the time, data will need to be
collected from multiple systems that are already installed.
If you know your goals and have an idea of what data you need, selecting the right
foundational technology for data collection and management is very important. There
are some key tenets in an IoT platform that you should be looking for:
Open Technology - so you can normalize data from legacy proprietary and new edge
devices, build applications and integrate with third-party systems as and when you
need, without having to replace the platform or infrastructure. APIs play a critical role
here; look for published open APIs for your developers.
Stable Technology - if you have the choice, besides evaluating pros and cons of existing
vs. new platforms in your labs, evaluate established real IoT operational case studies.
See how long these systems have been running and how customers have benefited over
multiple years. IoT systems should be designed for prolonged and sustained benefits.
Robust Eco-System - you might want to conquer the world by building all the
applications you need yourself, but with Android and iOS we all know the power of an
application ecosystem. You want to be able to have a choice. Select a platform that has
a developer community around the technology.
Depending on your business, you might need real-time data for mission-critical
decisions or just historical data to run periodic reports. Traditional methods of
analytics are not suitable for harnessing the IoT's enormous power. Using real-time
analytics at the edge (device level) in conjunction with historical trend analysis is very
important. In a recent video, I talked about what makes the data explosion such a
great opportunity for IoT.
Actionable Visualization
Flexibility and integration with analytics are very important in an IoT data visualization
solution. The choices range from well-established/legacy enterprise-class Business
Intelligence (BI) visualization tools that are able to deal with complex data to new
cloud-based tools for complex and simple visualization of unstructured data. I like
visualization capabilities that are self-service so I don't have to wait forever for
someone to create a report. Also consider what your mobile users will need; simplicity
is a big driver there. Visualization is all about presenting data in a manner such that
appropriate action can be taken in time.
Once selected, installed and operational, you will need to continuously evaluate your
analytics & visualization solution and make changes as required.
Conclusion
Do what's best for you. There is no set formula and every business is different. Identify
the specific problem that you want to solve and build your solution around it.
5. The Private Eye of Open Data on a Random Walk: 8 different routes by
Debleena Roy
Debleena Roy
Director at Unitus Seed Fund. Trainer, Writer and Singer
LinkedIn Contact

Debleena has over 14 years of experience in Strategy, Analytics, Finance and Marketing
with leading firms such as GE, Prudential, JP Morgan and Fidelity. Prior to joining
Unitus Seed Fund, she led Strategy and Marketing at a leading analytics start-up,
BRIDGEi2i Analytics Solutions.

"I look for new ways of self-expression and learning. Sometimes through a song,
sometimes through a story, sometimes by teaching. In the words of the great Einstein,
striving 'not to be a success, but rather to be of value.' I research and analyze, probably
a bit too much. I dream, I learn, I teach, I learn again."

The Random Walk theory: The efficient markets hypothesis (EMH), popularly known
as the Random Walk Theory, is the proposition that current stock prices fully reflect
available information about the value of the firm, and there is no way to earn excess
profits (more than the market overall) by using this information.

The Private Eye of Open Data: The buzz around open data notwithstanding, open data
itself is not new. It has been used for decades to analyse information on privately held
companies. We used to call it secondary research in competitive intelligence parlance
during pre-open-data days.
One possible application: Now, if we apply the above two concepts to the information
asymmetry in the exploding world of start-ups and investors in India, how does it play
out? Searching for information on private companies was always a time-consuming and
complicated exercise, given the lack of structured, available data. Dependence on
company news and referral networks used to provide hazy sketches of company
profiles that looked like lost parts of an incomplete puzzle. In today's world, there are a
few companies, many of them start-ups themselves, that are trying to make sense of
this information asymmetry by using open data, web crawlers, crowdsourced
intelligence and good old-fashioned news reporting.
Start-up firms looking at sizing up competition
Let's look at a few such firms and the information they are aiming to provide:
Type of information:
Free quarterly competitive intelligence report on the competitive set including revenue,
CEO face-offs, press mentions, blog posts, social media statistics
Special reports on funding, acquisition and leadership alerts
Risk: Since the data is dependent mostly on users via crowdsourcing, quality of
information around revenue, employees etc. could be indicative and not actual.
Mattermark:
About: Mattermark focuses on making deal discovery easier by using machine learning,
web crawlers, primary sources and natural language processing methods for data
discovery. They do charge for usage.
Information:
Provides the flexibility of either using the data directly or requesting the API and customizing it.
Risk: Automated prospecting sounds like a sweet deal for investors who are wading
through a pile of entrepreneur information, but the real power of data in automating
strategic decisions is still not proven.
Tracxn:
About: Started by ex-venture capitalists, Tracxn aims to make deal discovery easier
and become the Gartner for start-ups by using data analysis techniques and intensive
research by its own team. They offer free and premium access.
Information:
Sector trends including market landscape, key business models, active investors and
emerging companies
Custom due-diligence support
Risk: With their own incubation firm, Tracxn Labs, the erstwhile Investment Banking
"Chinese Wall" concept could become relevant as the business model matures.
Trak.in
About: Started off as an individual news and opinion blog, Trak.in today reports news
as well as deal data about start-ups. Information access is free.
Information:
Risks: While it is very easy to use, especially the listing of key deals announced, the
sheer volume of information now available means the firm will need to ensure it has
scalable methods to access and update reliable information.
Venture Intelligence:
Background: Started in an era of single-digit start-up deal counts, Venture Intelligence
aims to be the leading provider of data and analysis on private companies in India.
Information:
Deal intelligence
Listing of investors
Risk: With more competitors now claiming market share, veracity of information will
become a key differentiator for Venture Intelligence as well as others.
VC Circle:
About: Started off as a blog handled by a single person, VCCircle today has become one
of India's largest digital platforms in the field of online business news, data, events and
training.
Information:
VCCircle Conferences
Start-up events
Risk: Yourstory has a lot of personal, textual information which is interesting to read.
Using the information for analysis and discovery is more difficult compared to the firms
using natural language processing and web crawlers.
Glassdoor:
About: Last but not least, Glassdoor provides a sneak peek into the culture of a new
firm. After all, as Peter Drucker said, "Culture eats strategy for breakfast."
Method: crowdsourced information from employees and companies
Information:
The Bottomline:
The task of discovering, evaluating and monitoring private companies, both as
competitors and as potential investments, has never been more important than now,
when India is seeing an explosion of start-up firms and investment deals. And the
companies mentioned above (not an exhaustive list), with their focus on using open
data either for data discovery or dissemination, will definitely add value to the
intelligence methods used by companies and investors alike. But the questions remain:

As an entrepreneur or an investor, are you finding more such useful resources today? I
would love to know about them and keep updating the list.

Will these firms be able to really use open data to reduce information asymmetry, or
will they just provide multiple views on information that has already been discovered
by someone else, where the price of the information has already been priced in, per the
Random Walk theory?
6. Data Sheds New Light on the On-demand Economy by Alex Chriss
What's been missing is a deep and objective understanding of those choosing on-demand work.
To fill this void, the team at Intuit partnered with Emergent Research and 11 on-
demand economy and online talent marketplace companies to undertake a
groundbreaking examination of the aspirations, motivations and pain points of people
working in the on-demand economy.
The first set of data from the study puts the on-demand worker classification debate in a
distinctly new light.
Take, for example, the fact that the average person working in the on-demand economy
spends a mere 12 hours per week working via their primary partner company,
generating 22 percent of their household income.
Our research also found that almost half (43 percent) of people working on-demand
jobs also have a traditional full or part-time job. In fact, only 5 percent of people
engaged in on-demand work indicate that it is their sole source of income.
Finally, our research shows that 70 percent of people working in the on-demand
economy are satisfied with their work. It follows that most of them (81 percent) say that
they will definitely continue their on-demand work in the coming year.
This data does not paint an obvious profile of people looking for traditional
employment. It points to a different motivation: namely, flexible opportunities to make
more income.
With an increasing focus on objective data, the debate about how best to support people
who choose on-demand work will evolve. Indeed, just today a leading voice in the
debate, Senator Mark Warner, urged policy makers to avoid rushing to judgment.
Instead, he said, on-demand companies themselves should be given the space they need
to innovate and experiment with new ways to support a new type of worker.
This is precisely the mindset that, combined with a deep and objective understanding
of the aspirations, motivations and pain points of those choosing on-demand work, will
set us on a path toward sustainable solutions.
7. Every Company is a Data Company by Christopher Lynch
Christopher Lynch
Investment Professional at Accomplice
LinkedIn Contact

Mr. Christopher P. Lynch, also known as Chris, was a Partner and part of the
investment team at Atlas Venture L.P., where he focused on big data and disruptive
infrastructure. He is a Partner and General Partner at Accomplice. Prior to joining Atlas
in 2012, he successfully drove growth and led several companies to exit, generating
billions of dollars in proceeds. He is an advisor and mentor to many Boston-area
entrepreneurs, and the founder of Hack/Reduce.

Insurance, retail, telecommunications, healthcare: today, companies in any large
industry have one thing in common: they all generate massive amounts of data every
day. And the applications are startling, from uncovering secrets of the genome to
optimizing agricultural output to forecasting revenue to the dollar. Unlocking the
power of data will change the world.

According to IBM, every day we create 2.5 quintillion bytes of data, so much that 90%
of the data in the world today has been created in the last two years alone. The
challenge is for us to extract value from it.
Simplicity is key. Tools that can process and analyze terabytes of data, and bubble up
the important information, will move the needle. Nutonian has an algorithm and
machine learning platform that can assess billions of models every second and surface
understandable predictive models. DataRobot's founders can take any data science
challenge and figure out how to address it in a way that will bring maximum value;
they've bottled up that knowledge in their software platform so others can do the same.
It's companies like these that are years ahead of others in making it easy to extract
value from big data, and they're having a powerful effect on their customers' businesses.
Both Nutonian and DataRobot also contribute to my second point above, but at a
foundational level we need more people with data science skills. Organizations like
hack/reduce in Boston are creating communities around big data professionals,
companies and technologies to nurture the talent needed in a big data world.
DataCamp is the first interactive learning platform for data science technologies; it
challenges the user through an intelligent interface that guides them to the correct
answer. Data science may have been named the sexiest job of the 21st century, but the
hype has worn off and we need to get working on this.
Solving the security problem is critical. When you have all this data in one place, you
need to protect it. Security companies need to understand and leverage big data to
protect it. Sqrrl leverages technology built by its founders at the NSA to analyze
terabytes of disparate data, find connections and quickly build a linked data model to
the root cause of a cybersecurity attack. And, this data is spread across the cloud which
brings another level of exposure. Companies like Threat Stack are continuously
monitoring and learning in the cloud environment to give their customers a
much-needed extra layer of protection that is cloud native.
The road to unlocking big data's value is lined with challenges, but as we rise to those
challenges there's amazing opportunity in front of us. I can't wait to see what big data
will do, for humanity and for business. The future is bright.
8. Transforming Customer Relationships with Data by Rob Thomas
Rob Thomas
Vice President, Product Development, IBM Analytics
LinkedIn Contact

Rob Thomas is Vice President of Product Development in IBM Analytics. He brings
extensive experience in management, business development, and consulting in the high
technology and financial services industries. He has worked extensively with global
businesses, and his background includes experience in business and operational
strategy, high technology, acquisitions and divestitures, manufacturing operations, and
product design and development. Rob's first book, Big Data Revolution, was recently
published by Wiley.

BUYING A HOUSE

A friend walked into a bank in a small town in Connecticut. As frequently portrayed in
movies, the benefit of living in a small town is that you see many people you know
around town and often have a first-name relationship with local merchants. It's very
personal, and something that many equate to the New England charm of a town like
New Canaan. As this friend, let us call him Dan, entered the bank, there were the
normal greetings by name, discussion of the recent town fair, and a brief reflection on
the weekend's Little League games.
Dan was in the market for a home. Having lived in the town for over ten years, he
wanted to upsize a bit, given that his family was now 20-percent larger than when he
bought the original home. After a few months of monitoring the real estate listings,
working with a local agent (whom he knew from his first home purchase), Dan and his
wife settled on the ideal house for their next home. Dan's trip to the bank was all
business, as he needed a mortgage (much smaller than the one on his original home) to
finance the purchase of the new home.
The interaction started as you may expect: "Dan, we need you to fill out some
paperwork for us and we'll be able to help you." Dan proceeded to write down
everything that the bank already knew about him: his name, address, Social Security
number, date of birth, employment history, previous mortgage experience, income
level, and estimated net worth. There was nothing unusual about the questions except
for the fact that the bank already knew everything they were asking about.
After he finished the paperwork, it shifted to an interview, and the bank representative
began to ask some qualitative questions about Dan's situation and needs, and the
mortgage type that he was looking for. The ever-increasing number of choices varied
based on fixed versus variable interest rate, duration and amount of the loan, and other
factors.
Approximately 60 minutes later, Dan exited the bank, uncertain of whether or not he
would receive the loan. The bank knew Dan. The bank employees knew his wife and
children by name, and they had seen all of his deposits and withdrawals over a ten-year
period. They'd seen him make all of his mortgage payments on time. Yet the bank
refused to acknowledge, through their actions, that they actually knew him.
There was an era when customer support and service was dictated by what you told the
person in front of you, whether that person was a storeowner, lender, or even an
automotive dealer. It was then up to that person to make a judgment on your issue and
either fix it or explain why it could not be fixed. That simpler time created a higher level
of personal touch in the process, but then the telephone came along. The phone led to
the emergence of call centers, which led to phone tree technology, which resulted in the
decline in customer service.
While technology has advanced exponentially since the 1800s, customer experience has
not advanced as dramatically. While customer interaction has been streamlined and
automated in many cases, it is debatable whether or not those cost-focused activities
have engendered customer loyalty, which should be the ultimate goal.
The following list identifies the main historical influences on customer service. Each era
has seen technological advances and, along with them, enhanced interaction with
customers.
1876: The telephone is invented. While the telephone did not replace the face-to-face era
immediately, it laid the groundwork for a revolution that would continue until the next
major revolution: the Internet.
1890s: The telephone switchboard was invented. Originally, phones worked only point-
to-point, which is why phones were sold in pairs. The invention of the switchboard
opened up the ability to communicate one-to-many. This meant that customers could
dial a switchboard and then be directly connected to the merchant they purchased from
or to their local bank.
1960s: Call centers first emerged in the 1960s, primarily a product of larger companies
that saw a need to centralize a function to manage and solve customer inquiries. This
was more cost effective than previous approaches, and perhaps more importantly, it
enabled a company to train specialists to handle customer calls in a consistent manner.
Touch-tone dialing (1963) and 1-800 numbers (1967) fed the productivity and usage of
call centers.
1970s: Interactive Voice Response (IVR) technology was introduced into call centers to
assist with routing and to offer the promise of better problem resolution. Technology for
call routing and phone trees improved into the 1980s, but it is not something that ever
engendered a positive experience.
1980s: For the first time, companies began to outsource the call-center function. The
belief was that if you could pay someone else to offer this service and it would get done
at a lower price, then it was better. While this did not pick up steam until the 1990s, this
era marked the first big move to outsourcing, and particularly outsourcing overseas, to
lower-cost locations.
1990s to present: This era, marked by the emergence of the Internet, has seen the most
dramatic technology innovation, yet it's debatable whether or not customer experience
has improved at a comparable pace. The Internet brought help desks, live chat support,
social media support, and the widespread use of customer relationship management
(CRM) and call-center software.
Despite all of this progress and developing technology through the years, it still seems
like something is missing. Even the personal, face-to-face channel (think about Dan and
his local bank) is unable to appropriately serve a customer whom the employees know
(but pretend not to when it comes to making business decisions).
While we have seen considerable progress in customer support since the 1800s, the lack
of data in those times prevented the intimate customer experience that many longed for.
It's educational to explore a couple of pre-data-era examples of customer service, to
understand the strengths and limitations of customer service prior to the data era.
BOEING
The United States entered World War I on April 6, 1917. The U.S. Navy quickly became
interested in Boeing's Model C seaplane. The seaplane was the first all-Boeing design
and utilized either single or dual pontoons for water landing. The seaplane promised
agility and flexibility, features that the Navy felt would be critical to managing the
highly complex environment of a war zone. Since all of the testing of the seaplane was
conducted in Pensacola, Florida, the company had to deconstruct the planes and ship
them from the west coast of the United States (by rail). Then, in the process, they had to
decide whether or not to send an engineer and pilot, along with the spare parts, in order
to ensure the customer's success. This is the pinnacle of customer service: knowing your
customers, responding to their needs, and delivering what is required, where it is
required. Said another way, the purchase (or prospect of purchase) of the product
assumed customer service.
The Boeing Company and the Douglas Aircraft Company, which would later merge,
led the country in airplane innovation. As Boeing expanded after the war years, the
business grew to include much more than just manufacturing, with the advent of
airmail contracts and a commercial flight operation known as United Air Lines. Each of
these expansions led to more opportunities, notably a training school to provide United
Air Lines with an endless supply of skilled pilots.
In 1936, Boeing founded its Service Unit. As you might expect, the first head of the unit
was an engineer (Wellwood Beall). After all, the mission of the unit was expertise, so a
top engineer was the right person for the job. As Boeing expanded overseas, Beall
decided he needed to establish a division focused on airplane maintenance and training
the Chinese, as China had emerged as a top growth area.
When World War II came along, Boeing quickly dedicated resources to training, spare
parts, and maintaining fleets in the conflict. A steady stream of Boeing and Douglas
field representatives began flowing to battlefronts on several continents to support their
companies' respective aircraft. Boeing put field representatives on the front lines to
ensure that planes were operating and, equally importantly, to share information with
the company engineers regarding needed design improvement.
Based on lessons learned from its first seven years in operation, the service unit
reorganized in 1943 around four areas:
-Maintenance publications
-Field service
-Training
-Spare parts
To this day, that structure is still substantially intact. Part of Boeing's secret was a tight
relationship between customer service technicians and the design engineers. This
ensured that the Boeing product-development team was focused on the things that
mattered most to their clients.
Despite the major changes in airplane technology over the years, the customer-support
mission of Boeing has not wavered: to assist the operators of Boeing planes to the
greatest possible extent, delivering total satisfaction and lifetime support. While
customer service and the related technology have changed dramatically through the
years, the attributes of great customer service remain unchanged. We see many of
these attributes in the Boeing example:
3. Training: Similar to the goal with publications, training makes your clients smarter,
and therefore, they are less likely to have issues with the products or services provided.
4. Field service: Be where your clients are, helping them as it's needed.
5. Spare parts: If applicable, provide extra capabilities or parts needed to achieve the
desired experience in the field.
6. Multi-channel: Establishing multiple channels enables the customer to ask for and
receive assistance.
8. Personalization: Know your customer and their needs, and personalize their
interaction and engagement.
Successful customer service entails each of these aspects in some capacity. The varied
forms of customer service depend largely on the industry and product, but also on the
role that data can play.
FINANCIAL SERVICES
There are a multitude of reasons why a financial services firm would want to invest in a
call center: lower costs through consolidation, improved customer service, cross-selling,
and extended geographical reach.
Financial services have a unique need for call centers and expertise in customer service,
given that customer relationships are ultimately what they sell (the money is just a
vehicle towards achieving the customer relationship). Six of the most prominent areas
of financial services for call centers are:
1) Retail banking: Supporting savings and checking accounts, along with multiple
channels (online, branch, ATM, etc.)
3) Credit cards: Managing credit card balances, including disputes, limits, and
payments
6) Consumer lending: A secured or unsecured loan with fixed terms, issued by a bank
or financing company. This includes mortgages, automobile loans, etc.
Consumer lending is perhaps the most interesting financial services area to explore
from the perspective of big data, as it involves more than just responding to customer
inquiries. It involves the decision to lend in the first place, which sets off all future
interactions with the consumer.
There are many types of lending that fall into the domain of consumer lending,
including credit cards, home equity loans, mortgages, and financing for cars,
appliances, and boats, among many other possible items, many of which are deemed to
have a finite life.
Ultimately, from the lender's perspective, the decision to lend or not to lend will be
based on the lender's belief that she will get paid back, with the appropriate amount of
interest.
-Call volumes: Forecasting monthly, weekly, and hourly engagement
-Staffing: Calibrating on a monthly, weekly, and hourly basis, likely based on expected
call volumes
-Performance management: Setting standards for performance with the staff, knowing
that many situations will be unique
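The staffing item above is often approximated with a simple workload calculation before a full Erlang C queueing model is applied. A rough sketch, where the function name and the 85 percent occupancy default are illustrative assumptions of mine:

```python
import math

def agents_needed(calls_per_hour, avg_handle_minutes, occupancy=0.85):
    """Simplified workload staffing estimate (a crude stand-in for a
    full Erlang C model): hours of handle time generated per hour of
    operation, divided by the target agent occupancy."""
    workload_hours = calls_per_hour * avg_handle_minutes / 60.0
    return math.ceil(workload_hours / occupancy)
```

For example, 120 calls per hour at 5 minutes each is 10 hours of work per hour; at 85 percent occupancy that calibrates to 12 agents for that interval.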
A survey of call center operations from 1997, conducted by Holliday, showed that 64
percent of the responding banks expected increased sales and cross sales, while only 48
percent saw an actual increase. Of the responding banks, 71 percent expected the call
center to increase customer retention; however, only 53 percent said that it actually did.
The current approach to utilizing call centers is not working and, ironically, has not
changed much since 1997.
Data will transform customer service, as data can be the key ingredient in each of the
aspects of successful customer service. The lack of data or lack of use of data is
preventing the personalization of customer service, which is the reason that it is not
meeting expectations.
In the report titled "Navigate the Future of Customer Service" (Forrester, 2012), Kate
Leggett highlights key areas that depend on the successful utilization of big data. These
include: auditing the customer service ecosystem (technologies and processes
supported across different communication channels); using surveys to better
understand the needs of customers; and incorporating feedback loops by measuring the
success of customer service interactions against cost and satisfaction goals.
AN AUTOMOBILE MANUFACTURER
This situation can be defined as a data problem. More specifically, the fact that each
party had their own view of the problem in their own systems, which were not
integrated, contributed to the issue. As any one party went to look for similar issues (i.e.
queried the data), they received back only a limited view of the data available.
A logical solution to this problem is to enable the data to be searched across all parties
and data silos, and then reinterpreted into a single answer. The challenge with this
approach to using data is that it is very much a pull model, meaning that the person
searching for an answer has to know what question to ask. If you don't know the cause
of a problem, how can you possibly know what question to ask in order to fix it?
This problem necessitates that data be pushed from the disparate systems, based on the
role of the person exploring and on the class of the problem. Once the data is
pushed to the customer service representatives, it transforms their role from question
takers to solution providers. They have the data they need to immediately suggest
solutions, options, or alternatives. All enabled by data.
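The push model described here can be sketched as a simple publish/subscribe hub. This is an illustrative toy only; the class and method names are my own, not from any real product. Records from each silo are tagged with a problem class, and anyone whose role subscribes to that class receives them automatically instead of having to know what query to run:

```python
from collections import defaultdict

class DataPushHub:
    """Toy push model: silos publish records tagged with a problem
    class, and subscribed roles receive matching records automatically."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # problem_class -> inboxes

    def subscribe(self, role, problem_class):
        inbox = []  # the role's view of pushed records
        self.subscriptions[problem_class].append(inbox)
        return inbox

    def publish(self, silo, problem_class, record):
        # Push the record, with its silo of origin, to every subscriber.
        for inbox in self.subscriptions[problem_class]:
            inbox.append({"silo": silo, **record})
```

The design choice is the inversion: instead of a rep querying each silo (the pull model), every silo's matching records arrive in the rep's inbox, which is what turns question takers into solution providers.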
ZENDESK
Mikkel Svane spent many years of his life implementing help-desk software. The
complaints from that experience were etched in his mind: it's difficult to use, it's
expensive, it does not integrate easily with other systems, and it's very hard to install.
This frustration led to the founding of Zendesk.
As of December 2013, it is widely believed that Zendesk has over 20,000 enterprise
clients. Zendesk was founded in 2007, and just seven short years later, it had a large
following. Why? In short, it found a way to leverage data to transform customer service.
Zendesk asserts that bad customer service costs major economies around the world
$338 billion annually. Even worse, they indicate that 82 percent of Americans report
having stopped doing business with a company because of poor customer service. In
the same vein as Boeing in World War II, this means that customer service is no longer
an element of customer satisfaction; it is perhaps the sole determinant of customer
satisfaction.
A simplistic description of Zendesk would highlight the fact that it is email, tweet,
phone, chat, and search data, all integrated in one place and personalized for the
customer of the moment. Mechanically, Zendesk is creating and tracking individual
customer support tickets for every interaction. The interaction can come in any form
(social media, email, phone, etc.) and therefore, any channel can kick off the creation of
a support ticket. As the support ticket is created, a priority level is assigned, any related
history is collated and attached, and it is routed to a specific customer-support person.
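The ticket mechanics just described can be sketched roughly as follows. This is illustrative only, not Zendesk's actual API; the priority rule, queue names, and keyword check are invented for the example:

```python
import itertools

_ticket_ids = itertools.count(1)
_history = {}  # customer -> list of past ticket ids

def create_ticket(customer, channel, message):
    """Any channel (email, tweet, phone, chat...) kicks off a ticket;
    prior tickets for the customer are collated and attached, and the
    priority drives routing to a support queue."""
    priority = "high" if "outage" in message.lower() else "normal"
    ticket = {
        "id": next(_ticket_ids),
        "customer": customer,
        "channel": channel,
        "message": message,
        "priority": priority,
        "related": list(_history.get(customer, [])),  # collated history
        "queue": "escalations" if priority == "high" else "general",
    }
    _history.setdefault(customer, []).append(ticket["id"])
    return ticket
```

A second ticket from the same customer, whatever its channel, arrives with the first ticket's id already attached, which is the collation step described above.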
But what about the people who don't call or tweet, yet still have an issue?
Zendesk has also released a search analytics capability, which is programmed using
sophisticated data modeling techniques to look for customer issues, instead of just
waiting for the customer to contact the company. A key part of the founding
philosophy of Zendesk was the realization that roughly 35 percent of consumers are
silent users, who seek their own answers, instead of contacting customer support. On
one hand, this is a great advantage for a company, as it reduces their cost of support.
But it is fraught with risk of customer satisfaction issues, as a customer may decide to
move to a competitor without the incumbent ever knowing they needed help.
Svane, like the executives at Boeing in the World War II era, sees customer service as a
means to build relationships with customers, as opposed to a hindrance. He believes
this perspective is starting to catch on more broadly: "What has happened over the last
five or six years is that the notion of customer service has changed from just being this
call center to something where you can create real, meaningful long-term relationships
with your customers and think about it as a revenue center."
It would be very easy for Dan to receive a loan, and for the bank to underwrite that
loan, if the right data were available to make the decision. With the right data, the bank
would know who he is, as well as his entire history with the bank, recent significant life
changes, credit behavior, and many other factors. This data would be pushed to the
bank representative as Dan walked in the door. When the representative asked, "How
can I help you today?" and learned that Dan was in the market for a new home, the
representative would simply say, "Let me show you what options are available to you."
Dan could make a spot decision or choose to think about it, but either way, it would be
as simple as purchasing groceries. That is the power of data, transforming customer
service.
9. The Power of Apps to Leverage Big Data and Analytics by Joseph Bradley
Banjo aggregated photos from several social media networks during the launch of
SpaceX's Falcon 9.
Banjo (www.ban.jo) instantly organizes the world's social and digital signals by
location, giving users an unprecedented real-time view and level of understanding
about what's happening anywhere in the world. Inc. Magazine, in its article "The Most
Important Social Media Company You've Never Heard Of," stated that Banjo has
created a way to find out anything that's happening in the world instantly.
To get a sense of just how much data Banjo is collecting, consider this: the company
runs a virtual grid of more than 35 billion squares that cover the face of the planet. Each
square is about the size of a football field. Banjo's software maps every geo-located
public post.
The company's software then goes to work analyzing the data by performing two
quadrillion calculations on the hundreds of thousands of posts that flood in every
minute. Deviations from the normal state trigger alerts.
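A toy version of this grid-and-baseline idea can be sketched as follows. Banjo's real system is vastly more sophisticated; the cell size, thresholds, and function names here are illustrative assumptions of mine:

```python
def grid_cell(lat, lon, cell_deg=0.001):
    """Bucket a coordinate into a grid square; 0.001 degrees of latitude
    is roughly 111 m, about the size of a football field."""
    return (int(lat / cell_deg), int(lon / cell_deg))

def spike_alerts(posts, baseline, factor=5, min_posts=10):
    """posts: (lat, lon) pairs seen in the current minute.
    baseline: dict mapping cell -> its normal posts per minute.
    A cell whose count far exceeds its baseline triggers an alert."""
    counts = {}
    for lat, lon in posts:
        cell = grid_cell(lat, lon)
        counts[cell] = counts.get(cell, 0) + 1
    return [cell for cell, n in counts.items()
            if n >= min_posts and n > factor * baseline.get(cell, 1)]
```

A sudden burst of geo-tagged posts in one square, well above that square's normal chatter, surfaces as an alert, which is the "deviation from the normal state" behavior described above.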
A visualization of spikes in social media activity around the world.
With regard to privacy, Banjo only analyzes public posts. If you don't want to be
visible to Banjo, you can simply not post, or turn your location setting off. Banjo only
sees what people allow it to see.
Banjo has already proven itself in practice, being the first service (even before police and
the media) to learn about the Florida State University shootings and the Boston
Marathon bombing. Using the power of Big Data, analytics, and social media, Banjo has
become eyes and ears on the ground for anyone who uses their app.
As you consider how to leverage the power of Big Data and analytics in your sphere of
influence, it's important that people, whether they are customers or citizens, be at the
center of your plans. In an earlier post, I highlighted Ample Harvest, which shows how
a non-profit organization is leveraging Big Data and analytics to provide fresh food to
the world's hungry, for free.
Also realize that success in a world that is going digital requires a new kind of leader.
Today's business and technology leaders need new skills and knowledge, including: 1)
how to assemble and manage a team inside and outside the four walls of your
organization; 2) an understanding that digital languages (programming, platforms, and
tools) are more important skills than being fluent in a cultural language; and 3)
believing that being inclusive and building diverse teams are both the correct and
profitable things to do.
In this article, I highlighted Banjo and Ample Harvest. There are many others. What are
your favorite apps that create value for people by leveraging Big Data and analytics?
10. State of Telecom Business Around Managing Big Data and Analytics by Dr.
Hossein Eslambolchi
revenue per user has been steadily coming down, and the way to maintain margins is to lower costs.

Can you explain further how the 3 C's work together inside a communications provider?
Cost is where database technologies like RainStor come into play. If I can bring down
the storage cost and bring new capabilities on top of that, then margins can improve.
These new databases require far less storage because of their unique ability to compress
the data, which means reduced requirement for more servers and therefore less time
and money spent provisioning and managing them. You can save your company
hundreds of millions in a few years on your infrastructure alone. The goal is to lower
your costs as fast and effectively as you can, so that even if your revenues continue to
decline, the likelihood of gaining higher margins goes up. That means you can compete
faster and cheaper than anyone else in your industry.
Tell us more about cycle time and capabilities.
Cycle time is the amount of time it takes to provision a service. You want to reduce cycle times to speed time-to-market, which ultimately improves customer satisfaction and, not incidentally, shortens time to revenue. Every time you put a new server out there, you've got to provision it or pay for the provisioning. You've got to have your maintenance systems running. There is no turnkey system for service providers, because every network is different. So if you've got hundreds to thousands of servers, that brings additional time and complexity. Nor is there a single routine for each server: there may be various storage vendors behind the scenes and different configuration requirements. But let's say you can reduce the number of servers to only 10 or 20; then you are drastically lowering the cycle time, complexity, and effort of managing this hardware footprint. And of course that lowers costs, and it likely improves your compliance, because you are able to store all of the data that you need and you're doing it in a more controlled and less risky fashion.
occurring in the future and affecting customer service levels. There are so many
applications that a service provider could deploy on top of database technologies such
as RainStor to improve and monetize the network. If you can cost effectively scale and
you have an intelligent network, you can roll out services faster and more economically,
which translates to revenue-generating initiatives.
Service providers have to simplify and streamline their operations. They are drowning because they are so big, and they are trying to manage enormous streams of data coming from a wider range of disparate sources. But building a bunch of data centers, the Google approach, is really not the right approach for the 21st century. Companies don't have the money and time to build out this gargantuan infrastructure any longer, and they want payback on the investment in 12 to 18 months.
However, first things first: I believe you can't perform rich analysis and better reporting unless you have the most efficient infrastructure in place to store, manage, and scale all the important data for the business. By addressing cost first, you can improve cycle times and then begin to add more capabilities. Let's be honest, the most important C, the Customer, is what ultimately matters when it comes to improving margins.
"Without big data, you are blind and deaf and in the middle of a freeway." (Geoffrey Moore, author and consultant)
11. Solution to Key Traffic Problem Required a Fresh Look at Existing Data by
Tony Belkin
A lot of people in the industry believed that to figure out which lane a vehicle was in, we'd need higher accuracy in probe data, and that this accuracy would only arrive when most cars were equipped with advanced sensors linked to high-definition maps.
But at HERE, we invented a way past these challenges with the introduction of Split Lane Traffic at junctions (SLT). It's ultimately the product of 20 years of analyzing and understanding how traffic works, and of a unique perspective on the issues.

We started with the information we already had. We collect and process billions of road traffic probes every day. Could we look at those probes differently, we asked, to learn something new?
Inspired by thinking on how to handle divergent conditions, our team of data analysts
and traffic experts spent a year analyzing the data, then testing and refining
mathematical models to extract meaning from all these separate probes. The result was
SLT.
SLT uses two algorithms (patent pending) to isolate and measure the traffic speeds in
different lanes around junctions. Our Multi-Modal Speed Detection and Magnitude
(MDM) and the Dynamic SLT Aggregation (DSA) algorithms process raw probe data,
together with information from sensors and incident sources to create reliable, accurate
results.
The MDM algorithm looks at all the vehicle probes in the area of road leading up to a junction and works out whether they fall into separate groups travelling at different speeds. Then it works out what the level of difference is, and decides whether it is sufficiently large to warrant publishing an SLT, splitting the road's reported speed into two before that junction. The DSA algorithm works out how far down the road leading up to the junction this difference in speeds extends, dividing the road into much finer segments than was previously the case.
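The grouping step described for MDM, deciding whether probe speeds near a junction fall into two distinct populations and whether the gap justifies publishing a split, can be illustrated with a simple one-dimensional two-means clustering. This is a toy sketch under assumed thresholds, not HERE's patented algorithm.

```python
def split_lane_speeds(speeds_kmh, min_gap_kmh=15.0, iters=20):
    """Cluster probe speeds into two groups (1-D k-means, k=2) and report a
    split only if the group means differ by at least min_gap_kmh.
    Returns (slow_mean, fast_mean), or None if no split is warranted."""
    if len(speeds_kmh) < 2:
        return None
    lo, hi = min(speeds_kmh), max(speeds_kmh)  # initial centroids
    for _ in range(iters):
        # assign each probe to the nearer centroid
        slow = [s for s in speeds_kmh if abs(s - lo) <= abs(s - hi)]
        fast = [s for s in speeds_kmh if abs(s - lo) > abs(s - hi)]
        if not slow or not fast:
            return None                 # all probes look alike: no split
        new_lo, new_hi = sum(slow) / len(slow), sum(fast) / len(fast)
        if (new_lo, new_hi) == (lo, hi):
            break                       # converged
        lo, hi = new_lo, new_hi
    return (lo, hi) if hi - lo >= min_gap_kmh else None
```

With probes from a queuing exit lane mixed in with free-flowing through traffic, the function returns the two group speeds; with uniform traffic it returns None, mirroring the decision not to publish an SLT.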
Even when we were confident that the technique worked, we couldn't just rush it to market. It's a big and complex operation, drawing together probe and data collection, data processing, and product delivery. We didn't just have to ensure that we could get correct results, but also that we could integrate them into our products and deliver them consistently to users. Millions of people and businesses depend on HERE for accurate traffic information, delivered worldwide and updated every minute of every day. Reliability is a fundamental principle.
Now, though, with all the preparation and testing complete, SLT will be a part of our
Real Time Traffic product, bringing greater accuracy to road speed reports, routing
advice and estimated times of arrival.
This valuable new feature really moves the whole industry forward, significantly
improving traffic information, without any new technology requirements from drivers
or roads. And it's been a great learning opportunity for our teams, growing our
experience of reporting traffic at lane level.
To find out more about SLT, you should read our white paper which describes the work
in more detail.
12. Machines Won't Replace Insurance Agents in 2016, But They Will Do This by
Steve Anderson
In 2016, we will begin to see very practical applications of Machine Learning technology
as it moves away from just being theoretical.
Artificial intelligence, machine learning, and natural language processing, and their implications, have captured my attention. In 2016, we will begin seeing this technology move from the theoretical into practical applications. We will also see its cost drop significantly, so smaller organizations can take advantage of its capabilities.
IBM's Watson Platform
I've been following with interest the development and growth of IBM's Watson platform. IBM describes Watson as a cognitive system that can help us outthink our biggest challenges.
Watson Wins Jeopardy
I first became aware of Watson in February 2011 because of the Watson Jeopardy
Challenge. It was fascinating to watch Watson play Jeopardy against Ken Jennings
(longest winning streak) and Brad Rutter (biggest money winner). Watson ultimately
won that challenge.
Watson seemed like an interesting experiment. Are machines better than people at
processing vast amounts of data in real time? Can they replicate (or maybe replace?) the
human decision-making process in certain circumstances?
The Watson platform has continued to develop. Its cognitive system is now available for use by other industries through API access. One of the biggest uses of Watson is medical diagnostics: because of its advanced image analytics capability, Watson can now "see" medical images.
The insurance industry is also looking at how Watson might be able to improve the underwriting process and financial results for an insurance company.
"It [IBM Watson] will do a whole lot of work and very quickly. There will be more accurate underwriting, and to a certain extent, it probably removes some human emotion in terms of decision making too," said Mark Senkevics, head of Swiss Re Australia and New Zealand. In the article, Senkevics explains why the global reinsurer plans to engage IBM's artificial intelligence system, IBM Watson, to assess life insurance risks in Australia.
Machine Learning Examples
In November, I attended the ACORD 2015 conference as a judge for the ACORD Innovation Challenge. I had the opportunity to listen to Jared Cohen's keynote presentation. Cohen is the Director of Google Ideas and the co-author (along with Google executive chairman and former CEO Eric Schmidt) of "The New Digital Age: Reshaping the Future of People, Nations and Business."
Cohen spent a majority of his time talking about how Google is using machine learning
in many areas, including image processing (Google Photos) and Google Driverless Cars.
If he spent that much time talking about machine learning, then it just might be
something I should spend more time exploring.
Other recent news items about the advancement and development of machine learning include:
How Banks Use Machine Learning to Know a Crook's Using Your Credit Card Details.
Elon Musk Donates $10 Million to Keep AI Beneficial.
Machine Learning Works Great; Mathematicians Just Don't Know Why.
Google open-sourcing its artificial intelligence engine TensorFlow last month, freely sharing the code with the world at large.
The week Google made the announcement about TensorFlow, I was facilitating a meeting of large U.S. insurance brokers in Houston. At dinner the first night, I was talking with an IT person from one of the brokers about TensorFlow and its possible implications for the insurance industry. He told me he had already downloaded the code that day and was playing around to see how it might be used in their operation.
"People don't need you for information. They need you for advice." Terry Jones, CEO, Kayak.com
Recently I spent the day at a small company that is developing an expert conversation engine, a tool that will allow anyone to create a guided conversation. In artificial intelligence terms, this type of process is properly called "forward reasoning," sometimes also referred to as "forward chaining."
Using their platform, I was able to create a guided online conversation that answered the question, "I bought a new boat. Does my homeowners policy cover it?" Building the response to this question took me about an hour to complete.
Based on the answers provided, I was able to use this tool to capture my insurance expertise (I have a Master's in Insurance Law) and display a customized page that showed the coverage that was and was not available under several different types of homeowners policy forms, including both physical damage and liability coverage.
I was able to answer the question asked and also show where additional insurance
coverage was needed.
Once you learn how the tool works, you should be able to create simple conversations like this in less than 30 minutes.
More complicated conversations (like an annual account review process) would take
longer to develop. The additional time required is due to the difficulty (at least initially)
of capturing the expertise required and understanding the logical flow of the
conversation.
However, once the guided conversation is created, it can be used many times, both by internal employees who don't yet have the experience and by clients who have a question.
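Forward chaining, as described here, starts from the user's answers as known facts and repeatedly fires rules whose premises are all satisfied until a conclusion is reached. A minimal sketch follows; the boat and homeowners rules are invented for illustration and are not actual policy logic.

```python
def forward_chain(facts, rules):
    """facts: set of known fact strings. rules: list of (premises, conclusion).
    Fire every rule whose premises are all known, repeating until no new
    fact can be derived (classic forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical rule base for "I bought a new boat. Does my homeowners
# policy cover it?" (illustrative only, not real coverage rules)
rules = [
    ({"owns_boat", "boat_under_26ft"}, "small_watercraft"),
    ({"small_watercraft", "ho3_policy"}, "limited_physical_damage_coverage"),
    ({"owns_boat", "boat_26ft_or_over"}, "recommend_separate_boat_policy"),
]

# facts gathered from the guided conversation's answers
answers = {"owns_boat", "boat_under_26ft", "ho3_policy"}
conclusions = forward_chain(answers, rules) - answers
```

The engine derives "small_watercraft" from the first rule, which in turn enables the second rule: each answer the user gives can unlock further inferences, which is what makes the conversation feel guided.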
Many people will look at these advances in machine learning as scary. They will say that a machine can never do their job, that a machine will never be able to provide the same type of value that an actual person can. I am not so sure that is the case.

Don't you think that is what Ken Jennings and Brad Rutter thought when competing with Watson on Jeopardy? Yet, they lost.
Here are some ideas about how machine learning will benefit insurance agents and the industry:
Capture the knowledge, skills, and expertise of a generation of insurance staff before they retire in the next 5 to 10 years.
Answer questions about insurance issues from a new generation of consumers in a new way. These consumers expect to be able to get answers anytime, not just when an agent's office is open.
Provide consistent and correct answers to common insurance questions, without having to monitor varying levels of expertise.
Attract new talent by providing a career path that automates the mundane, so their time and effort can be spent engaging clients at a deeper level that requires more in-depth expertise.
Deliver agents' expertise to their clients more profitably. This is especially true for individual insurance (personal lines) and small business insurance (small commercial).
Create and deliver an annual online account review process for both personal lines and small commercial insurance accounts. This is a vital process for client satisfaction, creating cross-selling opportunities, and reducing errors and omissions problems.
Provide every insurance agent with a way to offer 24/7 access to insurance knowledge and expertise. A guided conversation provides a much better experience than a simple Frequently Asked Questions (FAQ) page. You will be able to create an interactive, customized, value-added experience.
There may be some who read this article and think artificial intelligence and machine learning will be the end of the insurance agent as the trusted adviser for providing proper protection against accidental losses.

Like Mark Twain, I think reports of the death of the insurance agent have been greatly exaggerated. Those insurance agents who embrace new technology and find better ways to engage with the consumer will always find opportunity.
The Opportunity
For those agencies that can see the opportunity, machine learning tools will provide another way to engage and interact with the digital customer.
learning about machine learning technology to better understanding the implications and the benefits to the insurance industry.
I do not believe this technology will replace the need for insurance agents. Today's consumers demand value. They want to engage with people who provide products and services. And they want it when they want it: anytime, day or night.
For insurance agents who are simply order takers, machine learning will likely be a
threat. It will be much harder for them to justify why anyone should do business with
them.
Machines that can learn just might provide an edge insurance agents need to compete
effectively in a 24/7 world.
What do you think? Will you be able to talk to a computer like Captain Kirk did on
the Starship Enterprise? Don't agree? Let me know why.
13. How a Digital Transformation Can Improve Customer Experience with Big
Data by Ronald van Loon
To better understand what we are talking about, consider how much your life has
changed in the last twenty years. When was the last time you stopped to ask for
directions? How long has it been since you sat wondering about a piece of trivia, rather
than pulling out your smartphone and Googling it? How often do you decide to buy an
item online (using your smartphone or tablet) instead of driving to a store to get it?
Digital technology, and in particular mobile tech, has changed our lives drastically in less than two decades. So why hasn't it changed your business? You can start to see how a digital transformation is essential to keeping up in the digital era.
The first step, which nearly all companies today have mastered, is digital competence.
If your company is digitally competent, then you will have a website that's at least marginally functional. It will tell customers where they can purchase goods or services, but it will likely not be dynamic or responsive. Your site likely won't be mobile-friendly or highly functional.
In the second step, digital literacy, your company will have active presences on social media platforms, and you'll have a responsive, mobile-friendly website. You will likely also have some higher functionality on your website, including an e-commerce store and other features. Basically, you'll have all of the tools to be fully digitally mature, but you will not have taken the last step.
When you decide to dive into a digital transformation, your company will create a data collection and analysis group, team, or department. You'll have people and structures dedicated to digital marketing. You'll incorporate data science into your marketing plans. In short, you will have fully embraced big data and learned just how much you can use it to improve customer experience and sales.

With this information woven into the framework of your business, you'll look at your website analytics and identify exactly where people get hung up in the buying process. Then you can find solutions to make the process flow more easily and quickly to reduce abandoned carts and bounced traffic.
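Finding where people get hung up is, concretely, a funnel drop-off calculation: compare how many visitors survive each step of the buying process and flag the step with the worst retention. A minimal sketch follows; the step names and counts are invented for illustration.

```python
def funnel_dropoff(steps):
    """steps: ordered list of (step_name, visitor_count).
    Returns (worst_step, retention_rate) for the step that
    loses the largest share of the visitors who reached it."""
    worst, worst_rate = None, 1.0
    for (_prev_name, prev_n), (name, n) in zip(steps, steps[1:]):
        rate = n / prev_n if prev_n else 0.0  # share retained at this step
        if rate < worst_rate:
            worst, worst_rate = name, rate
    return worst, worst_rate

# Hypothetical analytics export for an e-commerce purchase funnel
funnel = [
    ("product_page", 10_000),
    ("add_to_cart",   3_500),
    ("checkout",      2_800),
    ("payment",         700),   # big drop: likely friction here
    ("confirmation",    630),
]
step, rate = funnel_dropoff(funnel)
```

In this made-up example the payment step retains only 25% of the visitors who reach checkout, which is where you would look first for friction such as forced account creation or limited payment options.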
14. Doctor in The Smart Refrigerator by Arthur Olesch
Arthur has been working in health care for over 14 years now. As an editor of the Polish Healthcare Journal, he is responsible for presenting the main eHealth trends and delivering knowledge about health care systems in Poland, Europe, and around the world.

"The way we do our shopping and handle our food will definitely change."
The change will be the result of the so-called Internet of Things (IoT). To put it simply, this means devices that surround us, are connected to the net, and are able to collect, process, and exchange data. Firstly, the way we do our shopping and handle our food will definitely change. Smart trolleys and baskets will not only calculate the total price of items but also their energy value, and then record and send information on the products and their chemical preservatives. People suffering from allergies will be warned about hazardous food additives, and the devices will suggest what to buy to balance the
nutritional value of food. Electronic health accounts, supplemented with information on
diet, will help to select products suitable for particular illnesses and risk groups. A
similar role will also be fulfilled by high-tech fridges monitoring the expiry dates of food, the freshness of products, bacteriological norms, rates of consumption, etc. Smart devices
will work in the background, providing help and advice to all family members on an
individual basis. Checking eating habits via new technologies is already possible today thanks to meal-planning apps that count calories or track food allergies. Their disadvantage is that all the data must be input manually. Everyday objects,
integrated in a coherent network, can operate day and night. The same change will also
encompass physical activity, but in this case the progress seems to be faster than anyone
could have imagined. Special bands or watches measuring the number of steps made,
calories burned or the distance covered are a reality; the m-health market is expected to
develop rapidly and increase several dozen or even several hundred percent annually.
The more precise the control over life gets, the better the conditions to implement the
so-called principles of individualised prophylaxis. General recommendations will be
replaced by detailed guidelines available to an individual. Systems analysing medical
and social data will check interactions between medications and food, which in turn will increase the safety of drug therapy.
Our flats will turn into small labs. During our morning routine a urine sample will be
tested, assessing the most important parameters. Of course, all information will be
automatically entered into our medical file. Other measurements, based on the condition of our skin, eyes, and breath, will be made by a smart mirror, displaying the most
important results for the user. Scientists from Europe are already working on such a
solution. Data will also be collected via clothes and everyday objects, such as furniture
and computers. Nowadays, designers outdo one another, combining traditional items
with electronics. A great example is, among others, a mug that monitors the number of calories consumed, the body's level of hydration, and the kind of beverage. The
word smart is used to describe the increasing number of items, like a smart knife, a
smart chair, smart glasses, smart shoes and a smart watch. And now a bed has also
joined this group of e-solutions. Packed with electronics, it will check the quality of
sleep and try to improve it by, for example, adjusting the temperature, lighting in the
room, humidity or initiating an appropriate relaxation program.
Why do we need such major intervention by control mechanisms for better prophylaxis? There are several arguments. Firstly, the amount of information, not only concerning health, is growing rapidly, and we are no longer able to digest it all on our own, keep it updated on an ongoing basis, and therefore apply it. The same goes for
doctors who frequently use expert portals analysing and selecting scientific
contributions. Secondly, prophylaxis will never be a priority in itself. Even extensive
knowledge on health-related topics does not mean that it will be used in practice.
Technologies will enable the type of information to be adjusted to a particular person,
their lifestyle, illnesses, potential risks. Individualised dietetic programmes will take
into consideration lifestyle and culinary preferences, while, at the same time, trying not
to drastically change them. Thirdly, information on meals and lifestyle will be
correlated with data contained in the health account. In this way, awareness of being in an at-risk group, having allergies, or suffering from particular illnesses will be available and may be used for developing personalized guidelines or specific schedules encompassing dietary recommendations and exercise routines.
The promise of better health compels us to agree to more extensive control. Ultimately,
we will still have freedom of choice, also in health-related matters. We are not threatened, at least not in the near future, by totalitarian prophylaxis. Nobody can
prevent us from buying a packet of crisps or a bottle of Coca-Cola. Taking into account
modern trends it can be said that we ourselves will be primarily interested in enjoying a
long and healthy life. Consumers will be willing to try out any novelties to help them
achieve this goal.
15. Wearables at the Workplace by Abhijit Bhaduri
Total shipments of wearables in 2014 were 19.6 million. This year that number has jumped to more than 45 million, and it is expected to cross 125 million in 2019. Wearables can be embedded in wristwear (e.g., a watch or fitness band); in clothing or shoes (already used by most athletes to improve performance); or in eyewear (think Google Glass, or contact lenses that can track glucose levels for diabetics).
The creation of epidermal electronics, wearable health and wellness sensors "printed" directly onto the skin, is changing how we track our heart rates at the gym and our blood pressure at the doctor's office. [i]
Disney uses MagicBands at its resorts. If you're wearing your Disney MagicBand and you've made a reservation, a host will greet you at the drawbridge and already
know your name. No matter where you sit, your server knows where you are and will
bring your order to your table magically. They are already getting customers to use
wearables to personalize the offerings and target ads and deals for products the
customer will find irresistible. Companies are already getting customers used to the
idea of wearables tracking them. This is a great example of using wearables to create an
immersive experience for customers.
Nordstrom wanted to learn more about its customers: how many came through the doors, how many were repeat visitors, the kind of information that e-commerce sites like Amazon have in spades. So in 2013 the company used technology that allowed it to track customers' movements by following the Wi-Fi signals from their smartphones. The location-sensing technology could track who was a repeat customer, the paths they traced in the store, which items held their interest, and how long they paused and hesitated before purchasing something. This data is what enables the company to customize everything from the mix of items on sale to the price points, depending on the customers that walk into the store. [ii]
Salesforce lists the many possibilities that wearables open up: enhancing the customer experience, helping design reward and loyalty programs, creating a cashless payment system built into the wearable, or offering an integrated shopping experience, to name just a few. [iii]
Most employers already have a policy that allows employees to bring their own devices to work. The Human Resources team, the digital geeks, and the Legal department will soon have to sit down and plan how to handle a Bring Your Own Sensor policy. Can a company-provided wearable be made a condition of employment?

Yes. According to Gartner, by 2018, two million employees will be required to wear health and fitness tracking devices as a condition of employment. The health and fitness of people employed in jobs that can be dangerous or physically demanding will increasingly be tracked by employers via wearable devices. [iv]
Tesco uses armbands to track goods transported to its aisles so that workers do not have to mark them on clipboards. The armbands also help estimate completion times and can even provide data on fatigue. Boeing uses wearables to replace instruction manuals, so workers can get instructions in real time.
A headband with a sensor can provide useful data about EEG patterns, giving insights into when we are feeling creative, productive, or plain bored. The employer could then decide to offer you work that suits your moods. A fitness tracker can provide the employer real-time data about the activity and fitness levels of employees and help negotiate better rates on health and hospitalization insurance. Once the data exists, predictive models can warn the employer about surges and dips in costs due to changes in stress levels and activity. Simply tracking employees' productivity levels can provide objective data to decide which employees are the most productive and should be given differentiated rewards.
More than half of human resources departments around the world report an increase in
the use of data analytics compared with three years ago, according to a recent survey by
the Economist Intelligence Unit. But many employees are still blissfully unaware of how
information they may deem private is being analyzed by their managers. [v]
The easiest spot to embed a sensor is the employee badge. Already organizations
routinely use it to provide access to different parts of the office. The employer can use
the sensors to activate or deactivate access. The badges can monitor how employees move around the workplace, who they talk to and in what tone of voice, and how long they spend where. They can show which bosses are spending time with which employees, and the data can then be correlated with productivity norms or employee engagement surveys to determine whether that time is resulting in something productive.
Tracking any data can lead to insights. Bank of America and Deloitte map office behaviors to sales, revenue, and retention rates. Research done with wearables led Bank of America to discover that its most productive workers were those who routinely shared tips and frustrations with their colleagues. That insight led them to
replace individual break times with collective downtime that was scheduled in chunks
of 15 minutes. The result was a 23% improvement in performance and a 19% drop in
stress levels.
Insights vs privacy
Sociometric Badges made by Humanyze can measure movement, face-to-face speech, vocal intonation, and who is talking to whom and for how long. We can understand what these patterns suggest about the flow of information and power.
Once we track data, the genie is out of the bottle. Knowing health data can help negotiate better insurance rates, but it can also open up possibilities of discrimination. Knowing workplace interaction patterns can help employers identify influencers in the workplace, but also break up unions or single out troublemakers.
When data from the wearable is combined with social data, the outcome may be scary. The health tracker knows your location and your activity. Combined with social data, it can track who you are spending time with, including people you should not be spending time with!
What happens to the data when the employee leaves the employer? Who owns that
data? Who is to guarantee how that data will be used? What rights does the employee
have regarding their own data while they are in employment and when they are not?
Today these questions have not been addressed even for assessments, psychological tests, or background checks. That raises serious questions about wearables generating workplace data in real time without the employee knowing the possibilities and consequences of what they are agreeing to.
References
[i] http://www.complex.com/pop-culture/2013/03/thin-health-tracking-sensors-can-be-sprayed-onto-skin
[ii] http://www.nytimes.com/2013/07/15/business/attention-shopper-stores-are-tracking-your-cell.html
[iii] http://investor.salesforce.com/about-us/investor/investor-news/investor-news-details/2015/Wearables-in-the-Enterprise-are-Driving-Improved-Business-Performance/default.aspx
[iv] http://solutions-review.com/mobile-device-management/by-2018-employees-will-be-required-to-wear-wearables/
[v] http://www.ft.com/intl/cms/s/2/d56004b0-9581-11e3-9fd6-00144feab7de.html#axzz3r9lHImvY
16. The SEAMLESS Customer EXPERIENCE by Justin Honaman
Making it even more challenging for manufacturers, this process involves product lines
across the company supported by departments and systems ranging from marketing
and technology to customer service and product development - all of which are
standalone and not integrated.
Traditionally, CG manufacturers focused their brand marketing and consumer
interaction on mass media. In the age of the connected consumer, however, thats no
longer enough. To fully engage consumers in the age of connected shopping,
manufacturers must capture and analyze data across a variety of sources and optimize
interactions across a multitude of communications channels.
It's critical to recognize that consumers now research and discuss brands amongst
themselves with or without manufacturer participation, establishing brand meaning
and value independent of ad agencies, campaigns, and mass media. This creates a
vastly more complex consumer path-to-purchase.
Historically, the process centered on three simple steps: See an advertisement -> Visit
the nearest store -> Buy the product if it is in stock
In the digital world where shopping begins on a smartphone or connected device: See
an advertisement -> Check out online reviews -> Poll friends through social media ->
Compare features among similar products at brand Web sites -> Check prices online at
retailer Web sites -> Search for coupons or promotions -> Buy a product online or in
store
Fortunately, emerging big data tools and techniques make it possible to collect, track,
analyze, and optimize the huge amount of structured and unstructured information
created by these new relationships. Access to detailed consumer data can be a huge
asset for manufacturing companies, but understanding and optimizing the data and the
complex communications remains challenging.
and one-time promotions) can't keep up, and most manufacturers don't have the
technical capabilities to efficiently capture, integrate, and use these consumer
insights to gain a complete view of their consumers.
That's why manufacturers are struggling to establish scalable and personal connections
with consumers based on accrued insights from all relevant data sources. And it's why
a leading analyst firm estimates that by 2017, marketing executives will spend more on
technology than will technology executives.
Three-Step SOLUTION
1. Be aware of the online and offline data sources that affect a manufacturer's
brands and products. That includes individual identity data, behavioral data
(location, purchase history, call-center transcripts, etc.), derived data (credit
scores, personas, influence scores, etc.), and self-identified data (purchase intent,
social media likes, user-generated content, etc.).
2. Analyze gathered information with big data techniques and tools. Forrester
Research says more than 45 percent of current big data deployments are for
marketing, and marketers are expected to spend 60 percent more on analytics
solutions in the next three years. The goal is to understand how the different
channels interact and then put it all together to build an accurate and complete
picture of current consumer behavior.
3. Drive action from the data analysis. With data-driven marketing, manufacturers
can join the conversation when consumers talk about their brands, their
products, and industries. These kinds of personalized dialogs can help capture
consumer mindshare, spurring them to action and converting them into loyal
shoppers and brand champions. More broadly, it enables making coordinated
business decisions to boost marketing effectiveness, customer satisfaction, and,
ultimately, sales.
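The four data types in step 1 only pay off once they are joined into a single consumer profile. Here is a minimal sketch of that join; the source records, field names, and shared consumer ID are entirely illustrative:

```python
from collections import defaultdict

# Hypothetical per-source records keyed by a shared consumer ID (names illustrative)
identity = {1: {"email": "a@x.com"}, 2: {"email": "b@x.com"}}
behavioral = [(1, "soap"), (1, "shampoo"), (2, "soap")]         # purchase events
derived = {1: {"influence_score": 0.9}, 2: {"influence_score": 0.4}}
self_identified = {2: {"purchase_intent": "detergent"}}         # likes, stated intent, etc.

def build_profiles():
    """Join the four source types into one profile per consumer."""
    purchases = defaultdict(list)
    for cid, item in behavioral:
        purchases[cid].append(item)
    profiles = {}
    for cid, base in identity.items():
        profile = dict(base)
        profile["purchase_history"] = purchases.get(cid, [])
        profile.update(derived.get(cid, {}))
        profile.update(self_identified.get(cid, {}))
        profiles[cid] = profile
    return profiles

profiles = build_profiles()
print(profiles[1])
# {'email': 'a@x.com', 'purchase_history': ['soap', 'shampoo'], 'influence_score': 0.9}
```

In practice each source would arrive from a different system, so the hard part is identity resolution (agreeing on the shared key), not the join itself.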
Getting on that list starts well before the first moment of truth: the instant when a
shopper traditionally makes his or her purchase while standing in the store aisle. Savvy
manufacturers who want to get on those lists need to own what Google calls the zero
moment of truth, when consumers make their choices online before venturing out to the
store. So they're adding email campaigns, brand Web sites, text-message promotions,
mobile applications, and social media to marketing mainstays such as coupons,
packaging, shelf position, end-caps, freestanding inserts, and television and print
advertising.
But making all that work at scale requires a unified, consumer-centric approach to
creating and nurturing individual relationships. That means executing dialog strategies,
not just sending out isolated mailings. Optimizing a dialog strategy requires
coordinating all the touch points to create and send personalized messaging
that accounts for multiple consumer situations and responses.
By capturing and analyzing the millions of consumer interactions that would not
otherwise become part of the manufacturer's institutional knowledge, manufacturers
now have the opportunity to take advantage of game-changing integrated consumer
insights to affect individual consumer outcomes.
17. Big Data Revolution in the Banking Industry by Marc Torrens
In order to fully exploit the value of all the banking data and transform it into
knowledge, banks need to organise their IT infrastructure in a new way to have a
holistic and workable view of all that data. Big data platforms are certainly the way to
go, as we discuss in this article.
and security. However, nowadays large IT banking systems have important challenges
to actually benefit from the potential value of the data they host. There are different
reasons for that, namely:
1) Banks have grown by acquisition, which requires merging different IT systems
so that they work together seamlessly. Large investments have been devoted to these
necessary IT migration projects, which are very costly in terms of time, resources, and
money. Banking systems have been patched through the decades, so it becomes very
challenging to exploit a holistic view of all the data they host in an efficient way.
2) All the data that banks have is spread out in different IT systems usually provided by
different technology vendors. In the banking industry, it is very common to have one IT
system for cards, one for mortgages, one for loans, one for business intelligence, one for
fraud detection, and so forth. As a consequence, it becomes very cumbersome to cross
data from all those subsystems to have the holistic view on the data that is needed.
3) Since banks have been developing these IT systems for decades, they are based on
traditional and proprietary relational databases, which also makes it challenging to
cross data among the different systems efficiently from a technology point of view.
For example, for online banking services, banks create databases that host the necessary
data to be displayed on those channels with ETL processes that load the necessary data
from each of the sub-systems to the online database, creating yet another database to
maintain. This can be seen as an attempt to have a database with all the transactional
data, but in fact, it is just a copy of only the data that is strictly necessary for the
online services.
platforms. The idea is to push data from all the sub-systems in the IT infrastructure to a
new big data layer that will be collecting all the generated data.
The main motivation to move to those platforms is not actually the volume of the data
but the flexibility it gives to apply sophisticated algorithms crossing data coming from
different sources. If we think in terms of the giant internet companies (Facebook,
Google, Twitter, Amazon, etc), the data a bank is hosting is not very large. However,
having all the data available in a single platform is the key to applying techniques such
as Machine Learning and transforming that data into knowledge. Some of the key
advantages of having banking data in a single big data platform instead of in several
relational databases are horizontal scalability, distributed processing, faster data
throughput, cost efficiency, simpler data schema evolution, and simple and
standard and open protocols.
- Product Scoring, which models the likelihood of a customer acquiring a new product.
This insight can be computed using Machine Learning. The model is based on the
parameters of customers who previously acquired a financial product. Similarly,
attrition rates can be predicted for bank customers.
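As a rough illustration of how such a product score might be computed, here is a deliberately simple sketch that scores a prospective customer by a k-nearest-neighbours vote over past customers. All features and figures are invented, and a production model would be far richer:

```python
import math

# Toy history: (age, balance in thousands, number of products held) and
# whether the customer later acquired the product being scored (illustrative)
history = [
    ((25, 10, 1), 0),
    ((34, 55, 2), 1),
    ((45, 80, 3), 1),
    ((52, 20, 1), 0),
    ((38, 60, 2), 1),
]

def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def product_score(candidate, k=3):
    """Likelihood estimate: share of the k most similar past customers
    who acquired the product (a k-nearest-neighbours vote)."""
    nearest = sorted(history, key=lambda rec: distance(rec[0], candidate))[:k]
    return sum(label for _, label in nearest) / k

print(product_score((36, 58, 2)))  # resembles past acquirers -> high score
print(product_score((55, 12, 1)))  # resembles non-acquirers -> low score
```

Attrition (churn) prediction follows the same pattern with the label flipped to "left the bank".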
These knowledge insights and others will definitely lead to new business opportunities
for banks allowing them to better serve customer financial needs.
The transactional data banks are collecting from millions of customers and the new data
revolution are the necessary ingredients to disrupt the products and services banks are
currently offering. This disruption will shift banks from their current position of
vendors of financial products to providers of financial solutions.
For example, a customer that wants to buy his dream car has thousands of products
available to help him save or borrow money to attain that specific goal. But, the
customer would probably like to have a plan to achieve the goal with the right
products. So, instead of offering customers a large catalog of financial products
to choose from, banks could offer personalized plans to meet their specific
needs. Financial products should serve as tools that facilitate everyday financial
situations. Banks will start offering solutions to life events and situations instead of just
the financial tools to overcome them.
Banks are actually very well positioned to do precisely this; they can develop
technologies that offer financial solutions instead of a catalog of products by learning
from the data they have been collecting for decades. The challenge is to extract meaning
from that data and use it to offer the right products at the right time to the
right customer. This new knowledge should encode preferences and tastes
from customers in a very precise way. These actionable insights can then be used
to propose personalized financial solutions to specific customer challenges.
So far we have seen how banks are in a uniquely privileged position to take the
financial industry where it has never been. However, the outlook is not all sunshine
and roses: banks are under a lot of pressure to do this before the giant
Internet companies beat them to it. Moreover, banks are facing a wave of
unprecedented competition from an emerging set of pure FinTech players. Large
Internet companies and FinTech firms, which have been leading the data revolution
in other sectors, are starting to enter the financial world. Big banks are very well aware
of this threat and are starting to take the data revolution seriously. A notable example
comes from Francisco Gonzalez, chief executive of BBVA, who made his case in an opinion
article entitled "Banks need to take on Amazon and Google or die", published in the
Financial Times (http://on.ft.com/18VjR9d) in 2013.
The question that arises is then: how exactly can banks innovate to compete with these
new players (Internet companies and FinTech startups) by exploiting their unique data
advantage?
Living in a world of interconnected systems in which real value comes more and more
from data itself, banks must open up and offer well-structured and secure APIs on top
of which third-party innovators can develop the next generation of financial solutions.
This is not a new approach in the technology sector; it has been widely used by the
largest technology companies. For example, Apple, through its App Store ecosystem,
opened its platform to allow third-party companies to innovate on top of its operating
systems and devices. This approach makes Apple's platforms much stronger and more
powerful, even though some of the innovation does not come directly from Apple
but from other companies and startups. Google is also innovating by opening up
services and APIs to other companies.
Banks should follow suit and start offering well-structured and secure APIs to open up
their data and services to innovative developers. This approach requires middleware
applications to interact with the core banking systems to enable developers to access the
bank's services and data through robust APIs. This middleware layer would also
involve a simplification of some of the IT infrastructure that currently hinders
innovation within banks. With this approach, third party companies could innovate
new financial services while allowing banks to maintain their core business.
The FinTech application development community is endlessly creative and agile. Data
APIs have the potential to enable tech companies, banks and their customers to benefit
from an increasingly valuable ecosystem of innovative solutions.
This new approach not only has great potential to generate new revenue streams, but
more importantly enables banks to actually lead the imminent disruption of their
industry.
Data science applied to financial transactional data that banks hold opens up new
avenues for innovation and will form the basis of the future of FinTech. The key to
realizing this innovation is extracting knowledge from data that can be used in new
applications, having basically encoded customer needs and preferences to generate
actionable insights. Based on this new knowledge, machine learning technologies such
as collaborative filtering will be able to offer customers precisely what they need at the
moment they need it. Moreover, this new financial ecosystem will be able to offer
more personalized solutions based on products already offered by banks.
These solutions will enable consumers to achieve their specific goals with
financial plans tailored to their precise needs and behaviors.
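To make the collaborative-filtering idea concrete, here is a minimal recommendation sketch over a toy product-holding matrix. The customers, products, and similarity measure are all illustrative, not any real bank's model:

```python
import math

# Toy product-holding matrix: customer -> set of financial products held
holdings = {
    "alice": {"checking", "savings", "car_loan"},
    "bob":   {"checking", "savings", "mortgage"},
    "carol": {"checking", "car_loan"},
    "dave":  {"checking", "mortgage", "credit_card"},
}

def similarity(a, b):
    """Cosine similarity between two customers' product sets."""
    shared = len(holdings[a] & holdings[b])
    return shared / math.sqrt(len(holdings[a]) * len(holdings[b]))

def recommend(customer):
    """Rank products the customer lacks by the similarity-weighted
    count of similar customers who hold them."""
    scores = {}
    for other in holdings:
        if other == customer:
            continue
        sim = similarity(customer, other)
        for product in holdings[other] - holdings[customer]:
            scores[product] = scores.get(product, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("carol"))  # ['savings', 'mortgage', 'credit_card']
```

A real system would score transactions and life events rather than just product holdings, but the core mechanism is the same: let similar customers vote on what comes next.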
This proposed new ecosystem in which banks focus on their core business and allow
technology companies to develop new financial services on top of bank data and
services will undoubtedly lead to new business models involving the three
parties: banks, tech companies, and customers. Although this scenario does not
consider banks driving innovation on their own, they will still hold the fuel that powers
this innovation ecosystem: data and services.
It makes sense to think about delivering fast results, in a limited area, that excite
important stakeholders and gain support and funding for more predictive projects. A
great goal.
18. Optimizing the Netflix Streaming Experience with Data Science by Nirmal
Govind
Nirmal Govind
Director, Netflix
LinkedIn Contact

Nirmal Govind is the Director of Streaming Science and Algorithms at Netflix. He leads
a team that develops predictive models, algorithms, and experimentation techniques for
optimizing the Netflix streaming experience. Before Netflix, Nirmal was CTO at a
healthcare analytics startup that develops web-based software for scheduling doctors.
Previously, he worked on improving the efficiency of semiconductor manufacturing
factories at Intel with automated production scheduling. Nirmal received Ph.D. and M.S.
degrees in Industrial Engineering and Operations Research from Penn State and UC
Berkeley respectively, and a B.Tech in Mechanical Engineering from the Indian Institute
of Technology, Madras.

On January 16, 2007, Netflix started rolling out a new feature: members could now
stream movies directly in their browser without having to wait for the red envelope in
the mail. This event marked a substantial shift for Netflix and the entertainment
industry. A lot has changed since then. Today, Netflix delivers over 1 billion hours of
streaming per month to 48 million members in more than 40 countries. And Netflix
accounts for more than a third of peak Internet traffic in the US. This level of
engagement results in a humongous amount of data.

At Netflix, we use big data for deep analysis and predictive algorithms to help provide
the best experience for our members. A well-known example of this is the personalized movie and show
recommendations that are tailored to each member's tastes. The Netflix Prize that was
launched in 2007 highlighted Netflix's focus on recommendations. Another area that
we're focusing on is the streaming quality of experience (QoE), which refers to the user
experience once the member hits play on Netflix. This is an area that benefits
significantly from data science and algorithms/models built around big data.
more focus on "streaming science," we've created a new team at Netflix that's working
on innovative approaches for using our data to improve QoE. In this post, I will briefly
outline the types of problems we're solving.
User behavior refers to the way users interact with the Netflix service, and we use our
data to both understand and predict behavior. For example, how would a change to our
product affect the number of hours that members watch? To improve the streaming
experience, we look at QoE metrics that are likely to have an impact on user behavior.
One metric of interest is the rebuffer rate, which is a measure of how often playback is
temporarily interrupted while more data is downloaded from the server to replenish
the local buffer on the client device. Another metric, bitrate, refers to the quality of the
picture that is served/seen; a very low bitrate corresponds to a fuzzy picture. There is
an interesting relationship between rebuffer rate and bitrate. Since network capacity is
limited, picking too high of a bitrate increases the risk of hitting the capacity limit,
running out of data in the local buffer, and then pausing playback to refill the buffer.
What's the right tradeoff?
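The tradeoff can be made concrete with a toy bitrate picker: aim for the highest quality the network can sustain, but leave more headroom when the local buffer is close to empty. The ladder values and safety margins below are illustrative, not Netflix's actual algorithm:

```python
# Toy bitrate picker illustrating the rebuffer-vs-quality tradeoff
# (ladder values and safety margins are illustrative, not Netflix's)
BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]

def pick_bitrate(estimated_bandwidth_kbps, buffer_seconds, safety=0.8):
    """Choose the highest bitrate the network can sustain with headroom;
    be twice as conservative when the local buffer is running low."""
    margin = safety if buffer_seconds > 10 else safety * 0.5
    budget = estimated_bandwidth_kbps * margin
    feasible = [rate for rate in BITRATE_LADDER_KBPS if rate <= budget]
    return feasible[-1] if feasible else BITRATE_LADDER_KBPS[0]

print(pick_bitrate(4000, buffer_seconds=30))  # healthy buffer: picks 3000 kbps
print(pick_bitrate(4000, buffer_seconds=5))   # low buffer: falls back to 750 kbps
```

The hard part in practice is estimating bandwidth on a noisy network and tuning the margins per device and connection type, which is exactly where the data science comes in.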
There are many more metrics that can be used to characterize QoE, but the impact that
each one has on user behavior, and the tradeoffs between the metrics need to be better
understood. More technically, we need to determine a mapping function that can
quantify and predict how changes in QoE metrics affect user behavior. Why is this
important? Understanding the impact of QoE on user behavior allows us to tailor the
algorithms that determine QoE and improve aspects that have significant impact on our
members' viewing and enjoyment.
The Netflix Streaming Supply Chain: opportunities to optimize the streaming
experience exist at multiple points
How do we use data to provide the best user experience once a member hits play on
Netflix?
One approach is to look at the algorithms that run in real-time or near real-time once
playback has started, which determine what bitrate should be served, what server to
download that content from, etc.
With vast amounts of data, the mapping function discussed above can be used to
further improve the experience for our members at the aggregate level, and even
personalize the streaming experience based on what the function might look like based
on each member's "QoE preference." Personalization can also be based on a member's
network characteristics, device, location, etc. For example, a member with a high-
bandwidth connection on a home network could have very different expectations and
experience compared to a member with low bandwidth on a mobile device on a cellular
network.
A set of big data problems also exists on the content delivery side. Open Connect is
Netflix's own content delivery network that allows ISPs to directly connect to Netflix
servers at common internet exchanges, or place a Netflix-provided storage appliance
(cache) with Netflix content on it at ISP locations. The key idea here is to locate the
content closer (in terms of network hops) to our members to provide a great experience.
One of several interesting problems here is to optimize decisions around content
caching on these appliances based on the viewing behavior of the members served.
With millions of members, a large catalog, and limited storage capacity, how should the
content be cached to ensure that when a member plays a particular movie or show, it is
being served out of the local cache/appliance?
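One naive way to frame the caching decision is as a packing problem: greedily fill the appliance with the titles expected to serve the most watch hours per gigabyte of storage. A sketch with invented numbers (real placement would weigh far more signals, such as regional viewing patterns and release schedules):

```python
# Toy cache-fill policy: pack the titles with the highest expected
# watch hours per gigabyte into a fixed-size appliance (numbers illustrative)
catalog = [
    # (title, size_gb, expected_watch_hours_at_this_ISP)
    ("popular_show_s1", 40, 9000),
    ("new_release",     25, 6000),
    ("niche_film",      10,  400),
    ("back_catalog",    60, 1500),
    ("local_hit",       15, 3000),
]

def fill_cache(titles, capacity_gb):
    """Return titles to cache, favouring expected hours served per GB."""
    ranked = sorted(titles, key=lambda t: t[2] / t[1], reverse=True)
    cached, used = [], 0
    for title, size, _ in ranked:
        if used + size <= capacity_gb:
            cached.append(title)
            used += size
    return cached

print(fill_cache(catalog, capacity_gb=80))
```

Even this greedy heuristic captures the core tension: a large catalog, limited appliance storage, and the goal of serving most plays from the local cache.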
In addition to the internal quality checks, we also receive feedback from our members
when they discover issues while viewing. This data can be very noisy and may contain
non-issues, issues that are not content quality related (for example, network errors
encountered due to a poor connection), or general feedback about member tastes and
preferences. In essence, identifying issues that are truly content quality related amounts
to finding the proverbial needle in a haystack.
These are just a few examples of ways in which we can use data in creative ways to
build models and algorithms that can deliver the perfect viewing experience for each
member. There are plenty of other challenges in the streaming space that can benefit
from a data science approach. If you're interested in working in this exciting space,
please check out the Streaming Science & Algorithms position on the Netflix jobs site.
19. Big Data and Innovation by Daniel Harple
Daniel Harple
Accelerating Innovation
4 The Success of Monitoring the Economy With Big Data | Jon Hartley,
http://www.huffingtonpost.com/jon-hartley/the-success-of-monitoring_b_6875126.html.
the world, providing a real-time empirical analysis of core inflation, by geography.
Alan Krueger, Chairman of President Obama's Council of Economic Advisers, has previously
stated that "There's still a Cold War element to our statistics", in that the methods and
indices created are by and large unchanged since the beginning of the post-war period.5
The work being done at Context Labs is similar to BPP, yet focused further upstream in
the supply chain of the economy: innovative capacity. Further, the Context Labs focus seeks to combine big
data with complex network graph science to more deeply describe the stock and flows6
of innovation components that drive successful innovative outcomes, such as new
companies, ventures funded, new jobs, etc.
By-products of a new big data analytics for innovation would include: faster time to
market on new technologies crucial to our future, e.g., energy, biotech, medicine; better
targeting of resources and investments into locations with strong innovation cluster
dynamics; and in a more futuristic sense, understanding innovative flow globally.
Where are the new pockets of innovation emerging? How can we discover them faster?
Deploy resources to them faster? Produce products and services that best serve the
nation and the world in the 21st century? We have developed a new field we call
Innovation Dynamics to address these questions and challenges.
Innovation Dynamics
This new data-driven discipline, based on big data analytics, is called Innovation
Dynamics. It is based on work by the Context Labs team, the development of the MIT
Sloan Regional Entrepreneurial Acceleration Lab (REAL) program7, collaborations with
MIT's Media Lab, and the development of big data/network graph Pentalytics.8
Pentalytics is at the intersection of big data real time sensing, network graph science, and
analytics. It utilizes descriptive and predictive algorithms to describe innovation clusters
and ecosystems in new ways.
5Sudeep Reddy, Real-Time Economic Data Could Be a Game Changer - WSJ, Wall Street Journal,
October 15, 2013, sec. Tech,
http://www.wsj.com/articles/SB10001424052702304330904579135832100578954.
6 Jay W. Forrester et al., MIT System Dynamics Group Literature Collection, 2003.
7The REAL program at MIT Sloan was founded by Dan Harple in 2012. http://real.mit.edu/
8Harple, Daniel L. Toward a Network Graph-Based Innovation Cluster Density Index, Massachusetts
Institute of Technology, Sloan School of Management, 2013.
The Pentalytics reference implementation is accomplished with a platform called
InnovationScope(iScope). Innovation Dynamics, using iScope, applies a data-driven
sensibility and vocabulary to better understand innovation and its contributing factors.
The iScope platform enables stakeholders to contribute and run analytics models on their
own, as well. The reference implementation for iScope 1.0 represents a large US-based
dataset.9

Primary Pentalytic Elements

[Figure: the five primary Pentalytic elements: Location, Academia, Industry, People, and Funding]

Pentalytics uses algorithms and network graph theory to describe innovation by
producing ecosystem graphs, with entities (nodes) that interact (described by edges
connecting the nodes). With this approach, we can generate analytics for innovation
and jobs that relate local and global conditions. We can measure the interaction
between local actions and global trends.
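A toy version of such an ecosystem graph can be built from typed edges between entities of the five element types. The sketch below computes a crude degree centrality; the entities, relationships, and metric choice are invented for illustration, and real Pentalytics would use far richer graph measures:

```python
from collections import defaultdict

# Toy ecosystem graph: edges link entities of the five Pentalytic element
# types (Location, Academia, Industry, People, Funding); all data invented
edges = [
    ("MIT", "StartupA", "spun_out"),
    ("MIT", "StartupB", "spun_out"),
    ("VC_Fund1", "StartupA", "funded"),
    ("VC_Fund1", "StartupB", "funded"),
    ("EngineerX", "StartupA", "works_at"),
    ("Cambridge", "StartupA", "located_in"),
]

def degree_centrality(edge_list):
    """Count edges per node: a crude measure of how central an
    entity is to the cluster."""
    degree = defaultdict(int)
    for src, dst, _ in edge_list:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)

deg = degree_centrality(edges)
print(max(deg, key=deg.get))  # the most connected node in this toy cluster
```

Because the insight lives in the edges (who funds, hires, and spins out whom), even this small example shows why a graph representation beats a flat table of entities.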
Early-stage utility for iScope has been varied, but includes use as an economic innovation-
cluster modeling and tracking tool, as an innovation lens on a given sector or geography,
and as a tool for urban innovation mapping.
9 The Pentalytics sample size for the iScope 1.0 analysis in this chapter was 7,592 U.S.-based startups, a total
of 908,007 people. A filter on software developers further downsized the sample to 54,500 people; 856
Boston-based job titles, 1,172 Silicon Valley job titles; 991 Boston-based firms, 1,468 Silicon Valley-based
firms; 108 Boston-based industry segments, 131 Silicon Valley-based industry segments; 736 supply-chain
universities to Boston-based firms, 1,500 supply-chain universities to Silicon Valley-based firms.
Additional credit to Context Labs, specifically its Pentalytics R&D team: Daan Archer, Lee Harple, Gavin
Nicol, Victor Kashirin.
innovation and job creation. Innovation Dynamics is becoming a potentially predictive
tool for network resilience and failure, to help better navigate decisions related to the
growth of innovation clusters and/or the linking of remote clusters for a virtual cluster,
to help make decisions for: resource allocations, partnership and contractual targets,
angel and venture funding strategies.
Innovation Everywhere
Innovation is everywhere; it manifests itself in the context of its cluster density and
innovative capacity. This new data-driven view from Innovation Dynamics yields a range
of new network science-driven metrics for regions and cities. Each place has a series of
iScope indices for the market segment it supports. For example, Silicon Valley has a
different index set for Internet, SaaS firms, semiconductors, social, etc., with its
supporting infrastructure (universities, firms, workforce, policies, sources of funding).
And, each city around the world does as well. Using the Pentalytics dataset model, we
now have a common and robust dataset, suitable for comparisons, interventions,
prescriptions, and predictions. These new indices can help us decipher how and why
different locations are more or less suitable for new innovation models.
Innovation Dynamics can be used for analyzing innovation ecosystem resilience and
failure. It can help make decisions for resource allocations, partnership and contractual
targets, angel and venture funding strategies, and so on.
The present work we are undertaking can deepen the understanding of causality, enable
diagnostics, and point toward prescriptive measures. Our aim is to provide diagnostics
for innovation cluster dynamics and economic expansion.
It is clear that the network effects in Silicon Valley have worked so well that, to many
people, innovation equates to Silicon Valley. In reality, this is not entirely the case. When
you look more deeply you find that much of the innovation from Silicon Valley is
regularly outsourced to various geographies: India, Eastern Europe, etc. Apple's
"Designed by Apple in California" products are truly enabled by leveraging the
advanced manufacturing capabilities delivered by Foxconn in Shenzhen, China. This
decoupling of cluster attributes is truly global, making it ever more critical for leaders to
understand the global interconnectedness of their region's Innovation Dynamics
attributes. Innovation Dynamics is a tool to help Silicon Valley maintain its global lead
and avoid disruption by other regions.
iScope outputs produce a variety of insights to help one better describe and predict
innovation ecosystems. Its outputs include urban innovation mappings, job
requirements graphs, employee density maps by discipline, cluster heat maps, etc. iScope
can compare ecosystems visually, for example, a biotech cluster in Cambridge with one
in San Francisco.
The following graphics depict early examples of iScope Analytics and provide a snapshot
of the utility and versatility of this new technology tool.
InnovationScope (iScope) enables comparative geographical visualizations illustrating a
wide range of factors: investment network flow, industry segment flow, innovative
viscosity, etc. Note that unlike most map-based visualizations, Pentalytics, driven by
network science methods, provides insights into flow between elements (edges), and not
just the elements (nodes). Figure 1 focuses on Cambridge and MIT.
iScope enables visual side-by-side comparisons that display heat maps describing
company sectors, employee resources, and amenities presented as geographical
adjacencies. This informs the relationship between local academic institutions and
technology sectors, e.g., Cambridge vs. Amsterdam.
The example below provides a side-by-side visualization examining the proximity
relationships anchored by the pentalytic component of Academia, illustrating the
adjacent innovations and industry-segment firms surrounding an academic geolocation.
This example considers Cambridge on the left, showing Harvard and MIT primarily, and
Amsterdam on the right, focusing on the University of Amsterdam. A variety of side-by-
side comparisons can be visualized. For example, one could consider the people
resources by skill type in evaluating an industry, e.g., Biotech, and compare this
element from one geolocation to another. This means that if one were to consider funding
the acceleration of a strategic Biotech cluster or region, the sources of talent would
now be visualized as key indicators of success. In this regard, all Pentalytics attributes
and variables can be packaged into unique visualizations, dependent on the desired
insight. Further, the toolset can also be leveraged to identify new locations that
need a specific type of skill. In this view, gaps in a cluster can then be viewed as
new opportunities for job creation, and inform the movement of talent from one location
to another.
As new companies are formed, new jobs are clearly formed. With that, a key premise of
innovation is to accelerate the creation of new firms. A catalyst for the founding of new
firms is accelerating the development of innovative founders: their skill sets, their
backgrounds, and their locations. Our work has preliminarily given us some
inputs on where founders come from, and what specific "innovation everywhere"
location attributes spawn them.
Figure 3 illustrates the industry flow for individuals in the Silicon Valley/San Francisco
Internet cluster who identify themselves as founders. Note that the top three sectors
(Internet, Computer Software, and IT/Services) are on both the "From" and "To"
sides of the graph. Interestingly, the next supply-side industries for a founder in this
innovation geo-cluster are Writing and Editing, Semiconductors, Marketing and
Advertising, etc. The nodes in the graph are sorted by magnitude of job titles. This insight
also makes a case for a liberal arts education, as the ability to write and edit is now
shown, with data-driven analytics, to be a key attribute of company founders.
Most founders come from adjacent sector firms.
For Silicon Valley, the data analytics also reveal that founders come primarily from
seventy-one academic institutions and ten predominant industry segments, and become
founders from a top core of eighteen specific job titles.
From a university perspective, the top suppliers of Internet segment Silicon Valley
founders are as follows, with part of the long-tail shown as well:
References
Archibugi, Daniele, Mario Denni, and Andrea Filippetti, The Technological Capabilities of Nations: The
State of the Art of Synthetic Indicators, Technological Forecasting and Social Change 76, no. 7 (2009):
917-931, doi:10.1016/j.techfore.2009.01.002.
Bell, Geoffrey G., Clusters, Networks, and Firm Innovativeness. Strategic Management Journal 26, no. 3
(2005): 287-295, doi:10.1002/smj.448.
Bresnahan, Timothy, Alfonso Gambardella, and Annalee Saxenian, Old Economy Inputs for New
Economy Outcomes: Cluster Formation in the New Silicon Valleys, Industrial and Corporate Change
10, no. 4 (2001): 835.
Delgado, Mercedes, Christian Ketels, Michael E. Porter, and Scott Stern, The Determinants of National
Competitiveness, National Bureau of Economic Research (2012). Retrieved from
http://www.frdelpino.es/wp-content/uploads/2012/11/DKPS_w18249.pdf
Demaine, Erik D., Dotan Emanuel, Amos Fiat, and Nicole Immorlica, Correlation Clustering in General
Weighted Graphs, Theoretical Computer Science 361, nos. 23 (2006): 172-187,
doi:10.1016/j.tcs.2006.05.008.
Drucker, Peter F., The Discipline of Innovation, Harvard Business Review 76, no. 6 (1998): 149-157.
Feldman, Maryann. P., and Richard Florida, The Geographic Sources of Innovation: Technological
Infrastructure and Product Innovation in the United States, Annals of the Association of American
Geographers 84, no. 2 (1994): 210-229, doi:10.1111/j.1467-8306.1994.tb01735.x
Ferrary, Michel, and Mark Granovetter, The Role of Venture Capital Firms in Silicon Valleys Complex
Innovation Network, Economy and Society 38, no. 2 (2009): 326-359,
doi:http://dx.doi.org/10.1080/03085140902786827
Forrester, Jay W., System Dynamics Society., Sloan School of Management., and System Dynamics Group.
MIT System Dynamics Group Literature Collection, 2003.
Griffith, Terri L., Patrick J. Yam, and Suresh Subramaniam, Silicon Valleys One-Hour Distance Rule
and Managing Return on Location, Venture Capital 9, no. 2 (2007): 85-106,
doi:10.1080/13691060601076202.
Harple, Daniel L., Toward a Network Graph-Based Innovation Cluster Density Index, Massachusetts
Institute of Technology, Sloan School of Management, 2013.
Porter, Michael. E., Mercedes Delgado, Christian Ketels, and Scott Stern, Moving to a New Global
Competitiveness Index, Global Competitiveness Report 2008-2009 (2008): 43-63.
357
Powell, Walter (Woody) W., Kelley A. Packalen, and Kjersten Bunker Whittington, Organizational and
Institutional Genesis: The Emergence of High-Tech Clusters in the Life Sciences, SSRN Scholarly Paper
No. ID 1416306, Rochester, New York: Social Science Research Network (2010). Retrieved from
http://papers.ssrn.com/abstract=1416306.
Roberts, Edward B., and Charles E. Eesley, Entrepreneurial Impact: The Role of MIT An Updated
Report, Foundations and Trends in Entrepreneurship 7, nos. 1-2 (2011): 1-149.
20. Five Things You Should Measure with Digital Marketing Analytics by Judah
Phillips
Watching the growth of digital analytics over the last several years has been both
exciting and disturbing.
It's been exciting because what was once a niche activity has evolved into a serious,
business-focused enterprise activity.
Disturbing, because many people and organizations want to compete on analytics but are
not doing the right things or adopting the right thinking about analytics. That's why I
wrote my first book, Building a Digital Analytics Organization.
I've run into organizations that don't know how to effectively create, participate in,
manage, or lead analytics teams, and that often believe data science or the latest technology
will save the day, rather than a team of people with different skill sets working cross-
functionally to make systematic improvements.
Without the right team, a deep understanding of the business, and the ability to do
analysis, testing, and optimization, data science isn't going to take you far.
By focusing on the Analytics Value Chain, companies can understand how to control,
operate, and maximize benefit from all of the technical requirements and functions
needed to do analytics successfully.
This mindset, or something like it, is necessary to help key players within a business
understand how data science and conversion rate optimization contribute to business
growth.
15% of CMOs Don't Measure Marketing ROI; 27% Rely on Manager Judgments
Though it may seem obvious, the CMO Survey, which collected responses
from Fortune 1000, Forbes Top 200, and other high-ranking marketers, found that the
state of analytics, even now in 2014, is quite dismal.
1 in 4 companies are depending on manager judgments to make decisions.
25% of all marketing teams can't even figure out how to bring data into
consideration during the decision-making process.
Analytics brings a level of transparency and accountability to business leaders that can
be uncomfortable or unexpected, because data can show the truth.
If every business were as data-driven as it claimed, and actually held
accountable to increasing quantifiable metrics, a lot of people would be out of a job.
But I've been around long enough to know that data can be incorrectly collected,
reported, and analyzed.
People can lose faith in analytics and analytics teams, which causes decisions to be
made from the gut. After all, I always say that unless data supports commonly held
beliefs or shows positive performance, it will be questioned and challenged.
It's easy for marketers to dismiss data when there's a perception of inaccuracy.
Several common subjects in analytics are easy to leave unexplored or poorly elucidated
with data, or to get so wrong that the data is just crap.
Couple this with technologies that are incompatible with each other, experts who
preach analytics without wide practical experience, analysts who are learning on the
job, a lack of data standards or governance, and vendors who don't take the time to
understand how you use their tools, and it all becomes a real mess.
As a result, there are five common areas within digital analytics that are not using the
correct model and are being measured inaccurately, or worse, not at all:
Leads and Prospects
Prospects are unqualified leads. These are your visitors or traffic, and they are, for the
most part, anonymous.
Leads, on the other hand, are qualified prospects who have voluntarily given some
identifying information that makes them known in some way.
If you've ever heard of a lead generation site, then you get it. People visit your site, fill
out a form to request something or join something (like the SmartCurrent newsletter),
and then they are leads.
Once a company has a lead, they may execute a communication sequence to compel
the lead to buy something or give more information incrementally.
The key to understanding leads with digital analytics often starts with campaign
codes.
These codes are query-string parameters assigned to particular inbound advertising
campaigns in a way specific to each analytics tool. Omniture's standard campaign
coding conventions are different from Google Analytics', which are different from
IBM's and Webtrends'. No two tools track exactly the same way.
Most tools can be configured to recognize campaign codes that fall outside of the
default markup/format. Campaign codes can also be set outside of the query string in
certain tools (cookies, hidden form fields, etc.).
Taken to the next level, you may want to collect specific clicks (i.e., events) that occur
before the prospect reaches a lead form.
But in many cases, this type of data is not determined ahead of time nor architected into
an analytics solution. Or if it is, it may not be aligned with answering the business
questions the company may have about the lead.
What to do? You need to define a campaign code naming convention and make sure
it is practiced consistently, always. Bruce Clay gives a useful review for Google
Analytics on his blog.
You need to determine the types of information about the pages and flows you want to
use to better understand the prospect and lead. You want to define the specific events,
subordinate to the page view, about the pages and flows that you want to track; then
implement, test, and analyze them in the context of the lead generation flow.
You want to tag all the elements of your forms and see where people drop off and
where they complete the form.
You want to collect the full, relevant set of information that you can segment and pivot
against to understand the prospect and lead flow in a way that can drive hypothesis
testing to improve it.
Customers
For sites that sell something, it's completely obvious to say that customers are
important, and that understanding who your customers are and what they think is
very important.
But I once met a CEO who told me, "I don't care what my customers do, as long as they
buy."
In that case, I found an analytics implementation that was largely devoid of real
customer data or any useful instrumentation for advanced data collection and
segmentation.
Customer data is often in the domain of CRM systems, like SAP or Salesforce.com, or,
in the absence of a Customer Relationship Management platform, the customer database or
data warehouse.
For publishers, it may live in something called a Data Management Platform (DMP). You
may have heard that your analytics tool can't, for one reason or another, support
Personally Identifiable Information or other customer data.
No, you can't do certain things with Google Analytics, but the reality of being
sophisticated with analytics is that you aren't likely to be using Google Analytics on
customer data; you may have another tool.
In other tools, it's completely possible to understand the customer in the context of their
digital behavior by going beyond the page tag and into BI, or perhaps a Digital Data
Distribution Platform (D3P) in the cloud.
Adding custom dimensions or roundtripping data from your data warehouse, DMP,
CRM, and so on can have inordinate levels of applicability for better understanding
your customers and their behavior.
In my experience, analysts were rather excited when they were able to use
DoubleClick's gender, age, affinity, and interest group information in Google Analytics.
But that type of information, at the detailed level, has been brought into enterprise
analytics implementations for years. You just had to spend a lot of money on tools (you
may still need to) and have the right team who knew how to do it technically in a way
meaningful to the business.
Then figure out the right definitions for it, the right place where it should live, and the
best way to join it together and allow analysts and business users to access it.
Advertising
Like understanding the sources of traffic using campaign codes, you want to do
something similar for advertising.
Direct response means you expect people to do something when they arrive at your site;
you expect them to take an action.
Branding, on the other hand, is a bit softer and is meant to promote an increase in the
awareness or equity of a brand or product (without the required response).
Many ads, like banner ads or pre-rolls, interrupt the audience; these are message-
centric ads. Other ads, where something is given away for free (perhaps a membership,
an incentive, an offer), are value-centric.
This message-centric vs. value-centric designation may sound odd, but Netflix
putting a banner ad on an affiliate site (message-centric) is a very different ad from
creating high-quality content (like House of Cards) and advertising that custom
programming via owned and earned channels (value-centric).
Then, of course, there are the simple campaign parameters, such as the type of
campaign (banner, CPC, blog, video, and so on) and specifics like the ad group or
keyword.
These parameters, and the nature of the advertising that can be expressed in analytics,
can be highly specific to the company. They could even include the device target
(mobile vs. tablet vs. site) or the size and format of the ad.
You may want to bring this data in by integrating with your ad server, social network,
DMP, or D3P. Regardless, it is important to enhance the data in your analytics to
include more information that can help you better understand the origins and types
of advertising, and so better understand conversion.
Determine how to pass the data into your analytics via tagging, your tag manager,
business intelligence technology, or other ways to integrate the data.
Conversion
The traditional idea of a conversion funnel is half-baked. Your anonymous or mostly
anonymous visitors, during a single visit, go through a discrete series of steps before
they transition to the point where value is created (i.e., the conversion).
Isn't it a bit daft to think of conversion as a series of linear steps that occur during a
single visit?
Does the funnel begin on the clickthrough to the site? After the product has been added
to the cart? When the shopping cart flow starts as a guest or logged-in member?
The answer will vary from business to business, but understand that it's these nuances
that can make or break the usefulness of conversion tracking and your conversion funnel.
Taken to the next level, is the funnel even an accurate metaphor for expressing user
behavior?
Sure, the funnel is a clever marketing term that makes conversion easy to understand.
But widening the funnel isn't just about making the steps in your conversion flow more
persuasive; it means considering what happens in the different layers/modals and events,
in the behavior before and after the purchase, when the purchase is abandoned, and even
when it is completed with latency during a different visit.
First, you need to define what a conversion actually is, socialize the definition of the
steps to conversion within your organization, and ensure your data collection is
appropriately instrumented to capture the specific steps you define.
That will mean tagging for goals and events, and it will mean figuring out how to
determine when someone comes back and purchases later, and when revenue is
captured after abandonment.
In fact, it's probably more apt to reconsider the funnel entirely. I sometimes call it
"The Tumbler," where a visitor is Seeking, then Shopping, then Sharing.
Seeking → Shopping → Sharing
What happens after the purchase is not necessarily part of the purchase, but in
terms of customer satisfaction, likelihood to recommend, and other measures of brand
health, it can have a significant impact on your conversion rates.
Understanding the data behind your existing share-to-conversion ratio can help you
decide whether moving into arenas like referral marketing is right for your business,
and give you better data-informed predictions about what the success of such a
program might be.
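A share-to-conversion analysis can be as simple as a few ratios over post-purchase data. The numbers below are purely hypothetical:

```python
# Hypothetical post-purchase data for one quarter.
purchases = 2000
shares = 300            # purchasers who shared via email or social
referred_visits = 900   # visits arriving through those shares
referred_purchases = 45

share_rate = shares / purchases                      # how often buyers share
referral_conversion = referred_purchases / referred_visits
purchases_per_share = referred_purchases / shares    # yield of each share

print(f"share rate: {share_rate:.0%}")                       # 15%
print(f"referral conversion: {referral_conversion:.0%}")     # 5%
print(f"extra purchases per share: {purchases_per_share:.2f}")  # 0.15
```

If each share reliably yields a fraction of a purchase, a referral incentive costing less than that fraction's margin is worth testing.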
Consider the tumble through the journey on the path to purchase: what happens
before and after the shopping, and how people arrived at your site in the first place.
Attribution
It can be a complicated type of analysis. You can spend many months preparing data,
hundreds of thousands of dollars on vendor software, and significant cerebral power
and data science trying to attribute what caused people to buy.
Or you can tag and group your campaigns in certain tools, like Google Analytics, and
get a free way to understand conversion.
You've heard of "Last Click" or "Last Interaction" and "First Click" or "First Interaction,"
and maybe even explored the nuances of "Last AdWords" or "Last Non-Direct"
attribution.
Perhaps you've considered "Equal" attribution, where all touches are given equal credit
for assisting the conversion. Or you've thought about "Time Decay," where customer
touches that occurred nearer to the conversion are given increasing credit as an
assist to conversion.
Or maybe you haven't, and are still wondering why your AdWords (First Interaction)
number is different from your Paid Search attribution in Google Analytics (Last Interaction).
Adding to the challenge of attribution are the actual financial measures behind the
conversion: that is, the cost and revenue data that enable you to look at things like
ROAS (return on advertising spend) vs. ROI (which can include margin).
First, like all things in analytics, you need to consider your business goals, purchase
cycles, the duration before a conversion, and the types of campaigns you are running.
Then you need to look at the different attribution models to identify what they are
calculating and telling you about the performance of your campaign mix.
In short, you can't just assign credit to a source by looking at one model, or accept the
default view without thinking deeply about attribution and how best to express credit
for your conversions.
You also need to consider the impact of cost, revenue, and margin on your attribution
modeling.
Conclusion
So there you have it: five areas where you may be measuring perfectly, or you may be
measuring incorrectly.
It's hard to say what's right or what's wrong without considering your analytics as part
of the Analytics Value Chain: aligning your business goals and requirements, data
collection, and data governance and definitions with your reporting, analysis, optimizations,
predictions, and automations, in order to best understand your prospects, leads,
customers, conversions, and attribution.
*This post also appeared on ConversionXL blog and was edited by Tommy Walker.
ABOUT THE CURATORS
Vishal Kumar
Deeksha Joshi
A data-driven leader with over 12 years of expertise in leading strategy, innovation,
commercialization, transformation, and management consulting, Deeksha is a thought
leader and the author of a book on innovation, Data Driven Innovation: A Primer. She
has experience working with senior leadership, business development, leading teams,
and strategy and planning across various industries, including healthcare, financial
services, insurance, and retail.
She has an MBA from the Darden School of Business, Virginia, and a bachelor's in
Engineering from Netaji Subhas Institute of Technology, New Delhi.