Principles of Data Wrangling
Practical Techniques for Data Preparation
Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel,
and Connor Carreras
Principles of Data Wrangling
by Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel,
and Connor Carreras
Copyright © 2017 Trifacta, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://oreilly.com/safari). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
corporate@oreilly.com.
Editor: Shannon Cutt
Production Editor: Kristen Brown
Copyeditor: Bob Russell, Octal Publishing, Inc.
Proofreader: Christina Edwards
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
May 2017: First Edition
Revision History for the First Edition

2017-04-25: First Release
2017-06-27: Second Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc.
Principles of Data Wrangling, the cover image, and related trade
dress are trademarks of O’Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to
ensure that the information and instructions contained in this work
are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for
damages resulting from the use of or reliance on this work. Use of
the information and instructions contained in this work is at your
own risk. If any code samples or other technology this work contains
or describes is subject to open source licenses or the intellectual
property rights of others, it is your responsibility to ensure that your
use thereof complies with such licenses and/or rights.
978-1-491-93892-8
[LSI]
Foreword
Through the last decades of the twentieth century and into the
twenty-first, data was largely a medium for bottom-line accounting:
making sure that the books were balanced, the rules were followed,
and the right numbers could be rolled up for executive decision-
making. It was an era focused on a select group of IT staff
engineering the “golden master” of organizational data; an era in
which mantras like “garbage in, garbage out” captured the attitude
that only carefully engineered data was useful.
Attitudes toward data have changed radically in the past decade, as
new people, processes, and technologies have come forward to
define the hallmarks of a data-driven organization. In this context,
data is a medium for top-line value generation, providing evidence
and content for the design of new products, new processes, and
ever more efficient operation. Today’s data-driven organizations have
analysts working broadly across departments to find methods to use
data creatively. It is an era in which new mantras like “extracting
signal from the noise” capture a different attitude of agile
experimentation and exploitation of large, diverse sources of data.
Of course, accounting still needs to get done in the twenty-first
century, and the need remains to curate select datasets. But the
data sources and processes for accountancy are relatively small and
slow to change. The data that drives creative and exploratory
analyses represents an (exponentially!) growing fraction of the data
in most organizations, driving widespread rethinking of processes for
data and computing—including the way that IT organizations
approach their traditional tasks.
The phrase data wrangling, born in the modern context of agile
analytics, is meant to describe the lion’s share of the time people
spend working with data. There is a common misperception that
data analysis is mostly a process of running statistical algorithms on
high-performance data engines. In practice, this is just the final step
of a longer and more complex process; 50 to 80 percent of an
analyst’s time is spent wrangling data to get it to the point at which
this kind of analysis is possible. Not only does data wrangling
consume most of an analyst’s workday, it also represents much of
the analyst’s professional process: it captures activities like
understanding what data is available; choosing what data to use and
at what level of detail; understanding how to meaningfully combine
multiple sources of data; and deciding how to distill the results to a
size and shape that can drive downstream analysis. These activities
represent the hard work that goes into both traditional data
“curation” and modern data analysis. And in the context of agile
analytics, these activities also capture the creative and scientific
intuition of the analyst, which can dictate different decisions for each
use case and data source.
We have been working on these issues with data-centric folks of
various stripes—from the IT professionals who fuel data
infrastructure in large organizations, to professional data analysts, to
data-savvy “enthusiasts” in roles from marketing to journalism to
science and social causes. Much is changing across the board here.
This book is our effort to wrangle the lessons we have learned in this
context into a coherent overview, with a specific focus on the more
recent and quickly growing agile analytic processes in data-driven
organizations. Hopefully, some of these lessons will help to clarify
the importance—and yes, the satisfaction—of data wrangling done
well.
Chapter 1. Introduction
Let’s begin with the most important question: why should you read
this book? The answer is simple: you want more value from your
data. To put a little more meat on that statement, our objective in
writing this book is to help the variety of people who manage the
analysis or application of data in their organizations. The data might
or might not be “yours,” in the strict sense of ownership. But the
pains in extracting value from this data are.
We’re focused on two kinds of readers. First are people who manage
the analysis and application of data indirectly—the managers of
teams or directors of data projects. Second are people who work
with data directly—the analysts, engineers, architects, statisticians,
and scientists.
If you’re reading this book, you’re interested in extracting value from
data. We can categorize this value into two types along a temporal
dimension: near-term value and long-term value. In the near term,
you likely have a sizable list of questions that you want to answer
using your data. Some of these questions might be vague; for
example, “Are people really shifting toward interacting with us
through their mobile devices?” Other questions might be more
specific: “When will our customers’ interactions primarily originate
from mobile devices instead of from desktops or laptops?”
What is stopping you from answering these questions? The most
common answer we hear is “time.” You know the questions, you
know how to answer them, but you just don’t have enough hours in
the day to wrangle your data into the right form.
Beyond the list of known questions related to the near-term value of
your data is the optimism that your data has greater potential long-
term value. Can you use it to forecast important seasonal changes?
What about risks in your supply chain due to weather or geopolitical
shifts? Can you understand how the move to mobile is affecting your
customers’ purchasing patterns? Organizations generally hire data
scientists to take on these longer-term, exploratory analyses. But
even if you have the requisite skills to tackle these kinds of analyses,
you might still struggle to be allocated sufficient time and resources.
After all, exploratory analytics projects can take months, and often
contain a nontrivial risk of producing primarily negative or
ambiguous results.
As we’ve seen, the primary impediment to realizing both the short-
term and long-term value of your data is time: your limited time and
your organization’s limited time. In this book, we describe how
improving your data wrangling efforts can create the time required
to get more near-term and long-term value from your data. In
Chapters 1-3, we describe a workflow framework that links activities
focused on both kinds of value, and explain how data wrangling
factors into those activities and into the overall workflow framework.
We introduce the basic building blocks for a data wrangling project:
data flow, data wrangling activities, roles, and responsibilities. These
are all elements that you will want to consider, at a high level, when
embarking on a project that involves data wrangling. Our goal is to
provide some helpful guidance and tips on how to coordinate your
data wrangling efforts, both across multiple projects by making sure
your wrangling efforts are constructive as opposed to redundant or
conflicting, and within a single project by taking advantage of some
standard language and operations to increase productivity and
consistency.
There’s more to effective data wrangling than just clearly defined
workflows and processes; to most effectively wrangle your data, you
should also understand which transformation actions constitute data
wrangling, and, most important, how you can use those
transformations to produce the best datasets for your analytic
activities.
Those nitty-gritty transformations constitute our discussion in
Chapters 4-7. You can think of those chapters as a rough “how-to”
guide for data wrangling. That said, we do not intend this book to
provide a comprehensive tutorial on all possible data wrangling
methods. Instead, we want to give you a collection of techniques
that you can use when moving through the stages of the data
workflow framework.
As we introduce each of the key transformation and profiling
activities that comprise data wrangling, we will walk through a
theoretical data project involving a publicly available dataset
containing US campaign finance information. You can walk through
the project along with us in your data wrangling tool of choice.
Finally, we end by discussing roles and responsibilities in a data
wrangling project in Chapter 8, and exploring a selection of data
wrangling tools in Chapter 9.
Throughout the book, we ground our discussion in example data,
transformations of that data, and various visual and statistical views
of that data. Along those lines, we open with a story about
Facebook.
Magic Thresholds, PYMK, and User Growth at Facebook
Growth is about tapping and delivering value to the yet unserved
part of your market. Facebook stands as a quintessential example of
how to drive growth. Toward the end of 2015, Facebook reported
more than one billion daily active users with a year-over-year growth
around 17 percent.1 There are, of course, many factors that have
contributed to this growth. We’ll focus here on a series of data-
driven insights that armed Facebook with strategies to deliver robust
growth, year over year over year.
Growth is ultimately about increasing the number of actively
engaged users and customers. It follows a simple equation:2
active users = new users + returning users + resurrected users
A critical aspect of growth is bringing new users and customers to
your product or service. But just as critical is delivering value to new
users so that they stay engaged. Ideally, users are “returning” (i.e.,
active from one period to the next). However, depending on how you
are tracking engagement, you might see blips of inactivity followed
by reengagement (placing these users in the “resurrected” group in
the aforementioned equation). We’ll focus on this second critical
aspect of growth—delivering value to new users quickly so that they
are motivated to stay engaged.
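The growth equation above can be made concrete with a small sketch. The function name, the set-based representation of users, and the sample IDs below are all illustrative assumptions, not anything taken from Facebook's systems; the point is only to show how the three groups partition the active users of a period.

```python
def classify_active_users(current_active, previous_active, ever_active_before):
    """Split the current period's active users into new, returning, resurrected.

    current_active:      user IDs active in the current period
    previous_active:     user IDs active in the immediately preceding period
    ever_active_before:  user IDs active in any earlier period
    (All three inputs are hypothetical sets used for illustration.)
    """
    # Make sure the "ever active" history also contains the previous period.
    history = ever_active_before | previous_active
    returning = current_active & previous_active
    resurrected = (current_active & history) - previous_active
    new = current_active - history
    return new, returning, resurrected


new, returning, resurrected = classify_active_users(
    current_active={"ann", "cat", "eve"},
    previous_active={"ann", "bob"},
    ever_active_before={"ann", "bob", "cat", "dan"},
)
# The three groups are disjoint and together cover every active user.
total_active = len(new) + len(returning) + len(resurrected)
```

Here "ann" is returning, "cat" is resurrected after a period of inactivity, and "eve" is new, so the three counts sum to the number of active users.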
As Alex Schultz, vice president of growth at Facebook, points out,
the primary value for Facebook users revolves around connecting
people to the content from their friends.3 Obviously, for this to work,
users need friends on Facebook. But is this the only thing that
matters—any content from any friend? Common sense would tell
you that this can’t be true, and that people engage with some
content more than other content. So here we have a set of near-
term questions to answer:
• How many friends does a new Facebook user need to be X-percent
  likely to return as a user in 30 days? In 60 days? In 180 days?
• For new users, what characteristics of their friends stand out to
  differentiate between new users who churn (leave the platform
  and don’t come back) versus those who remain active?
• Do the preceding findings change by user cohort (groups of users
  that initially joined Facebook at around the same time)?
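As a rough sketch of how the first question might be explored, the code below compares return rates for new users at or above a candidate friend-count threshold against those below it. The field names (`friends_14d`, `returned`) and the toy records are invented for illustration; they are not Facebook data, and a real analysis would need far more users and proper statistical care.

```python
def return_rate(flags):
    """Fraction of True values, or None when the group is empty."""
    return sum(flags) / len(flags) if flags else None


def retention_by_threshold(users, thresholds):
    """For each threshold, return (rate at or above it, rate below it)."""
    results = {}
    for threshold in thresholds:
        above = [u["returned"] for u in users if u["friends_14d"] >= threshold]
        below = [u["returned"] for u in users if u["friends_14d"] < threshold]
        results[threshold] = (return_rate(above), return_rate(below))
    return results


# Made-up records: friends added in the first 14 days, and whether the
# user came back within 30 days.
toy_users = [
    {"friends_14d": 2, "returned": False},
    {"friends_14d": 4, "returned": False},
    {"friends_14d": 6, "returned": True},
    {"friends_14d": 9, "returned": False},
    {"friends_14d": 10, "returned": True},
    {"friends_14d": 12, "returned": True},
    {"friends_14d": 15, "returned": True},
    {"friends_14d": 20, "returned": True},
]
rates = retention_by_threshold(toy_users, thresholds=[5, 10])
```

A threshold is interesting when the gap between the two rates is large and stable across cohorts, which is the kind of pattern the third question asks about.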
Answering questions like these is the purview of the Growth and
Analytics team at Facebook. Interestingly, the team found a magic
threshold that captured a key predictor of long-term user
engagement: new users should connect to 10 friends within 14 days.
Magic thresholds have two key characteristics: first, they should
correspond to a concise Key Performance Indicator (KPI) target that
predicts (and if you are lucky, drives) the impact you want; and
second, they should be actionable. KPI targets are standard across
industries and departments, but what sets a magic threshold apart is
that it exposes the core dynamic of the system and provides a lever
for achieving a desired outcome. In the case of Facebook,
connecting to friends quickly is a critical driver of value for new
users, and if Facebook can find ways to reach that threshold for
more new users, more new users should stay engaged over the long
term.
This magic threshold has the advantage of encoding the core value
proposition of Facebook: users connecting to their friends. It also
has the advantage of coordinating a number of product decisions to
help satisfy this threshold for as many new users as possible.
So, how does Facebook find friends for new users? There are simple,
manual mechanisms that allow new users to import their email
contact lists (which Facebook then triangulates with its known list of
users). This provides short-term value. Facebook also utilizes more
sophisticated mechanisms to link users to friends. We consider these
mechanisms to fall into the realm of long-term value, in part because
the depth of analyses and experimentation that are required to
robustly expose this value take months to years. But more
importantly, these in-depth analyses give rise to data-driven services
that automatically perform the desired operations.
In Facebook’s case, one of the core systems used to drive growth,
by helping new users connect to friends within Facebook, is known
as PYMK, or People You May Know. PYMK is a recommender system,
not unlike Amazon’s product recommendation system or Netflix’s
movie/show recommendation system. It employs a well-known and
often-used user experience rule: recognition is better than recall. In
other words, it’s easier and more enjoyable for users to say “yes” or
“no” to a series of suggestions than it is for them to generate the
content of the suggestions through search or a menu-driven builder
experience.
PYMK uses a number of features about the new users and, more
important, about the first few friends to whom they have connected.
In its most basic form, you can think of PYMK as collecting all the
friends of a user’s friends to whom they are not currently connected.
Then, based on metrics like the number of mutual friends, age
similarity, education similarity, and so on, it ranks this list and
presents it back to the user as recommendations.
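The "most basic form" described above can be sketched in a few lines. This is a simplified illustration, not Facebook's implementation: the function name and the toy friendship graph are assumptions, and it ranks only by the number of mutual friends, leaving out signals such as age or education similarity.

```python
from collections import Counter


def recommend_friends(user, friends_of, top_n=5):
    """Rank friends-of-friends of `user` by their number of mutual friends.

    friends_of maps each user ID to the set of that user's friends
    (a hypothetical in-memory stand-in for a real social graph).
    """
    direct = friends_of.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in friends_of.get(friend, set()):
            # Skip the user and anyone they are already connected to.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual friends first; break ties alphabetically for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


toy_graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
suggestions = recommend_friends("ann", toy_graph)
```

In this toy graph, "dan" shares two mutual friends with "ann" and "eve" shares one, so "dan" is ranked first.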
So, with a little bootstrapping from an imported contact list or a few
manual friend searches, new users on Facebook begin receiving
recommendations on who to connect with. The PYMK system that
enables these connections has been critical to Facebook’s continuous
growth.
But the story becomes even more interesting. After some long-
running analyses and experimentation, Facebook found that a more
effective use of PYMK for user growth was not to focus on
recommendations for new users (because bootstrapping is difficult
and the early recommendations can come with low-confidence
scores), but rather to focus on recommendations to heavy, long-time
users of Facebook with vast and diverse connections. Specifically, the
key is to recommend new users to the heavy Facebook users. This
primes a new user with all sorts of interesting content and the friend
network of the heavy user can provide better estimates on friend
recommendations directed to the new user.
Although certainly unique in many ways, Facebook’s use of data
stands as a repeatable process that many other organizations can
follow. Starting with a clear motivation—driving user growth—a
number of explicit, near-term questions can provide critical insights
to improve the business. Over the long term, these insights can
blossom into data services that automate and optimize the earlier
insights for deeper and additional value.
In Chapter 2, we describe our workflow framework that links near-
term and long-term value from data with the variety of activities
involved in working with data.
1 https://s21.q4cdn.com/399680738/files/doc_financials/annual_reports/2015-Annual-Report.pdf
2 https://medium.com/swlh/diligence-at-social-capital-part-1-accounting-for-user-growth-4a8a449fddfc#.w7lptg3n4
3 https://blog.kissmetrics.com/alex-schultz-growth/
Chapter 2. A Data Workflow Framework
In this chapter, we present a framework for working with data. Our
goal is to cover the most common sequences of actions that people
take as they move through the process of accessing, transforming,
and using their data. We’ll begin at the end of this process, and
discuss the value you will get from your data.
In the introduction, we talked about near-term and long-term value.
Another dimension of value to consider is how that value will be
delivered into your organization. Will value be delivered directly,
through systems that can take automated actions based on data as
it is processed? Or will value be delivered indirectly, by empowering
people in your organization to take a different course of action than
they otherwise would have?
Indirect value
Data provides value to your organization by influencing people’s
decisions or inspiring changes in processes. Example: risk
modeling in the insurance industry.
Direct value
Data provides value to your organization by feeding automated
systems. Example: Netflix’s recommendation system.
Indirect value from data has a long tradition. Entire professions are
built on it: accounting, risk modeling in insurance, experimental
design in medical research, and intelligence analytics. On a smaller
scale, you might have used data to generate reports or interactive
visualizations. These reports and visualizations both use data to
deliver indirect value. How? When others view your report or
visualization, they incorporate the presented information into their
understanding of the world and then use their updated
understanding to improve their actions. In other words, the data
shown in your reports and visualizations indirectly influences other
people’s decisions. Most of the near-term, known potential value in
your data will be delivered indirectly.
Direct value from data involves handing decisions to data-driven
systems for speed, accuracy, or customization. The most common
example of this involves automatic delivery and routing of resources.
In the world of high-frequency trading and modern finance, this
resource is primarily money. In some industries, like consumer
packaged goods (think Walmart or Amazon), physical goods are
routed automatically. A close cousin to routing physical goods is
routing virtual ones: digital media companies like Netflix and
Comcast use automated pipelines to optimize the delivery of digital
content to their customers. At a smaller scale, systems like antilock
brakes in cars use data from sensors to route energy to different
wheels. Modern testing systems, like the GRE graduate school
entrance exam, now dynamically sequence questions based on the
test taker’s evolving performance. In all of these examples, a significant
number of operational decisions are directly controlled by data-
driven systems without any human input.
How Data Flows During and Across Projects

Deriving indirect, human-mediated value from your data is a
prerequisite to deriving direct, automated value. At the outset,
human oversight is required to discover what is “in” your data and to
assess whether the quality of your data is sufficiently high to use it
in direct and automated ways. You can’t send data blindly into an
automated system and expect valuable results. Reports must be
authored and digested to understand the wider potential of your
data. As that wider potential comes into focus, automated systems
can be designed to use the data directly.
This is the natural progression of data projects: from near-term
answering of known questions, to longer-term analyses that assess
the core quality and potential applications of a dataset, and finally to
production systems that use data in an automated way. Underlying
this progression is the movement of data through three main data
stages: raw, refined, and production. Table 2-1 provides an overview
of this progression. For each stage, we list the primary objectives.
Table 2-1. Data moves through stages

Data stage    Primary objectives
Raw           Ingest data
              Data discovery and metadata creation
Refined       Create canonical data for widespread consumption
              Conduct analyses, modeling, and forecasting
Production    Create production-quality data
              Build regular reporting and automated data products/services
In the raw stage, the primary goal is to discover the data. When
examining raw data, you ask questions aimed at understanding what
your data looks like. For example:
• What kinds of records are in the data?
• How are the record fields encoded?
• How does the data relate to your organization, to the kinds of
  operations you have, and to the other data you are already using?
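The first two questions lend themselves to a quick profiling pass. The sketch below uses only Python's standard library; the function names, the coarse encoding labels, and the three-row sample file are assumptions made for illustration, and a real profile would look at many more properties (date formats, value ranges, duplicates, and so on).

```python
import csv
import io
from collections import Counter, defaultdict


def guess_encoding(value):
    """Return a coarse label for how a single field value appears encoded."""
    if value == "":
        return "empty"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "text"


def profile_csv(text):
    """Count records and tally the apparent encoding of each field."""
    reader = csv.DictReader(io.StringIO(text))
    tallies = defaultdict(Counter)
    record_count = 0
    for row in reader:
        record_count += 1
        for field, value in row.items():
            if field is None:  # extra, unnamed columns in a malformed row
                continue
            tallies[field][guess_encoding((value or "").strip())] += 1
    return record_count, {field: dict(counts) for field, counts in tallies.items()}


# Hypothetical sample with a missing name and two different date styles.
sample = """contributor,amount,date
Ann Lee,250,2016-03-01
Bob Roy,75.50,03/02/2016
,100,2016-03-05
"""
record_count, field_profile = profile_csv(sample)
```

Even this crude tally surfaces things worth knowing before refinement: the `amount` field mixes integers and decimals, one `contributor` is blank, and the `date` field is plain text whose formats would need separate inspection.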
Armed with an understanding of the data, you can then refine the
data for deeper exploration by removing unusable parts of the data,
reshaping poorly formatted elements, and establishing relationships
between multiple datasets. Assessing potential data quality issues is
also frequently a concern during the refined stage, because quality
issues might negatively affect any automated use of the data
downstream.
Finally, after you understand the data’s quality and potential
applications in automated systems, you can move the data to the
production stage. At this point, production-quality data can feed
automated products and services, or enter previously established
pipelines that drive regular reporting and analytics activities.
A minority of data projects will end in the raw or production stages.
The majority will end in the refined stage. Projects ending in the
refined stage will add indirect value by delivering insights and
models that drive better decisions. In some cases, these projects
might last multiple years. Google’s Project Oxygen is a great
example of a project that ended in the refined stage.1 Realizing that
managing people is a critical skill for a successful organization,
Google kicked off a multiyear study to assess the characteristics of a
good manager and then test how effective they could be at teaching
those characteristics. The results of the study indirectly influenced
employee behavior, but the study data itself was not incorporated
into a production pipeline.
The hand-off between IT shared services organizations and lines of
business traditionally occurs in the refined stage. In such an
environment, IT is responsible for Extract-Transform-Load (ETL)
operations. ETL moves data through the three data stages in a
centrally controlled manner. Lines of business own the data analysis
process, including everything from reporting and ad hoc research
tasks, to advanced modeling and forecasting, to data-driven
operational changes. This division of concerns and responsibilities
has two intended benefits: basic data governance due to centralized
data processing, and efficiency gains due to IT engineers reusing
broadly useful data transformations.
However, in practice, the perceived benefits of centrally transforming
data are often eclipsed by the reality of organizational inefficiencies
and bottlenecks. Most of these bottlenecks arise from line-of-
business analysts being dependent upon IT. In the age of agile
analytics and data-driven services, there is increasing pressure to
speed up the extraction of value from your data. Unsurprisingly, the
best plan of attack involves identifying and removing bottlenecks.
In our experience, there are two primary bottlenecks. The first
bottleneck is the time it takes to wrangle your data. Even when you
start from refined data, there are often nontrivial transformations
required to prepare your data for analysis. These transformations
can include removing unnecessary records, joining in additional
information, aggregating data, or pivoting datasets. We will discuss
each of these common transformation actions in more detail in later
chapters.
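To give a feel for these four transformations before the later chapters, here is a minimal sketch in plain Python. The `wrangle` function, the field names, and the toy order records are hypothetical; in practice you would likely use a dedicated wrangling tool or a dataframe library rather than hand-written loops.

```python
from collections import defaultdict


def wrangle(orders, regions):
    """Filter, join, aggregate, and pivot a small list of order records."""
    # 1. Remove unnecessary records: drop zero-amount orders.
    kept = [order for order in orders if order["amount"] > 0]

    # 2. Join in additional information: look up each customer's region.
    joined = [{**order, "region": regions[order["customer_id"]]} for order in kept]

    # 3. Aggregate: total amount per (region, channel) pair.
    totals = defaultdict(float)
    for row in joined:
        totals[(row["region"], row["channel"])] += row["amount"]

    # 4. Pivot: one row per region, one column per channel.
    channels = sorted({channel for _, channel in totals})
    pivot = {}
    for (region, channel), total in totals.items():
        pivot.setdefault(region, {c: 0.0 for c in channels})[channel] = total
    return pivot


# Made-up inputs for illustration only.
toy_orders = [
    {"customer_id": 1, "channel": "mobile", "amount": 20.0},
    {"customer_id": 1, "channel": "desktop", "amount": 35.0},
    {"customer_id": 2, "channel": "mobile", "amount": 15.0},
    {"customer_id": 3, "channel": "mobile", "amount": 0.0},
]
toy_regions = {1: "West", 2: "East", 3: "East"}
by_region = wrangle(toy_orders, toy_regions)
```

The result has one entry per region with a total for each channel, which is the shape a report on mobile versus desktop activity would typically want.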
The second bottleneck is the simple capacity mismatch that arises
when a large pool of analysts relies on a small pool of IT
professionals to prepare “refined” data for them. Removing this
bottleneck is more of an organizational challenge than anything else,
and it involves expanding the range of users who have access to raw
data and providing them with the requisite training and skills.
To help motivate these organizational changes, let’s step back and
consider the gross mechanics of successfully using data. The most
valuable uses of your data will be production uses that take the form
of automated reports or data-driven services and products. But
every production use of your data depends on hundreds or even
thousands of exploratory, ad hoc analyses. In other words, there is a
funnel of effort leading to direct, production value that begins with
exploratory analytics. And, as with any funnel, your conversion
rate will not be 100 percent. You’ll need as many people as possible
exploring your data and deriving insights in order to discover a
relatively small number of valuable applications of your data.
As Figure 2-1 demonstrates, a large number of raw data sources and
exploratory analyses are required to produce a single valuable
application of your data.
Figure 2-1. Data value funnel
When it comes to delivering production value from your data, there
are two critical points to consider. First, data can produce insights
that are not useful to you and your business. These insights might
not be actionable, or their potential impact might be too small to
warrant a change in existing processes. A good strategy for
mitigating this risk is to empower the people who know your
business priorities to explore your data. Second, your exploratory
analytics efforts should be as efficient as possible. This brings us
back to data wrangling. The faster you can wrangle data, the more
explorations of your data you can conduct, and the more analyses
you will be able to move into production. Ultimately, implementing
an effective data wrangling workflow can enable more business
analysts to explore a larger quantity of data at a faster pace.
Connecting Analytic Actions to Data Movement: A Holistic Workflow
Framework for Data Projects
We began this chapter with a discussion of the direct and indirect
value delivered by data projects.
In this section, we expand our discussion of data stages into a
complete framework that captures the basic analytic actions involved
in most data projects. Figure 2-2 illustrates the overall framework
and will serve as our map through the rest of the book.
As Figure 2-2 illustrates, data moves through stages, from raw to
refined to production. Each stage has a small set of primary actions.
The actions come in two types: in the top three boxes in Figure 2-2
are actions whose results are the data itself, and in the bottom six
boxes are actions whose results are derived from or built on top of
the data inferences (e.g., insights, reports, products, or services).
For simplicity, the connecting links between actions in Figure 2-2 are
drawn in one direction. However, real data projects will often loop
back through actions, iterating toward better results.
Figure 2-2. A holistic workflow framework for data projects
Of course, many individuals and organizations will customize the
steps in this framework to fit their specific needs. Although we
describe each possible action in our framework, not every data
project will involve all of these actions. You might decide to define
variants of each action that are tailored to specific customers or
business objectives. You might also decide to create multiple
locations for refined data and multiple locations for production data.
We have seen this frequently at organizations where data security is
important, and different business units are not allowed to access
each other’s data. However, most organizations that we have worked
with follow the uncustomized version of this framework.
In the rest of this chapter, we’ll discuss the actions in Figure 2-2.
The discussion will move through the three data stages in order.
Raw Data Stage Actions: Ingest Data and Create Metadata
There are three primary actions in the raw data stage: ingestion of
data, creation of generic metadata, and creation of proprietary
metadata. We can separate these actions into two groups based on
their output, as shown in Figure 2-3. One group is focused on
outputting data—the two ingestion actions. The second group is
focused on outputting insights and information derived from the data
—the metadata creation actions.
Figure 2-3. Primary action and output actions in the raw data stage
Ingesting Known and Unknown Data

The process of ingesting data can vary widely in its complexity. At
the less complex end of the spectrum, many people receive their
data as files via channels like email, shared network folders, or FTP
websites. At the more complex end of the spectrum, modern open
source tools like Sqoop, Flume, and Kafka enable more granular and
real-time transferring of data, though at the cost of requiring
nontrivial software engineering to set up and maintain. Somewhere
in the middle of this spectrum are proprietary platforms like Alteryx,
Talend, and Informatica Cloud that support a variety of data transfer
and ingestion functionality, with an eye toward easing
configuration and maintenance for nonengineers.
In traditional enterprise data warehouses, the ingestion process
involves some initial data transformation operations. These
transformations are primarily aimed at mapping inbound elements to
those elements’ standard representations in the data warehouse. For
example, you might be ingesting a comma-separated values (CSV)
file and need each field in that file to correspond to a particular
column in a relational data warehouse. After it is transformed to
match the syntax rules defined by the warehouse, the data is stored
in predefined locations. Often this involves appending newly arrived
data to related prior data. In some cases, appends can be simple,
literally just adding new records at the “end” of the dataset. In other
cases, when the incoming data contains edits to prior data as well as
new data, the append operation becomes more complicated. These
scenarios often require you to ingest new data into a separate
location, where more complex merging rules can be applied during
the refined data stage.
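The mapping and append steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the FIELD_TO_COLUMN mapping, the choice of key, and the helper names are all hypothetical, and a real warehouse would apply its own syntax rules and merging logic.

```python
import csv
import io

# Hypothetical mapping from inbound CSV field names to warehouse column names.
FIELD_TO_COLUMN = {"Prod ID": "product_id", "Qty": "units_sold", "Week": "week_start"}

def map_rows(csv_text):
    """Rename each inbound field to its warehouse column and cast units to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {FIELD_TO_COLUMN[name]: value for name, value in record.items()}
        row["units_sold"] = int(row["units_sold"])
        rows.append(row)
    return rows

def simple_append(existing, incoming):
    """Simple case: new records are added at the end of the dataset."""
    return existing + incoming

def merge_with_edits(existing, incoming, key=("product_id", "week_start")):
    """Harder case: incoming rows may amend prior rows that share the same key."""
    merged = {tuple(r[k] for k in key): r for r in existing}
    for r in incoming:
        merged[tuple(r[k] for k in key)] = r  # later data overwrites the earlier record
    return list(merged.values())

# Illustrative data only: the second file amends one prior row and adds a new one.
prior = map_rows("Prod ID,Qty,Week\nA1,10,2016-08-01\nB2,5,2016-08-01\n")
update = map_rows("Prod ID,Qty,Week\nA1,8,2016-08-01\nA1,12,2016-08-08\n")
appended = simple_append(prior, update)
merged = merge_with_edits(prior, update)
```

A plain append keeps both versions of the amended row, whereas the keyed merge keeps only the latest one. This is the kind of rule that is often deferred to the refined data stage.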
Some modern NoSQL databases like MongoDB or Cassandra support
less-rigid syntax constraints on incoming data while still supporting
many of the classic data access controls of more traditional
warehouses. Further along the spectrum (toward relaxed constraints
on incoming data) are basic storage infrastructures like HDFS and
Amazon S3 buckets. For most users, S3 and HDFS look and act like
regular filesystems. There are folders and files. You can add to,
modify, and move them around. And, if necessary, you can control
access on a per-file, per-user basis.
The primary benefit of modern distributed filesystems like HDFS and
S3 is that data ingestion can be as simple as copying files or storing
a stream of data into one or more files. In this environment, the
work to make this data usable and accessible is often deferred until
the data is transformed and moved to the refined data stage. This
style of data ingestion is often referred to as schema-on-read. In
schema-on-read ingestion, you do not need to construct or enforce a
usable data structure until you need to use the data. Traditional data
warehouses, in contrast, require schema-on-write, in which the data
must adhere to certain structural and syntactic constraints in order
to be ingested.
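The difference between the two styles can be illustrated with a small Python sketch. The SCHEMA dictionary, the record layout, and the function names are assumptions made for this example; they do not correspond to any particular warehouse or storage product.

```python
import json

# Hypothetical schema: required field names and their expected Python types.
SCHEMA = {"product_id": str, "units_sold": int}

def conforms(record):
    """True if the record has every required field with the expected type."""
    return all(isinstance(record.get(name), kind) for name, kind in SCHEMA.items())

def ingest_schema_on_write(lines, table):
    """Validate each record at ingestion time; nonconforming data is rejected."""
    for line in lines:
        record = json.loads(line)
        if not conforms(record):
            raise ValueError(f"record violates schema: {record}")
        table.append(record)

def ingest_schema_on_read(lines, raw_store):
    """Store the raw lines untouched; no structure is enforced yet."""
    raw_store.extend(lines)

def read_with_schema(raw_store):
    """Apply the schema only when the data is used, skipping records that fail."""
    records = (json.loads(line) for line in raw_store)
    return [r for r in records if conforms(r)]

# Illustrative input: the second record has a non-integer quantity.
incoming = [
    '{"product_id": "A1", "units_sold": 10}',
    '{"product_id": "B2", "units_sold": "five"}',
]

raw = []
ingest_schema_on_read(incoming, raw)   # always succeeds
usable = read_with_schema(raw)         # structure enforced at read time

table = []
try:
    ingest_schema_on_write(incoming, table)
    rejected = False
except ValueError:
    rejected = True                    # ingestion stops at the bad record
```

In the schema-on-read path the malformed record is stored and only filtered out later, while in the schema-on-write path it stops ingestion.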
In other words, the two ends of the ingestion complexity spectrum
differ based on when the initial enforcement of data structure
happens. However, it is important to note that along this entire
spectrum of ingestion infrastructures, you will still require a separate
refined data stage. This is because refined data has been further
wrangled to align with foreseeable analyses.
Let’s consider an example data ingestion use case. It is common
practice for consumer packaged goods (CPG) retailers (e.g., Walmart
and Target) and manufacturers (e.g., PepsiCo and General Mills) to
share data about their supply chains. This data enables better
forecasting, helping both sides to better manage inventory.
Depending on the size of the companies, data might be shared on a
daily, weekly, or monthly basis. The ingestion complexity comes from
the many-to-many partnerships in this ecosystem: retailers sell
products from many manufacturers, and manufacturers sell products
to many retailers. Each of these companies produces data in
different formats and conforms to different syntactic conventions.
For example, each company might refer to products by using their
own product IDs or product descriptions. Or some companies might
report their data at case or bundle levels instead of the individual
units that an end consumer would purchase. Retailers with strong
weekly patterns (e.g., much higher sales activity on weekends versus
weekdays) might report their overall sales activity on a weekly basis
instead of a monthly basis. Even retailers that report their data at
the same frequency might define the beginning and ending of each
period differently. Further complexity arises in retailer sales data
when consumers return purchased goods. Return transactions
require amendments to previously shared sales data, often going
back multiple weeks.
The ingestion processes for these CPG companies can range from
simple file transfers, which wait for the refined data stage to tackle
the potentially complex wrangling tasks required to sort out the
aforementioned difficulties, to more engineered ETL processes that
fix some of these difficulties as the data is ingested. In either case,
both retailers and manufacturers are interested in forecasting future
sales. Because these forecasts are regularly refreshed, and because
the historical sales data on which they are based can be, and is,
amended, most large CPG companies work with supply chain data in
a time-versioned way. This means that a forecast for the first week
of January 2017 based on data received through August 31, 2016 is
kept separate and distinct from a forecast for the same first week of
January 2017 using data received through September 30, 2016.
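One way to picture time-versioned storage is to key each forecast by both the week it predicts and the cutoff date of the data it was built from. The sketch below assumes this keying scheme; the function names, the week-start date, and the unit figures are invented for illustration.

```python
from datetime import date

# Forecasts are keyed by (target week, data cutoff), so a refresh
# never overwrites a forecast built from an earlier data cutoff.
forecasts = {}

def store_forecast(target_week, data_through, units):
    forecasts[(target_week, data_through)] = units

def latest_forecast(target_week):
    """Return the forecast for target_week built from the most recent data cutoff."""
    versions = {cutoff: units
                for (week, cutoff), units in forecasts.items()
                if week == target_week}
    return versions[max(versions)]

# Hypothetical week-start date and made-up unit figures.
first_week_2017 = date(2017, 1, 2)
store_forecast(first_week_2017, date(2016, 8, 31), 1200)
store_forecast(first_week_2017, date(2016, 9, 30), 1350)
```

Both versions remain available, so an analyst can compare how the forecast for the same week changed as amended sales data arrived.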
In addition to storing data in time-versioned partitions, data from
different partners is often ingested into separate datasets. This
greatly simplifies the ingestion logic. After ingestion, as the data
moves into the refined stage, the separate partner datasets are
harmonized to a standard data format so that cross-partner analyses
can be efficiently conducted.
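As a rough sketch of this harmonization step, the following Python translates each partner's product IDs and reporting units into one standard format. The partner names, ID mappings, case size, and quantities are all hypothetical.

```python
# Hypothetical per-partner rules: how to translate product IDs, and how many
# consumer units each reported quantity represents.
PARTNER_RULES = {
    "retailer_a": {"id_map": {"SKU-9": "P100"}, "units_per_reported": 1},
    "retailer_b": {"id_map": {"0042": "P100"}, "units_per_reported": 12},  # reports cases of 12
}

def harmonize(partner, rows):
    """Convert one partner's rows to the standard product ID and unit level."""
    rules = PARTNER_RULES[partner]
    return [
        {
            "partner": partner,
            "product_id": rules["id_map"][row["product"]],
            "units": row["quantity"] * rules["units_per_reported"],
        }
        for row in rows
    ]

# Each partner's data is ingested into its own dataset (illustrative values).
raw_by_partner = {
    "retailer_a": [{"product": "SKU-9", "quantity": 30}],
    "retailer_b": [{"product": "0042", "quantity": 2}],
}

# Harmonized data supports cross-partner analysis.
standard = [r for p, rows in raw_by_partner.items() for r in harmonize(p, rows)]
total_units = sum(r["units"] for r in standard if r["product_id"] == "P100")
```

Keeping the partner-specific rules in one table means the ingestion logic itself stays simple, and the translation is applied only when the data moves to the refined stage.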
Creating Metadata
In most cases, the data that you are ingesting during the raw data
stage is known; that is, you know what you are going to get and
how to work with it. But what happens when your organization adds
a new data source? In other words, what do you do when your data
is partially or completely unknown? Ingesting unknown data triggers
two additional actions, both related to the creation of metadata. One
action is focused on understanding the characteristics of your data,
or describing your data. We refer to this action as generating generic
metadata. A second action is focused on using the characteristics of
your data to make a determination about your data’s value. This
action involves creating custom metadata.
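To make the idea of describing unknown data more concrete, here is a minimal profiling sketch in Python. The particular statistics collected (row count, non-empty count, distinct count, and a numeric check) are an assumption about what a basic description of the data might include, not a definition taken from the text.

```python
import csv
import io

def is_number(text):
    """True if the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def profile(csv_text):
    """Summarize each column: non-empty count, distinct values, and whether all values are numeric."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {"row_count": len(rows), "columns": {}}
    for name in (rows[0].keys() if rows else []):
        values = [r[name] for r in rows if r[name] != ""]
        summary["columns"][name] = {
            "non_empty": len(values),
            "distinct": len(set(values)),
            "all_numeric": bool(values) and all(is_number(v) for v in values),
        }
    return summary

# Illustrative sample with one missing quantity.
sample = "product,quantity\nA1,10\nB2,\nA1,7\n"
meta = profile(sample)
```

A summary like this describes the data without judging it. Deciding whether the data is valuable for a given purpose is the separate, second action.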
day, deafness; the fever increased; urine the same. On the twentieth
and following days, much delirium. On the thirtieth, copious
hemorrhage from the nose, and became more collected; deafness
continued, but less; the fever diminished; on the following days,
frequent hemorrhages, at short intervals. About the sixtieth, the
hemorrhages ceased, but violent pain of the hip-joint, and increase
of fever. Not long afterwards, pains of all the inferior parts; it then
became a rule, that either the fever and deafness increased, or, if
these abated and were lightened, the pains of the inferior parts were
increased. About the eightieth day, all the complaints gave way,
without leaving any behind; for the urine was of a good color, and
had a copious sediment, while the delirium became less. About the
hundredth day, disorder of the bowels, with copious and bilious
evacuations, and these continued for a considerable time, and again
assumed the dysenteric form with pain; but relief of all the other
complaints. On the whole, the fevers went off, and the deafness
ceased. On the hundred and twentieth day, had a complete crisis.
Ardent fever.
Explanation of the characters. It is probable that the bilious
discharge brought about the recovery on the hundred and twentieth
day.[733]
Case X.—In Abdera, Nicodemus was seized with fever from
venery and drinking. At the commencement he was troubled with
nausea and cardialgia; thirsty, tongue was parched; urine thin and
dark. On the second day, the fever exacerbated; he was troubled
with rigors and nausea; had no sleep; vomited yellow bile; urine the
same; passed a quiet night, and slept. On the third, a general
remission; amelioration; but about sunset felt again somewhat
uncomfortable; passed an uneasy night. On the fourth, rigor, much
fever, general pains; urine thin, with substances floating in it; again a
quiet night. On the fifth, all the symptoms remained, but there was an
amelioration. On the sixth, some general pains; substances floating
in the urine; very incoherent. On the seventh, better. On the eighth,
all the other symptoms abated. On the tenth, and following days,
there were pains, but all less; in this case throughout, the paroxysms
and pains were greater on the even days. On the twentieth, the urine
white and thick, but when allowed to stand had no sediment; much
sweat; seemed to be free from fever; but again in the evening he
became hot, with the same pains, rigor, thirst, slightly incoherent. On
the twenty-fourth, urine copious, white, with an abundant sediment; a
copious and warm sweat all over; apyrexia; the fever came to its
crisis.
Explanation of the characters. It is probable that the cure was
owing to the bilious evacuations and the sweats.[734]
Case XI.—In Thasus, a woman, of a melancholic turn of mind,
from some accidental cause of sorrow, while still going about,
became affected with loss of sleep, aversion to food, and had thirst
and nausea. She lived near the Pylades, upon the Plain. On the first,
at the commencement of night, frights, much talking; despondency,
slight fever; in the morning, frequent spasms, and when they ceased,
she was incoherent and talked obscurely; pains frequent, great, and
continued. On the second, in the same state; had no sleep; fever
more acute. On the third, the spasms left her; but coma, and
disposition to sleep, and again awake, started up, and could not
contain herself; much incoherence; acute fever; on that night a
copious sweat all over; apyrexia, slept, quite collected; had a crisis.
About the third day, the urine black, thin, substances floating in it
generally round, did not fall to the bottom; about the crisis a copious
menstruation.[735]
Case XII.—In Larissa,[736] a young unmarried woman was seized
with a fever of the acute and ardent type; insomnolency, thirst;
tongue sooty and dry; urine of a good color, but thin. On the second,
in an uneasy state, did not sleep. On the third, alvine discharges
copious, watery, and greenish, and on the following days passed
such with relief. On the fourth, passed a small quantity of thin urine,
having substances floating towards its surface, which did not
subside; was delirious towards night. On the sixth, a great
hemorrhage from the nose; a chill, with a copious and hot sweat all
over; apyrexia, had a crisis. In the fever, and when it had passed the
crisis, the menses took place for the first time, for she was a young
woman. Throughout she was oppressed with nausea, and rigors;
redness of the face; pain of the eyes; heaviness of the head; she
had no relapse, but the fever came to a crisis. The pains were on the
even days.[737]
Case XIII.—Apollonius, in Abdera, bore up (under the fever?) for
some time, without betaking himself to bed. His viscera were
enlarged, and for a considerable time there was a constant pain
about the liver, and then he became affected with jaundice; he was
flatulent, and of a whitish complexion. Having eaten beef, and drunk
unseasonably, he became a little heated at first, and betook himself
to bed, and having used large quantities of milk, that of goats and
sheep, and both boiled and raw, with a bad diet otherwise, great
mischief was occasioned by all these things; for the fever was
exacerbated, and of the food taken scarcely any portion worth
mentioning was passed from the bowels; the urine was thin and
scanty; no sleep; troublesome meteorism; much thirst; disposition to
coma; painful swelling of the right hypochondrium; extremities
altogether coldish; slight incoherence, forgetfulness of everything he
said; he was beside himself. About the fourteenth day after he
betook himself to bed, had a rigor, became heated, and was seized
with furious delirium; loud cries, much talking, again composed, and
then coma came on; afterwards the bowels disordered, with copious,
bilious, unmixed, and undigested stools; urine black, scanty, and
thin; much restlessness; alvine evacuations of varied characters,
either black, scanty, and verdigris-green, or fatty, undigested, and
acrid; and at times the dejections resembled milk. About the twenty-
fourth, enjoyed a calm; other matters in the same state; became
somewhat collected; remembered nothing that had happened since
he was confined to bed; immediately afterwards became delirious;
every symptom rapidly getting worse. About the thirtieth, acute fever;
stools copious and thin; was delirious; extremities cold; loss of
speech. On the thirty-fourth he died. In this case, as far as I saw, the
bowels were disordered; urine thin and black; disposition to coma;
insomnolency; extremities cold; delirious throughout. Phrenitis.[738]
Case XIV.—In Cyzicus,[739] a woman who had brought forth twin
daughters, after a difficult labor, and in whom the lochial discharge
was insufficient, at first was seized with an acute fever, attended with
chills; heaviness of the head and neck, with pain; insomnolency from
the commencement; she was silent, sullen, and disobedient; urine
thin, and devoid of color; thirst, nausea for the most part; bowels
irregularly disordered, and again constipated. On the sixth, towards
night, talked much incoherently; had no sleep. About the eleventh
day was seized with wild delirium, and again became collected; urine
black, thin, and again deficient, and of an oily appearance; copious,
thin, and disordered evacuations from the bowels. On the fourteenth,
frequent convulsions; extremities cold; not in anywise collected;
suppression of urine. On the sixteenth loss of speech. On the
seventeenth, she died. Phrenitis.
Explanation of the characters. It is probable that death was
caused, on the seventeenth day, by the affection of the brain
consequent upon her accouchement.[740]
Case XV.—In Thasus, the wife of Dealces, who was lodged upon
the Plain, from sorrow was seized with an acute fever, attended with
chills. From first to last she wrapped herself up in her bedclothes; still
silent, she fumbled, picked, bored, and gathered hairs (from them);
tears, and again laughter; no sleep; bowels irritable, but passed
nothing; when directed, drank a little; urine thin and scanty; to the
touch of the hand the fever was slight; coldness of the extremities.
On the ninth, talked much incoherently, and again became
composed and silent. On the fourteenth, breathing rare, large, at
intervals; and again hurried respiration. On the sixteenth, looseness
of the bowels from a stimulant clyster; afterwards she passed her
drink, nor could retain anything, for she was completely insensible;
skin parched and tense. On the twentieth, much talk, and again
became composed; loss of speech; respiration hurried. On the
twenty-first she died. Her respiration throughout was rare and large;
she was totally insensible; always wrapped up in her bedclothes;
either much talk, or complete silence throughout. Phrenitis.[741]
Case XVI.—In Melibœa,[742] a young man having become
heated by drinking and much venery, was confined to bed; he was
affected with rigors and nausea; insomnolency and absence of thirst.
On the first day much fæces passed from the bowels along with a
copious flux; and on the following days he passed many watery
stools of a green color; urine thin, scanty, and deficient in color;
respiration rare, large, at long intervals; softish distention of the
hypochondrium, of an oblong form, on both sides; continued
palpitation in the epigastric region throughout; passed urine of an oily
appearance. On the tenth, he had calm delirium, for he was naturally
of an orderly and quiet disposition; skin parched and tense;
dejections either copious and thin, or bilious and fatty. On the
fourteenth, all the symptoms were exacerbated; he became
delirious, and talked much incoherently. On the twentieth, wild
delirium, jactitation, passed no urine; small drinks were retained. On
the twenty-fourth he died. Phrenitis.[743]
ON INJURIES OF THE HEAD.
THE ARGUMENT.
This treatise opens with a description of the bones of the head,
which, although in most respects pretty accurate, is remarkable for
containing an account of particular configurations of the cranium,
and of certain varieties in the arrangement of the sutures, which it
has puzzled modern authorities in anatomy to explain, otherwise
than upon the supposition that the writer must have been but
imperfectly acquainted with the subject. But as the work otherwise
bears evidence that our author must have examined the bones of the
head very carefully, and moreover, as in all his works he displays a
wonderfully minute acquaintance with osteology, (to say nothing of
the historical tradition, mentioned by Pausanias, that he was
possessed of a skeleton, which at his death he bequeathed to the
Temple of Apollo, at Delphi,) it seems incredible that he should have
committed most glaring blunders in describing the prominent
features of a part to which it is clear that he had paid very great
attention. Moreover, the reputation of Hippocrates for accuracy stood
so high, that an eminent authority does not hesitate to declare of
him, that he was a man who knew not how to deceive or be
deceived.[744] An easy way of getting rid of the difficulty would no
doubt be, to adopt the conjecture advanced by Scaliger,[745] and in
part approved of by Riolanus,[746] that the treatise had suffered
much in early times, from the interpolations of ignorant transcribers;
or to hold, with M. Malgaigne, that the whole work is to be
condemned as spurious. But it would be a dangerous practice in
ancient criticism, to reject as spurious a work which has such
unexceptionable evidence in its favor, although it may contain matter
which appears to us derogatory to the reputation of its author, and it
will be admitted, by any competent judge who examines the
arguments by Scaliger, that the proofs which he brings forward of
great interpolations in this treatise, are generally of a very fanciful
nature.
On a point so obscure, and which has puzzled so many eminent
scholars, it is to be feared that I shall not be able to throw much
additional light, but as, consistently with my general plan, I cannot
well avoid stating some opinion on the question I shall endeavor to
elucidate it in so far by giving in the first place a brief sketch of the
information supplied by all the other ancient authorities who have
touched upon this subject. I shall begin, then, with Aristotle, the
contemporary of our author, who, in his work “On the History of
Animals,” gives the following very inaccurate description of the
sutures of the human skull: “The female cranium has one circular
suture, but men generally three, which unite in one point. But a male
skull has been seen not having a suture.”[747] Celsus describes the
sutures in the following terms: “Ex ceteris, quo suturæ pauciores
sunt, eo capitis valetudo commodior est. Neque enim certus eorum
numerus est, sicut ne locus quidem. Ferè tamen duæ, super aures,
tempora a superiori parte discernunt; tertia ad aures, occipitium a
summo capite deducit; quarta, ab eodem vertice per medium caput
ad frontem procedit; eaque modo sub imo capillo desinit, modo
frontem ipsam secans inter supercilia finitur.” (viii., 1.) “Nam neque
utique certa sedes, supra posui, suturarum est.” (viii., 4.) Pliny gives
the following description of the head, which it is impossible not to
recognize as having been borrowed from our author: “Vertices bini
hominum tantum aliquibus. Capitis ossa plana, tenuia, sine medullis,
serratis pectinatim structa compagibus.”[748] Of Ruffus Ephesius I
may just mention, that his descriptions of the human body are in
general remarkable for their correctness, which is not to be
wondered at, as he would appear to have followed, in general,
Erasistratus and the other authorities belonging to the great
Alexandrian period in anatomy; and that he has described very
accurately all the sutures of the human cranium, but says not a word
of the different configurations of the head, as here given by our
author.[749] We now come to Galen, who gives a very lengthy
description of the various forms of the head, in nearly the same
terms as our author, and after alluding to the uses of the sutures, the
principal of which he holds to be to permit transpiration from the
brain, he proceeds thus to describe the distribution of the sutures:
“That there is one which runs straight along the middle of the head,
(the sagittal?) and two transverse, (the coronal and lambdoid?) has
been stated previously, and need not require many words in this
place. For, the head being like an oblong sphere, one was justly
made to extend straight through its middle from behind forwards, and
two transverse sutures meet it, and the form of the three sutures is
like the letter H. For the whole head being more elongated in this
case than usual, and, as it were, compressed towards the ears, it
was equitable that the number of the sutures should be unequal as
to length and breadth, otherwise Nature would undeservedly have
been named just, by Hippocrates, in thus giving equal gifts to the
unequal. But it is not the case; for being most just, she formed the
strongest suture which extends along the length of the head single,
being thus proportionate to the width of the parts on both sides of it;
namely, on the right and on the left; but she formed the transverse
double in number, the one behind, as formerly said, called the
lambdoid, and the other before, called the coronal, so that the bone
of the head between these two sutures might be equal to those in
the middle, on each side (the parietal bones?). The sutures of the
head, in that configuration which is acuminated,[750] furnish a very
great example of the justness of Nature. For there are three principal
figures of the head: the one entirely opposed to the natural
configuration already described, when the head loses both its
protuberances, that behind and the other before, and is equal on all
hands, and like a true sphere; and two others, the one form having
no protuberance in front, and the other none in the occiput. The
sutures of the spherical head are like the letter χ, two only in number,
and intersecting one another; the one extending transverse from the
one ear to the other, and the other extending straight through the
middle of the vertex to the middle of the forehead. For, as when one
part of the head is excessive, being longer than the other, it was just
that the longer form should have more sutures, so, when both are
alike, Nature bestowed an equal number on both. But in the head
which wants the protuberance at the occiput, the straight and the
coronal sutures remain, but the lambdoid is wanting (it being near to
the protuberance that is wanting), so that the figure of the two
resembles the letter T; as also when the protuberance of the head in
front is wanting, the coronal at the same time is wanting, but there
remains the one running lengthways and joining the lambdoid, and
this form of construction is made to resemble the letter T. A fourth
species of acuminated (sugar-loaf) head might be imagined, but
which does not occur, with the head more prominent at the two ears
than in front and behind.” He goes on to state the reasons why there
is no such construction of the head as this, and concludes as
follows: “Wherefore Hippocrates described four configurations, and
the sutures of each, in the manner we have now said that they exist,
being justly distributed to each configuration by Nature as to position
and number.”[751] The description of the bones and sutures of the
head, given in the Latin work “De Ossibus,” generally attributed to
Galen, is to the same effect. The same number of distinct
configurations of the head, and the same characters as regards the
sutures, is also given by Avicenna, who professedly copies from
Galen. (I., i., 5, 3.)
When examined together, these descriptions certainly must be
admitted to have the appearance of being all derived from one
original, namely, from our author, in this place; and taken literally,
there can be no doubt that their meaning amounts to this: that the
number of the sutures varies with the form of the head; that when
there are protuberances both before and behind, the head in its
upper part has two transverse sutures, namely, the coronal and the
lambdoid, and one longitudinal, namely, the sagittal; that if the
anterior protuberance be wanting, the coronal is wanting, and, if the
posterior, the lambdoid. Now I need scarcely remark, that modern
anatomists do not recognize such varieties in the configuration of the
head nor in the numbers of the sutures, and that it is very rare
indeed for either the coronal or the lambdoid suture to be found
wanting. To all appearance, then, Galen was mistaken, and it only
appears remarkable that, with all his knowledge of anatomy,
theoretical and practical, and considering the opportunities which he
must have possessed of examining human skeletons in Alexandria,
he should have failed to observe and describe the bones of the
cranium for himself.
Before stating my own conjectures on this question, it may be
interesting to examine the solution of it attempted by authorities who
lived about the period when the original study of human anatomy
was revived in modern times. In the first place, then, I may mention
that Ambrose Paré, who, I need scarcely say, was possessed of no
mean talent for original observation, in treating of fractures of the
head, adopts exactly the description given by Hippocrates; thus he
describes “the bunches of the head” in nearly the same terms as our
author, and adds, that such “bunches change the figure and site of
the sutures,” and that “there be some skulls that want the foremost
suture, and other some the hind, and sometimes none of the true
sutures, but only the false, or spurious, remain.”[752] Nay, it cannot
but appear remarkable, that Vesalius, the great antagonist of Galen
and of the ancient authorities in general, in the present instance
does not venture to call in question their opinion, but gives a
description of the different forms of the head, and the varieties of the
sutures, which scarcely at all differs from that given by Hippocrates.
[753] It is singular, also, that certain other authorities, who were much
more disposed to show a leaning to antiquity, such as Columbus,
Eustachius, Fallopius, and Riolanus, should, in the present instance,
have manifested a more independent spirit in challenging the
authority of Hippocrates, though, at the same time, they show a
disposition to find out some mode of bringing him clear off. Thus, for
example, Riolanus is compelled to admit that there is no such variety
in the forms and numbers of the sutures as Hippocrates describes;
but he attempts to free him from error, by suggesting that the cases
in which Hippocrates found them wanting must have been those of
old men.[754] He also quotes some very extraordinary instances, in
which something approaching the varieties described by our author
had been remarked.[755] Fallopius does not hesitate, in his great
anatomical work, to express the surprise he felt that all the
authorities should have assented to the descriptions of the
protuberances and sutures of the head given by Hippocrates; for that
he, after having examined large heaps of crania in the Musea of
Ferrara and Florence, had not found that they agreed with the
descriptions given by Hippocrates; that he had seen crania without a
suture, and yet not wanting in the protuberances; and in like manner,
that he had seen the coronal suture obliterated, and yet the skull
possessed its anterior prominence, and the lambdoid wanting,
although the posterior protuberance was as usual. Altogether, then,
in this work he modestly ventures to impugn the authority of
Hippocrates.[756] In his work entitled “Expositio in Librum Galeni de
Ossibus,” he adopts the same views, and there declares that he had
never seen the sutures obliterated except from old age. But, in his
work entitled “Expositio in Lib. Hippocrat. de Vulneribus Capitis,” he
gives two suppositions, which he had devised in order to defend the
authority of Hippocrates: first, that Hippocrates did not give these
varieties of form as real, but as hypothetical; and second, that he
merely described them as being the vulgar opinion, without pledging
himself to the correctness of the description. These, as far as I am
aware, are the only defences which have ever been set up for our
author in this matter, and it must be admitted that they are not very
satisfactory. I shall now present the reader with the conjectural
explanation which has occurred to myself. I have imagined that what
Hippocrates meant was to express himself to the following effect:
when the forehead is remarkably prominent, and, at the same time,
there is a great depression behind, the cranium, if looked upon from
above, will show the coronal suture running across the fore part of
the head, and the sagittal through its middle, while the lambdoid will
be inconspicuous, from being below the level of the coronal. The two
together, then, would form some resemblance to the letter T. When,
on the other hand, the forehead is low, that is to say, wants its
normal development, and the occiput is unusually prominent, the
lambdoid suture joins the sagittal, so as to present some appearance
of the same letter reversed. But in a square-built head, where the
frontal and occipital regions have protuberances equally developed,
the coronal and lambdoid sutures run nearly parallel to one another,
and are joined in the middle by the sagittal, in which case the three
sutures may be imagined to present some resemblance to the Greek
letter Η. When there is no protuberance either before or behind, and
the sagittal suture passes through the middle of the bone down to
the nasal process, the coronal suture intersects it, so as to give them
something like the shape of the Greek letter χ.[757] I offer this
explanation, however, merely as a conjecture, and wish the reader to
judge of it accordingly.
I now proceed to give an analysis of the contents of this treatise,
and to attempt to form a correct estimate of their value.
Injuries of the cranial bones are divided by our author into five
orders, as follows: 1, simple fractures, or fissures of various kinds
and sizes (§ 4); 2, contusion, without fracture or depression (§ 5); 3,
fractures attended with depression (§ 6); 4, the hedra, that is to say,
the indentation or cut in the outer table of the bone, and not
necessarily attended either with fracture or contusion (§ 7); 5, the
counter-fissure, or fracture par contre-coup (§ 8). The simple fracture and the severe
contusion require the operation of trepanning; whereas neither the
hedra (or simple cut) nor the depressed fracture require it, and the
counter-fissure does not admit it, owing to the obscurity of the
symptoms with which it is attended (§ 9).
In the first place, the surgeon is to ascertain the nature and
situation of the wound, by a careful investigation of all the
circumstances of the case, but so as to avoid the use of the sound, if
possible (§§ 9, 10).
Next are described the various kinds of injury which the different
sorts of weapons are most likely to inflict, and from the consideration
of them the surgeon is to form an estimate of the probable nature of
the accident (§ 11).
The characters of the hedra, or superficial injury of the cranium,
and the difficulty of forming a correct estimate of it, when
complicated by the presence of a suture, are strongly insisted upon
(§ 12).
The principles upon which injuries situated in different parts of
the head should be treated are carefully defined and stated.
Minute and, as would now be thought, superfluous
directions are given for ascertaining whether or not a fissure exists
in the bone. The treatment, as far as applications go, is to be mild
and desiccant. When a fracture cannot be made to disappear by
scraping, the trepan is to be applied (§§ 13, 14).
The dangers which the bone incurs of becoming affected from
the soft parts, are strongly insisted upon, and applications of a drying
nature are prescribed (§ 15).
The condition of a piece of bone which is going to exfoliate is
correctly and strikingly described (§ 16).
The treatment of depression is laid down, and the danger of
applying the trepan in this case is strongly insisted upon (§ 17).
The peculiarities in the case of children are pointed out. Under
certain circumstances, when there is contusion combined with the
fracture, he admits of perforating the skull with a small trepan (§ 18).
When, after a severe injury, symptoms of irritation and
inflammation appear to be coming on, the surgeon is to lose no time
in proceeding to the operation. Some correct observations are made
on the consequences of injuries of the head on other parts of the
body (§ 19).
The treatment of erysipelatous inflammation is distinctly laid
down (§ 20).
The operation of trepanning the skull is circumstantially
described, and an interesting description is given of a mode of doing
the operation peculiar to our author[758] (§ 21).
This, then, as far as I know, is the first exposition ever made of a
highly important subject in surgery, upon which professional men are
still greatly divided in opinion. I cannot, then, resist the temptation to
offer some remarks on the views of practice here recommended, and
to institute a comparison between them and certain methods of
treatment which have been in vogue of late years.
I can scarcely doubt but it will be generally admitted that the
exposition of the subject here given is remarkably lucid, that our
author’s divisions of it are strongly marked, and his rules of practice,
whether correct or not, distinctly laid down. At all events, it will not be
affirmed that there is any confusion in his ideas, or that his principles
of treatment are not properly defined. After all that has been written
on injuries of the head, it would be difficult to point to any better
arrangement of them than that of our author, into five orders: 1st,
simple fractures without depression; 2d, contusions without fracture
or depression; 3d, depression with fracture; 4th, simple incisions
without fracture; 5th, fractures par contre-coup.
As regards the operation of trepanning the skull, then, our
author’s rule of practice is sufficiently well defined: we are to operate
in the first two of these cases, that is to say, in simple fractures and
contusions, but not in the last three, that is to say, in fracture with
depression, in simple incisions in the skull, and in the counter-
fissure. To begin, then, with the examination of those cases in which
the operation is proscribed: it is not to be had recourse to in the
counter-fissure, because, from the nature of it, there is generally no
rule by which its existence can be positively ascertained, and
therefore the case is to be given up as hopeless.
In the simple incision of the bone, that is to say, in the slash or
indentation, when the effects of the injury are not transmitted to the
brain, it must be obvious that all instrumental interference must be
strongly contraindicated.[759]
At first sight it will appear remarkable to a surgeon, who
approaches the subject with views exclusively modern, that our
author should have interdicted the use of instruments in that class of
injuries in which one would be inclined to suppose that they are most
clearly indicated, namely, in a fracture of considerable extent,
attended with depression of part of the bone from its natural level.
Several questions present themselves here to be solved. Is the
operation generally required? Has it been successful when it has
been had recourse to? When it is to be performed, should it be done
immediately, or not until the bad effects of the injury have manifested
themselves?
With regard, then, to the necessity of the operation for depressed
fractures, the most discordant opinions have prevailed in modern
times, and even within a very recent period. Not to go farther back
than Pott, it is well known that he established it as the general rule of
practice, that in every case of fracture with depression, the skull
should be perforated, and the depressed portion of the bone either
raised to its level, or entirely removed. But since his time a great
change of opinion has taken place on this subject, and of late it has
become the general rule of practice (if rule can be predicated, where
opinions are so vague and indeterminate) not to interfere, even in
cases of depression, unless urgent symptoms have supervened. The
late Mr. Abernethy took the lead in questioning the propriety of the
rule laid down by Pott; and with the view of demonstrating that the
operation may be often dispensed with in fractures complicated with
depression, and in order, as he says, “to counteract in some degree
the bias which long-accustomed modes of thinking and acting are
apt to impress on the minds of practitioners,” he relates the histories
of five cases of fracture with depression, which, in the space of
twelve months, occurred under his own eyes in St. Bartholomew’s
Hospital, and which all terminated favorably, although no operation
was performed. These cases, supported by the authority of so great
a name as Mr. Abernethy, made a deep impression on the
profession, especially in this country, so that it became the
established rule of practice in British surgery never to interfere in
cases of fracture, unless with the view of removing urgent
symptoms. See Cooper’s Surgical Dictionary, edit. 1825, and the
previous edition. The old Hippocratic rule in regard to the trepan,
when it is at all to be applied, namely, that of applying it as a
preventive of bad consequences, was altogether eschewed, and it
was held to be perfectly unwarrantable to perforate the skull, except
with the intention of removing substances which were creating
irritation and pressure of the brain. This practice, I say, was
sanctioned by all the best army and hospital surgeons, from about
the beginning of the present century, down to a very recent period.
What, then, it will be asked, have been the results? Has experience
confirmed the safety of this rule of practice, or has it not? To enable
us to solve these queries, we have most elaborate and trustworthy
statistics, published a few years ago by Dr. Laurie of Glasgow, which
deserve to be seriously studied by every surgeon who may be called
upon to discharge the duties of his profession in such cases. I
cannot find room for long extracts from these valuable papers, but
may be allowed to state a few of the more important results which
are to be deduced from Dr. Laurie’s interesting investigation. Coming
then at once to the point, it deserves to be remarked that Dr. Laurie’s
ample experience has led him to reject decidedly the rule of practice,
which, as I have stated, was established by Mr. Abernethy, about
forty years ago, namely, that, in cases of depression, the symptoms
of compression should be our guide to the employment of the
trephine. He adds, “however well this rule may sound, when
delivered ex cathedrâ, it will be found of very little practical utility, for
this reason, that if we limit interference to cases exhibiting symptoms
of compression, we had much better not interfere at all, inasmuch as
such cases prove almost invariably fatal. Such, at least, has been
the experience of the Glasgow hospitals; for out of fifty-six cases
operated upon, including, in point of time, a period little short of fifty
years, there does not appear in our records a single unequivocal
instance of profound insensibility, in which the mere operation of
trepanning removed the coma and paralysis, or in any way conduced
to the recovery of the patient. We wish to be clearly understood as
speaking of the trephine used in reference to the state of the bone in
cases of profound insensibility, not employed to remove
extravasated blood. Nor does the cause of our want of success
appear at all obscure. We believe that in practice the cases of urgent
compression dependent on depressed bone alone are very few
indeed; we are well aware that many such are on record, we do not
presume to impugn their accuracy, we merely affirm that the records
of the Glasgow Infirmary do not add to the number.” He thus states
his views with regard to the principles by which the application of the
trephine should be regulated. “From what we have said, it will appear
that we coincide with those who, in using the trephine, in cases of
compound fracture of the skull, look more to the state of the bone
than to the general symptoms, and who employ it more as a
preventive of inflammation and its consequences, than as a cure for
urgent symptoms, the immediate result of the accident.” He goes on
to state that “the details we have given are by no means in favor of
the trephine. Of fifty-six cases operated upon, eleven recovered, and
forty-five died. We feel assured that this affords too favorable a view
of the actual results.”[760]
From the extracts now given, it will readily be seen that this very
able authority has rejected entirely the rule of practice established by
Mr. Abernethy, and that, in so far, he has reverted to the principle
upon which the use of the instruments in simple fractures of the skull
was regulated by Hippocrates, namely, as a preventive of the bad
consequences of fracture on the brain, rather than with the view of
relieving them when established. It will further be seen that, in
whatever way applied, the use of perforating instruments in the case
of depressed fractures is attended with so unsatisfactory results, that
it may be doubted if any other operation in surgery, recognized as
legitimate, be equally fatal.[761] Less than one fifth of the patients
operated upon recovered. In fact, he very candidly admits “that it
would not have been greatly to the disadvantage of the patients
admitted into the Glasgow Infirmary, if the trephine had never found
its way within its walls.” He further, in conclusion, adverts to the well-
known fact that Desault, in the end, completely abandoned the
operation, and that Mr. Lawrence states, “as far as the experience of
this Hospital (St. Bartholomew’s) goes, he can cite very few
instances in which the life of the patient had been saved by the
operation of trephining.”[762]
Altogether, then, it will be allowed to be very questionable
whether, in general, the Hippocratic treatment, in cases of fracture
with depression, would not be fully as successful as the modern
practice of perforating the skull. Moreover, it is by no means well
ascertained, as generally assumed by superficial observers of facts
in medical practice, that depressed fractures are more dangerous
than other injuries of the skull attended with less formidable
appearances. Indeed, recent experience has shown, in confirmation
of the opinion advanced by our author, that extensive fractures, with
great depression, are frequently not followed by any very dangerous
train of consequences. (See Thomson’s “Observations made in the
Military Hospitals of Belgium,” pp. 59, 60; Hennen’s “Military
Surgery,” p. 287; Cooper’s “Lectures,” xiii.; Mr. Guthrie’s “Lectures on
Injuries of the Head,” p. 56.) All these, in substance, coincide with
Mr. Guthrie, who mentions with approbation that “it has been stated
from the earliest antiquity, that the greater the fracture, the less the
concussion of the brain.” I may mention further, that I myself, in the
course of my own experience, have known many instances in which
fractures with considerable depression were not followed, either
immediately or afterwards, by any bad consequences; while, on the
other hand, I have known cases in which simple contusion of the
bone, without fracture or extravasation, and without even very urgent
symptoms of concussion at first, have proved fatal in the course of a
day or two. Now, in such circumstances, Hippocrates would have
operated by either perforating the skull at once, down to the meninx,
and removing a piece of it, or by sawing it nearly through, and
leaving the piece of bone to exfoliate. It will be asked here, what
object can he have had in view by this procedure? This he has
nowhere distinctly defined; but, judging from the whole tenor of this
treatise, and that of his commentator, Galen, I can have no doubt in
my mind that what he wished to accomplish was to loosen the bones
of the head, and give greater room to the brain, which he conceived
to be in a state of congestion and swelling brought on by the
vibration, or trémoussement, communicated directly to the brain by
the contusion. It is, in fact, an opinion which Hippocrates repeatedly
inculcates, not only with regard to the brain, but also respecting
injuries of the chest and joints, that severe contusions are, in
general, more dangerous than fractures, the effects of the vibration
in the former case being more violent than in the latter.[763]
Believing, then, that, in contusions, the internal structure of the brain
is extensively injured, and that irritation, with hypertrophy, are the
consequences, he advocated instrumental interference, in order as I
have stated, to give more room to the brain, and relieve it from its
state of compression.[764] This, no doubt, was the rationale of his
practice also in simple fractures, not attended with depression, that
is to say, his object in perforating the skull was to remove tension,
and furnish an outlet to the collection within, whether of a liquid or a
gaseous nature.
There can be no doubt that our author also had it in view, by
perforating the skull, to afford an issue to extravasated blood and
other matters collected within the cranium. This clearly appears from
what is stated in section 18, and the same rule of practice is
distinctly described by Celsus in the following terms: “Raro, sed
aliquando tamen evenit, ut os quidem totum integrum maneat, intus
vero ex ictu vena aliqua in cerebri membrana rupta aliquid sanguinis
mittat; isque ibi concretus magnos dolores moveat, et oculos
quibusdam obcæcet.... Sed ferè contra id dolor est, et, eo loco cute
incisa, pallidum os reperitur: ideoque id os quoque excidendum est.”
(viii., 4.) It is quite certain, then, that one of the objects for which our
author recommended trepanning, was to give issue to extravasated
blood on the surface of the skull. This naturally leads me to compare
the results of modern experience in the treatment of cases of
contusion, with or without extravasation of blood.
All the earlier of our modern authorities on surgery, such as
Theodoric, Pet. c. Largelata, Ambrose Paré, Wiseman, and
Fallopius, distinctly held that contusions of the skull, even when not
complicated with a fracture, are often of so formidable a nature as to
require the use of perforating instruments. The same views are
strenuously advocated by Pott, who has described the effects of
contusion in very elegant and impressive language. See page 42;
ed. Lond. 1780. The upshot is, that one of the consequences of a
severe contusion of the bone frequently is separation of the
pericranium, “which is almost always followed by a separation
between the cranium and the dura mater; a circumstance extremely
well worth attending to in fissures and undepressed fractures of the
skull, because it is from this circumstance principally that the bad
symptoms and the hazard in such cases arise.” (p. 50.)[765] After
insisting, in very strong terms, on the danger attending severe
contusions of the skull, he proceeds to lay down the rules of
treatment, which, in a word, are comprehended in the two following
intentions:—first, to prevent bad consequences by having recourse,
at first, to depletion; and, second, to procure the discharge of matter
collected under the cranium, which can be answered only by the
perforation of it. He agrees with Archigenes that the operation is
generally too long deferred, and that the sooner it is performed the
better. Still, however, it is to be borne in mind that even Pott does
not make it a general rule to operate at first, before the bad
symptoms have come on, that is to say, during the first three days,
and that he rather appears to have followed Celsus, who alludes to
the method of Hippocrates, and describes his rule of practice in the
following terms:
—“In omni vero fisso fractoque osse, protinus antiquiores medici
ad ferramenta veniebant, quibus id exciderent. Sed multo melius est
ante emplastra experiri, etc.... Si vero sub prima curatione febris
intenditur, ... magni dolores sunt, cibique super hæc fastidium
increscit; tum demum ad manum scalprumque veniendum est.” (viii.,
4.) Pott then, it appears, follows the rule of Celsus, and does not
operate until unpleasant effects have developed themselves;[766]
but, at the same time, he candidly admits that, although the course
now described be all that our art is capable of doing in these
melancholy cases, he wishes he could say that it was frequently
successful. He then goes on to relate several cases: first, of simple
contusion without a wound; second, of contusion with a wound; and,
third, of contusion with extravasation. In all these classes of cases
he operated with very equivocal results; but then it is to be borne in
mind, that, as I have said, he operated, like Celsus, after the bad
effects had come on, and not, like Hippocrates, at first, in order to
prevent them. Even with all these discouraging results, he continued
to adhere to this rule of treatment, which, under the sanction of his
name, became the established practice of the profession. The late
Mr. Abernethy, who took the lead in innovating upon Pott’s rules for
the application of the trephine, did not venture to make any material
change in this case when he supposed that there was any
considerable extravasation of blood; and he delivered it as a test
whereby we might judge whether or not a great vessel had been
ruptured within the skull, to examine whether or no the bone bled,
having generally found, as, indeed, had been clearly laid down by
Celsus, that in these cases the bone does not bleed. The rule of
practice, then, to operate in order to remove the coagula of blood
and matters which form between the skull and the dura mater, was
sanctioned by Sir Charles Bell and Sir Astley Cooper; but they, like
Mr. Abernethy, generally condemn interference when the fluids are
situated below the membrane. On this subject Mr. Guthrie remarks:
—“The operation of incising the dura mater, to admit of the discharge
