Data Science in Theory and Practice Techniques For Big Data Analytics and Complex Data Sets Maria C Mariani Full Chapter
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, except as permitted by law. Advice on how to obtain permission to reuse material
from this title is available at http://www.wiley.com/go/permissions
The right of Maria Cristina Mariani, Osei Kofi Tweneboah, and Maria Pia Beccar-Varela to be
identified as the authors of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley
products visit us at www.wiley.com
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some
content that appears in standard print versions of this book may not be available in other
formats.
Contents
3 Multivariate Analysis 21
3.1 Introduction 21
3.2 Multivariate Analysis: Overview 21
3.3 Mean Vectors 22
3.4 Variance–Covariance Matrices 24
3.5 Correlation Matrices 26
5 Introduction to R 61
5.1 Introduction 61
5.2 Basic Data Types 62
5.2.1 Numeric Data Type 62
5.2.2 Integer Data Type 62
5.2.3 Character 63
5.2.4 Complex Data Types 63
5.2.5 Logical Data Types 64
5.3 Simple Manipulations – Numbers and Vectors 64
5.3.1 Vectors and Assignment 64
6 Introduction to Python 81
6.1 Introduction 81
6.2 Basic Data Types 82
6.2.1 Number Data Type 82
6.2.1.1 Integer 82
6.2.1.2 Floating-Point Numbers 83
6.2.1.3 Complex Numbers 84
6.2.2 Strings 84
6.2.3 Lists 85
6.2.4 Tuples 86
6.2.5 Dictionaries 86
6.3 Number Type Conversion 87
6.4 Python Conditions 87
6.4.1 If Statements 88
6.4.2 The Else and Elif Clauses 89
6.4.3 The While Loop 90
6.4.3.1 The Break Statement 91
6.4.3.2 The Continue Statement 91
6.4.4 For Loops 91
7 Algorithms 97
7.1 Introduction 97
7.2 Algorithm – Definition 97
7.3 How to Write an Algorithm 98
7.3.1 Algorithm Analysis 99
7.3.2 Algorithm Complexity 99
7.3.3 Space Complexity 100
7.3.4 Time Complexity 100
7.4 Asymptotic Analysis of an Algorithm 101
7.4.1 Asymptotic Notations 102
7.4.1.1 Big O Notation 102
7.4.1.2 The Omega Notation, Ω 102
7.4.1.3 The Θ Notation 102
7.5 Examples of Algorithms 104
7.6 Flowchart 104
7.7 Problems 105
Bibliography 353
Index 359
List of Figures
Figure 16.5 Two class problem when data is not linearly separable. 224
Figure 16.6 ROC curve for linear SVM. 226
Figure 16.7 ROC curve for nonlinear SVM. 227
Figure 17.1 Single hidden layer feed-forward neural networks. 232
Figure 17.2 Simple recurrent neural network. 234
Figure 17.3 Long short-term memory unit. 235
Figure 17.4 Philippines (PSI). (a) Basic RNN. (b) LSTM. 239
Figure 17.5 Thailand (SETI). (a) Basic RNN. (b) LSTM. 240
Figure 17.6 United States (NASDAQ). (a) Basic RNN. (b) LSTM. 241
Figure 17.7 JPMorgan Chase & Co. (JPM). (a) Basic RNN. (b) LSTM. 242
Figure 17.8 Walmart (WMT). (a) Basic RNN. (b) LSTM. 243
Figure 18.1 3D power spectra of the daily returns from the four analyzed stock
companies. (a) Discover. (b) Microsoft. (c) Walmart. (d) JPM
Chase. 255
Figure 18.2 3D power spectra of the returns (generated per minute) from the
four analyzed stock companies. (a) Discover. (b) Microsoft.
(c) Walmart. (d) JPM Chase. 257
Figure 19.1 Time-frequency image of explosion 1 recorded by ANMO
(Table 19.2). 270
Figure 19.2 Time-frequency image of earthquake 1 recorded by ANMO
(Table 19.2). 270
Figure 19.3 Three-dimensional graphic information of explosion 1 recorded
by ANMO (Table 19.2). 272
Figure 19.4 Three-dimensional graphic information of earthquake 1 recorded
by ANMO (Table 19.2). 272
Figure 19.5 Time-frequency image of explosion 2 recorded by TUC
(Table 19.3). 273
Figure 19.6 Time-frequency image of earthquake 2 recorded by TUC
(Table 19.3). 273
Figure 19.7 Three-dimensional graphic information of explosion 2 recorded
by TUC (Table 19.3). 274
Figure 19.8 Three-dimensional graphic information of earthquake 2 recorded
by TUC (Table 19.3). 274
Figure 21.1 R∕S for volcanic eruptions 1 and 2. 322
Figure 21.2 DFA for volcanic eruptions 1 and 2. 323
Figure 21.3 DEA for volcanic eruptions 1 and 2. 323
List of Tables
Preface
We conclude this book with a discussion of ethics in data science: With great
power comes great responsibility.
The authors express their deepest gratitude to Wiley for making the publication
a reality.
1.1 Introduction
Data science is one of the most promising and high-demand career paths for skilled
professionals in the 21st century. Successful data professionals understand that they
must advance past the traditional skills of analyzing large amounts of data, statistical
learning, and programming. In order to explore and discover useful information for their
companies or organizations, data scientists must have a good grasp of the full spectrum
of the data science life cycle, along with the flexibility and understanding needed to
maximize returns at each phase of the process.
Data science is a “concept to unify statistics, mathematics, computer science,
data analysis, machine learning and their related methods” in order to find trends,
understand, and analyze actual phenomena with data. Due to the coronavirus disease
(COVID-19), many colleges, institutions, and large organizations asked their
nonessential employees to work virtually. The virtual meetings have provided colleges
and companies with plenty of data. Some aspects of the data suggest that virtual
fatigue is on the rise. Virtual fatigue is defined as the burnout associated with
overdependence on virtual platforms for communication. Data science provides tools to
explore and reveal the best and worst aspects of virtual work.
In the past decade, data scientists have become necessary assets and are present
in almost all institutions and organizations. These professionals are data-driven
individuals with high-level technical skills who are capable of building complex
quantitative algorithms to organize and synthesize large amounts of information
used to answer questions and drive strategy in their organization. This is coupled
with the experience in communication and leadership needed to deliver tangible
results to various stakeholders across an organization or business.
Data scientists need to be curious and result-oriented, with good domain-specific
knowledge and communication skills that allow them to explain very technical results
to their nontechnical counterparts. They possess a strong quantitative background in
statistics and mathematics as well as programming knowledge with a focus on data
warehousing, mining, and modeling to build and analyze algorithms. In fact, data
scientists are a group of analytical data experts who have the technical skills to
solve complex problems and the curiosity to explore how problems need to be solved.
Data Science in Theory and Practice: Techniques for Big Data Analytics and Complex Data Sets,
First Edition. Maria Cristina Mariani, Osei Kofi Tweneboah, and Maria Pia Beccar-Varela.
© 2022 John Wiley & Sons, Inc. Published 2022 by John Wiley & Sons, Inc.
2 1 Background of Data Science
accurate, and useful information than that provided by any individual data
source. Veracity: Veracity describes the quality of data and the data value. The
quality of data obtained can greatly affect the accuracy of the analyzed results. In
the next subsection we will discuss some big data architectures. A comprehensive
study of this topic can be found in the application architecture guide of the
Microsoft technical documentation.
● Analytical data store: Several big data solutions prepare data for analysis and
then serve the processed data in a structured format that can be queried using
analytical tools. The analytical data store used to serve these queries can be a
Kimball-style relational data warehouse, as observed in most classical business
intelligence (BI) solutions. Alternatively, the data could be presented through a
low-latency NoSQL technology, such as HBase, or an interactive Hive database
that provides a metadata abstraction over data files in the distributed data store.
● Analysis and reporting: The goal of most big data solutions is to provide insights
into the data through analysis and reporting. Users can analyze the data using
mathematical and statistical models as well as data visualization techniques.
Analysis and reporting can also take the form of interactive data exploration by
data scientists or data analysts.
● Orchestration: Several big data solutions consist of repeated data processing
operations, encapsulated in workflows, that transform source data, move data
between multiple sources and sinks, load the processed data into an analytical
data store, or move the results to a report or dashboard.
2.1 Introduction
The matrix algebra and random vectors presented in this chapter will enable us to
precisely state statistical models. We will begin by discussing some basic concepts
that will be essential throughout this chapter. For more details on matrix algebra,
please consult Axler (2015).
8 2 Matrix Algebra and Random Vectors
Definition 2.3 (Vector addition) The sum of two vectors of the same size is
the vector obtained by adding corresponding entries in the vectors:
\[
x + y =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
+
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}
\]
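The entrywise rule in Definition 2.3 can be sketched in a few lines of plain Python. The function name `vector_add` and the sample vectors are illustrative assumptions, not taken from the text.

```python
def vector_add(x, y):
    """Return the entrywise sum of two vectors given as lists of numbers."""
    # Vector addition is only defined for vectors of the same size.
    if len(x) != len(y):
        raise ValueError("vectors must have the same size")
    return [xi + yi for xi, yi in zip(x, y)]


total = vector_add([1, 2, 3], [10, 20, 30])
print(total)  # [11, 22, 33]
```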
2.2.2 Matrices
The notation $A_{i,j}$ denotes the entry in row i, column j of A. In other words,
the first index refers to the row number and the second index refers to the column
number.
Example 2.1
If
\[
A = \begin{pmatrix} 1 & 4 & 8 \\ 0 & 4 & 9 \\ 7 & -1 & 7 \end{pmatrix},
\]
then $A_{3,1} = 7$.
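Example 2.2
If
\[
A_{2 \times 3} = \begin{pmatrix} 1 & 4 & 8 \\ 0 & 4 & 9 \end{pmatrix},
\quad \text{then} \quad
A^{T}_{3 \times 2} = \begin{pmatrix} 1 & 0 \\ 4 & 4 \\ 8 & 9 \end{pmatrix}.
\]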
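Examples 2.1 and 2.2 can be checked with a short Python sketch. The helper name `transpose` is an assumption made for illustration. Python indexes from 0, so the entry written $A_{3,1}$ in the text is `M[2][0]` in code.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    # zip(*a) groups the first entries of every row, then the second, and so on.
    return [list(column) for column in zip(*a)]


# Example 2.1: entry in row 3, column 1 (zero-based indices 2 and 0).
M = [[1, 4, 8],
     [0, 4, 9],
     [7, -1, 7]]
print(M[2][0])  # 7

# Example 2.2: a 2 x 3 matrix and its 3 x 2 transpose.
A = [[1, 4, 8],
     [0, 4, 9]]
print(transpose(A))  # [[1, 0], [4, 4], [8, 9]]
```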
by the scalar:
\[
cA = \begin{bmatrix}
cA_{1,1} & \cdots & cA_{1,n} \\
\vdots & & \vdots \\
cA_{m,1} & \cdots & cA_{m,n}
\end{bmatrix}.
\]
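Definition 2.7 (Matrix addition) The sum of two matrices of the same size is
the matrix obtained by adding corresponding entries in the matrices:
\[
A + B =
\begin{bmatrix}
A_{1,1} & \cdots & A_{1,n} \\
\vdots & & \vdots \\
A_{m,1} & \cdots & A_{m,n}
\end{bmatrix}
+
\begin{bmatrix}
B_{1,1} & \cdots & B_{1,n} \\
\vdots & & \vdots \\
B_{m,1} & \cdots & B_{m,n}
\end{bmatrix}
=
\begin{bmatrix}
A_{1,1} + B_{1,1} & \cdots & A_{1,n} + B_{1,n} \\
\vdots & & \vdots \\
A_{m,1} + B_{m,1} & \cdots & A_{m,n} + B_{m,n}
\end{bmatrix}.
\]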
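Scalar multiplication and matrix addition both act entry by entry, which the following Python sketch makes explicit. The function names and the two small matrices are assumptions chosen for illustration; they do not come from the text.

```python
def scalar_multiply(c, a):
    """Multiply every entry of matrix a (a list of rows) by the scalar c."""
    return [[c * entry for entry in row] for row in a]


def matrix_add(a, b):
    """Add two matrices of the same size entry by entry."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("matrices must have the same size")
    return [
        [x + y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


P = [[1, 4], [0, 4]]
Q = [[1, 1], [2, 1]]
print(scalar_multiply(2, P))  # [[2, 8], [0, 8]]
print(matrix_add(P, Q))       # [[2, 5], [2, 5]]
```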
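Example 2.3
If
\[
A = \begin{bmatrix} 1 & 4 \\ 0 & 4 \\ 7 & -1 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix},
\]
then
\[
AB =
\begin{bmatrix} 1 & 4 \\ 0 & 4 \\ 7 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}
=
\begin{bmatrix}
1(1) + 4(2) & 1(1) + 4(1) \\
0(1) + 4(2) & 0(1) + 4(1) \\
7(1) + (-1)(2) & 7(1) + (-1)(1)
\end{bmatrix}
=
\begin{bmatrix} 9 & 5 \\ 8 & 4 \\ 5 & 6 \end{bmatrix}.
\]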
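The product in Example 2.3 can be reproduced with a small row-by-column routine. This is a minimal sketch in plain Python; the name `matrix_multiply` is an illustrative assumption.

```python
def matrix_multiply(a, b):
    """Multiply an m x k matrix by a k x n matrix (both lists of rows)."""
    # Each row of a must have as many entries as b has rows.
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions must agree")
    columns = list(zip(*b))
    # Entry (i, j) is the sum of products of row i of a with column j of b.
    return [
        [sum(x * y for x, y in zip(row, column)) for column in columns]
        for row in a
    ]


A = [[1, 4], [0, 4], [7, -1]]
B = [[1, 1], [2, 1]]
print(matrix_multiply(A, B))  # [[9, 5], [8, 4], [5, 6]]
```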
Example 2.4 The matrix
\[
A = \begin{bmatrix} 1 & 4 \\ 4 & 4 \end{bmatrix}
\]
is symmetric; the matrix
\[
B = \begin{bmatrix} 1 & 6 \\ 4 & -4 \end{bmatrix}
\]
is not symmetric.
Definition 2.11 (Trace) For any square matrix A, the trace of A, denoted
by tr(A), is defined as the sum of the diagonal elements, i.e.
\[
\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}.
\]
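As a quick check of the symmetry and trace definitions, the sketch below applies them to the two matrices of Example 2.4. The function names `is_symmetric` and `trace` are assumptions made for illustration.

```python
def is_symmetric(a):
    """Return True when the matrix equals its own transpose."""
    return a == [list(column) for column in zip(*a)]


def trace(a):
    """Return the sum of the diagonal entries of a square matrix."""
    if any(len(row) != len(a) for row in a):
        raise ValueError("trace is only defined for square matrices")
    return sum(a[i][i] for i in range(len(a)))


A = [[1, 4], [4, 4]]
B = [[1, 6], [4, -4]]
print(is_symmetric(A), trace(A))  # True 5
print(is_symmetric(B), trace(B))  # False -3
```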
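1. $\emptyset \in \mathcal{F}$.
2. If $F \in \mathcal{F}$ then its complement $F^{c} \in \mathcal{F}$.
3. If $F_1, F_2, \ldots$ is a countable collection of sets in $\mathcal{F}$ then their
   union $\bigcup_{n=1}^{\infty} F_n \in \mathcal{F}$.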
The Dirichlet distribution Dir(𝜶), named after Johann Peter Gustav Lejeune
Dirichlet (1805–1859), is a multivariate distribution parameterized by a vector 𝛂
of positive parameters (𝛼1 , … , 𝛼n ).
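Specifically, the joint density of an n-dimensional random vector X ∼ Dir(𝜶) is
defined as:
\[
f(x_1, \ldots, x_n) = \frac{1}{B(\boldsymbol{\alpha})}
\left( \prod_{i=1}^{n} x_i^{\alpha_i - 1} \mathbf{1}_{\{x_i > 0\}} \right)
\mathbf{1}_{\{x_1 + \cdots + x_n = 1\}},
\]
where $\mathbf{1}_{\{x_1 + \cdots + x_n = 1\}}$ is an indicator function.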
\[
\mathbf{1}_A : X \to \{0, 1\}
\]
defined as
\[
\mathbf{1}_A(x) =
\begin{cases}
1, & \text{if } x \in A, \\
0, & \text{if } x \notin A.
\end{cases}
\]
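The indicator function translates directly into code. The sketch below builds $\mathbf{1}_A$ for a finite set A; the factory name `indicator` and the sample set are illustrative assumptions.

```python
def indicator(a):
    """Return the indicator function of the set a."""
    def one_a(x):
        # 1 when x belongs to the set, 0 otherwise.
        return 1 if x in a else 0
    return one_a


one_even = indicator({0, 2, 4})
print(one_even(2), one_even(3))  # 1 0
```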
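The components of the random vector X thus are always positive and have the
property $X_1 + \cdots + X_n = 1$. The normalizing constant $B(\boldsymbol{\alpha})$ is the
multinomial beta function, which is defined as:
\[
B(\boldsymbol{\alpha})
= \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{n} \alpha_i\right)}
= \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma(\alpha_0)},
\]
where we used the notation $\alpha_0 = \sum_{i=1}^{n} \alpha_i$ and
$\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t} \, dt$ for the Gamma function.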
Because the Dirichlet distribution creates n positive numbers that always sum
to 1, it is extremely useful for generating candidate probabilities for n possible
outcomes. This distribution is very popular and is related to the multinomial
distribution, which needs n numbers summing to 1 to model the probabilities in the
distribution. The multinomial distribution is defined in Section 2.3.2.
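One common way to generate such probability vectors is to normalize independent Gamma draws. This construction is a standard result brought in here as an assumption; it is not described in the text above, and the function name `sample_dirichlet` is illustrative.

```python
import random


def sample_dirichlet(alpha, rng=random):
    """Draw one Dirichlet(alpha) vector by normalizing Gamma(alpha_i, 1) draws."""
    if any(a <= 0 for a in alpha):
        raise ValueError("all parameters must be positive")
    # Independent Gamma variables with shape alpha_i and scale 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    # Dividing by the total gives positive numbers that sum to 1.
    return [g / total for g in draws]


random.seed(0)  # fixed seed only so that repeated runs give the same draw
probabilities = sample_dirichlet([2.0, 3.0, 5.0])
print(probabilities, sum(probabilities))
```

The result can be used directly as the probability vector of a multinomial distribution with three outcomes.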
With the notation mentioned above and $\alpha_0$ as the sum of all parameters, we can
calculate the moments of the distribution. The first moment vector has coordinates:
\[
E[X_i] = \frac{\alpha_i}{\alpha_0}.
\]
The covariance matrix has elements:
\[
\operatorname{Var}(X_i) = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^{2} (\alpha_0 + 1)},
\]
and when $i \neq j$
\[
\operatorname{Cov}(X_i, X_j) = \frac{-\alpha_i \alpha_j}{\alpha_0^{2} (\alpha_0 + 1)}.
\]
The covariance matrix is singular (its determinant is zero).
Finally, the univariate marginal distributions are all beta:
$X_i \sim \mathrm{Beta}(\alpha_i, \alpha_0 - \alpha_i)$. These results can be found in
Balakrishnan and Nevzorov (2004).
Please refer to Lin (2016) for the proof of the properties of the Dirichlet
distribution.
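The normalizing constant and the moment formulas for the Dirichlet distribution are easy to evaluate numerically. The sketch below is a minimal illustration with hypothetical function names and an arbitrarily chosen parameter vector; it also checks that every row of the covariance matrix sums to zero, which is why that matrix is singular.

```python
import math


def multivariate_beta(alpha):
    """B(alpha): product of Gamma(alpha_i) divided by Gamma(alpha_0)."""
    return math.prod(math.gamma(a) for a in alpha) / math.gamma(sum(alpha))


def dirichlet_mean(alpha):
    """First moment vector, E[X_i] = alpha_i / alpha_0."""
    alpha_0 = sum(alpha)
    return [a / alpha_0 for a in alpha]


def dirichlet_covariance(alpha):
    """Covariance matrix built from the Var and Cov formulas."""
    alpha_0 = sum(alpha)
    denominator = alpha_0 ** 2 * (alpha_0 + 1)
    n = len(alpha)
    return [
        [
            (alpha[i] * (alpha_0 - alpha[i]) if i == j else -alpha[i] * alpha[j])
            / denominator
            for j in range(n)
        ]
        for i in range(n)
    ]


alpha = [1.0, 2.0, 3.0]  # illustrative parameters, so alpha_0 = 6
print(multivariate_beta(alpha))   # Gamma(1)Gamma(2)Gamma(3)/Gamma(6) = 2/120
print(dirichlet_mean(alpha))      # alpha_i / 6
row_sums = [sum(row) for row in dirichlet_covariance(alpha)]
print(row_sums)                   # each row sums to 0 up to rounding
```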
2.3 Random Variables and Distribution Functions 17
2.4 Problems
1 If A and B are two square matrices of the same size, prove the following
properties of the trace of a matrix.
(a) tr(AB) = tr(BA).
(b) tr(A + B) = tr(A) + tr(B).
(c) tr(cA) = c tr(A), for any constant c.
2 If A and B are two square matrices of the same size, prove the following
properties of the determinant of a matrix.
(a) det A = det A^T.
(b) det(AB) = det A ⋅ det B = det(BA).
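3 Let
\[
A = \begin{pmatrix} 1 & 4 & 8 \\ 0 & 4 & 9 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & 4 & -3 \\ 1 & 8 & 9 \end{pmatrix}.
\]
(a) Find A + B.
(b) Find A − B.
(c) Find A′A.
(d) Find AA′.
4 Let
\[
A = \begin{pmatrix} 1 & 4 & 8 \\ 0 & 4 & 9 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & 4 \\ 1 & 8 \\ -3 & 9 \end{pmatrix}.
\]
5 Let
\[
A = \begin{pmatrix} 5 & 4 & 8 \\ 0 & 4 & 3 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 4 \\ 1 & 3 \\ -3 & 2 \end{pmatrix}.
\]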
Families
Agnostidae (p. 244).
Shumardiidae (p. 245).
Trinucleidae (p. 245).
Harpedidae (p. 245).
Paradoxidae (p. 246).
Conocephalidae = Conocoryphidae (p. 247).
Olenidae (p. 247).
Calymenidae (p. 247).
Asaphidae (p. 249).
Bronteidae (p. 249).
Phacopidae (p. 249).
Cheiruridae (p. 250).
Proëtidae (p. 251).
Encrinuridae (p. 251).
Acidaspidae (p. 251).
Lichadidae (p. 252).
ARACHNIDA (p. 255).
Orders.
Families.
Decolopodidae (p. 531).
Colossendeidae = Pasithoidae (p. 532).
Eurycididae = Ascorhynchidae (p. 533).
Ammotheidae (p. 534).
Rhynchothoracidae (p. 535).
Nymphonidae (p. 536).
Pallenidae (p. 537).
Phoxichilidiidae (p. 538).
Phoxichilidae (p. 539).
Pycnogonidae (p. 539).
CRUSTACEA
CHAPTER II
BY
The Crustacea are almost exclusively aquatic animals, and they play a
part in the waters of the world closely parallel to that which insects play
on land. The majority are free-living, and gain their sustenance either as
vegetable-feeders or by preying upon other animals, but a great number
are scavengers, picking clean the carcasses and refuse that litter the
ocean, just as maggots and other insects rid the land of its dead cumber.
Similar to insects also is the great abundance of individuals which
represent many of the species, especially in the colder seas, and the
naturalist in the Arctic or Antarctic oceans has learnt to hang the
carcasses of bears and seals over the side of the boat for a few days in
order to have them picked absolutely clean by shoals of small Amphipods.
It is said that these creatures, when crowded sufficiently, will even attack
living fishes, and by sheer press of numbers impede their escape and
devour them alive. Equally surprising are the shoals of minute Copepods
which may discolour the ocean for many miles, an appearance well known
to fishermen, who take profitable toll of the fishes that follow in their
wake. Despite this massing together we look in vain for any elaborate
social economy, or for the development of complex instincts among
Crustacea, such as excite our admiration in many insects, and though
many a crab or lobster is sufficiently uncanny in appearance to suggest
unearthly wisdom, he keeps his intelligence rigidly to himself, encased in
the impenetrable reserve of his armour and vindicated by the most
powerful of pincers. It is chiefly in the variety of structure and in the
multifarious phases of life-history that the interest of the Crustacea lies.
Before entering into an examination of these matters, it will be well to
take a general survey of Crustacean organisation, to consider the plan on
which these animals are built, and the probable relation of this plan to
others met with in the animal kingdom.
The Crustacea, to begin with, are a Class of the enormous Phylum
Arthropoda, animals with metamerically segmented bodies and usually
with externally jointed limbs. Their bodies are thus composed of a series
of repeated segments, which are on the whole similar to one another,
though particular segments may be differentiated in various respects for
the performance of different functions. This segmentation is apparent
externally, the surface of a Crustacean being divided typically into a
number of hard chitinous rings, some of which may be fused rigidly
together, as in the carapace of the crabs, or else articulated loosely.
Each segment bears typically a pair of jointed limbs, and though they
vary greatly in accordance with the special functions for which they are
employed, and may even be absent from certain segments, they may yet
be reduced to a common plan and were, no doubt, originally present on
all the segments.
Passing from the exterior to the interior of the body we find, generally
speaking, that the chief system of organs which exhibits a similar
repetition, or metameric segmentation, is the nervous system. This
system is composed ideally of a nervous ganglion situated in each
segment and giving off peripheral nerves, the several ganglia being
connected together by a longitudinal cord. This ideal arrangement,
though apparent during the embryonic development, becomes obscured
to some extent in the adult owing to the concentration or fusion of ganglia
in various parts of the body. The other internal organs do not show any
clear signs of segmentation, either in the embryo or in the adult; the
alimentary canal and its various diverticula lie in an unsegmented
body-cavity, and are bathed in the blood which courses through a system of
narrow canals and irregular spaces which surround all the organs of the
body. A single pair, or at most two pairs, of kidneys are present.
The type of segmentation exhibited by the Crustacea is thus of a limited
character, concerning merely the external skin with its appendages, and
the nervous system, and not touching any of the other internal organs.[1]
In this respect the Crustacea agree with all the other Arthropods, in the
adults of which the segmentation is confined to the exterior and to the
nervous system, and does not extend to the body-cavity and its contained
organs; and for the same reason they differ essentially from all other
metamerically segmented animals, e.g. Annelids, in which the
segmentation not only affects the exterior and the nervous system, but
especially applies to the body-cavity, the musculature, the renal, and often
the generative organs. The Crustacea also resemble the other Arthropoda
in the fact that the body-cavity contains blood, and is therefore a
“haemocoel,” while in the Annelids and Vertebrates the segmented
body-cavity is distinct from the vascular system, and constitutes a true
“coelom.” To this important distinction, and to its especial application to
the Crustacea, we will return, but first we may consider more narrowly
the segmentation of the Crustacea and its main types of variation
within the group. In order to determine the number of segments which
compose any particular Crustacean we have clearly two criteria: first, the
rings or somites of which the body is composed, and to each of which a
pair of limbs must be originally ascribed; and, second, the nervous
ganglia.
Around and behind the region of the mouth there is very little difficulty
in determining the segments of the body, if we allow embryology to assist
anatomy, but in front of the mouth the matter is not so easy.
In the Crustacea the moot point is whether we consider the paired eyes
and first pair of antennae as true appendages belonging to two true
segments, or whether they are structures sui generis, not homologous to
the other limbs. With regard to the first antennae we are probably safe in
assigning them to a true body-segment, since in some of the
Entomostraca, e.g. Apus, the nerves which supply them spring, not from
the brain as in more highly specialised forms, but from the commissures
which pass round the oesophagus to connect the dorsally lying brain to
the ventral nerve-cord. The paired eyes are always innervated from the
brain, but the brain, or at least part of it, is very probably formed of
paired trunk-ganglia which have fused into a common cerebral mass; and
the fact that under certain circumstances the stalked eye of Decapods
when excised with its peripheral ganglion[2] can regenerate in the form of
an antenna, is perhaps evidence that the lateral eyes are borne on what
were once a pair of true appendages.
Now, with regard to the segmentation of the body, the Crustacea fall
into three categories: the Entomostraca, in which the number of segments
is indefinite; the Malacostraca, in which we may count nineteen
segments, exclusive of the terminal piece or telson and omitting the
lateral eyes; and the Leptostraca, including the single recent genus
Nebalia, in which the segmentation of head and thorax agrees exactly
with that of the Malacostraca, but in the abdomen there are two
additional segments.
It has been usually held that the indefinite number of segments
characteristic of the Entomostraca, and especially the indefinitely large
number of segments characteristic of such Phyllopods as Apus, preserves
the ancestral condition from which the definite number found in the
Malacostraca has been derived; but recently it has been clearly pointed
out by Professor Carpenter[3] that the number of segments found in the
Malacostraca and Leptostraca corresponds with extraordinary exactitude
to the number determined as typical in all the other orders of Arthropoda.
This remarkable correspondence (it can hardly be coincidence) seems to
point to a common Arthropodan plan of segmentation, lying at the very
root of the phyletic tree; and if this is so, we are forced to the conclusion
that the Malacostraca have retained the primitive type of segmentation in
far greater perfection than the Entomostraca, in some of which many
segments have been added, e.g. Phyllopoda, while in others segments
have been suppressed, e.g. Cladocera, Ostracoda. It may be objected to
this view of the primitive condition of segmentation in the Crustacea that
the Trilobites, which for various reasons are regarded as related to the
ancestral Crustaceans, exhibit an indefinite and often very high number
of segments; but, as Professor Carpenter has pointed out, the oldest and
most primitive of Trilobites, such as Olenellus, possessed few segments
which increase as we pass from Cambrian to Carboniferous genera.
The following table shows the segmentation of the body in the
Malacostraca, as compared with that of Limulus (cf. p. 263), Insecta, the
primitive Myriapod Scolopendrella, and Peripatus. It will be seen that the
correspondence, though not exact, is very close, especially in the first four
columns, the number of segments in Peripatus being very variable in the
different species.
Table showing the Segmentation of various Arthropods

No.  Malacostraca      Limulus            Insecta              Myriapoda          Peripatus
                                                               (Scolopendrella)
 1   Eyes              Median eyes        Eyes
 2   1st antennae      Rostrum            Antennae             Feelers            Feelers
 3   2nd antennae      Chelicerae         Intercalary segment
 4   Mandibles         Pedipalpi          Mandibles            Mandibles          Mandibles
 5   1st maxillae      1st walking legs   Maxillulae           Maxillulae         1st jaw-claw
 6   2nd maxillae      2nd „ „            1st maxillae         1st maxillae       2nd jaw-claw
 7   1st maxillipede   3rd „ „            2nd maxillae         2nd maxillae       1st leg
 8   2nd maxillipede   4th „ „            1st leg              1st leg            2nd „
 9   3rd maxillipede   Chilaria           2nd „                2nd „              3rd „
10   1st ambulatory    Genital operculum  3rd „                3rd „              4th „
11   2nd „             1st gill-book      1st abdominal        4th „              5th „
12   3rd „             2nd „              2nd „                5th „              6th „
13   4th „             3rd „              3rd „                6th „              7th „
14   5th „             4th „              4th „                7th „              8th „
15   1st abdominal     5th „              5th „                8th „              9th „
16   2nd „             No appendages      6th „                9th „              10th „
17   3rd „             „                  7th „                10th „             11th „
18   4th „             „                  8th „                11th „             12th „
19   5th „             „                  9th „                12th „             13th „
20   6th „             „                  10th „               Reduced limbs      14th „
21   [4]               „                  Cercopods                               [5]
The essential fact that the two types of limb are built on the same plan
may be considered as established; but it may be urged that the biramous
type represents this common plan more nearly than the foliaceous. It is,
at any rate, certain that in the maxillipedes of the Decapoda we witness
the conversion of the biramous type into the foliaceous by the expansion
of the basal joints concomitantly with the assumption by the maxillipedes
of masticatory functions. Thus in the Decapoda the first maxillipede is
decidedly foliaceous owing to the expanded “gnathobases” (Fig. 1, A, bp,
cxp), and the second maxillipedes are flattened, with their basal joints
somewhat expanded and furnished with biting hairs; but in the
“Schizopoda” (e.g. Mysis) the first maxillipede is a typical biramous limb,
though the expanded gnathobases in some forms are beginning to project
(Fig. 1, E), while the limb following, which corresponds to the second
maxillipede of Decapods, is simply a biramous swimming leg. Besides this
obvious conversion of a biramous into a foliaceous limb, further evidence
of the fundamental character of the biramous type is found, first, in its
invariable occurrence in the Nauplius stage, which does not necessarily
mean that the ancestors of the Crustacea possessed this type of limb in
the adult, but which does imply that this type of limb was possessed at
some period of life by the common ancestral Crustacean; and, second, the
limbs of the Trilobita, a group which probably stands near the origin of
the Crustacea, have been shown by Beecher to conform to the biramous
type (Fig. 1, H). Furthermore, the thoracic limbs of Nebalia, an animal
which combines many of the characteristics of Entomostraca and
Malacostraca, and is therefore considered as a primitive type, despite
their flattened character, are really built upon a biramous plan (Fig. 1, G).
In conclusion, we may point out that this view of the Crustacean limb,
as essentially a biramous structure, agrees with the conclusion derived
from our consideration of the segmentation of the body, and points less to
the Branchiopoda as primitive Crustacea and more to some generalised
Malacostracan type.
So far we have shortly dealt with those systems of organs which are
clearly affected by the metameric segmentation of the body; we must now
expose the condition of the body-cavity to a similar scrutiny. If we
remove the external integument of a Crustacean, we find that the internal
organs do not lie in a spacious and discrete body-cavity, as is the case in
the Annelids and Vertebrates, but that they are packed together in an
irregular system of spaces (“haemocoel”) in communication with the
vascular system and containing blood. In the Entomostraca and smaller
forms generally, a definite vascular system hardly exists, though a central
heart and artery may serve to propel the blood through the irregular
lacunae of the body-cavity; but in the larger Malacostraca a complicated
system of arteries may be present which pour the blood into fairly
definitely arranged spaces surrounding the chief organs. These spaces
return the blood to the pericardium, and so to the heart again through the
apertures or ostia which pierce its walls.
This condition of the body-cavity or haemocoel is reproduced in the
adults of all Arthropods, but in some of them by following the
development we can trace the steps by which the true coelom is replaced
by the haemocoel. In the embryos of all Arthropods except the Crustacea,
a true closed metamerically segmented coelom is formed as a split in the
mesodermal embryonic layer of cells, distinct from the vascular system.
During the course of development the segmented coelomic spaces and
their walls give rise to the reproductive organs and to certain renal organs
in Peripatus, Myriapoda, and Arachnida (nephridia and coxal glands),
but the general body-cavity is formed as an extension of the vascular
system, which is laid down outside the coelom by a canaliculisation of the
extra-coelomic mesoderm. In the embryos of the Crustacea, however,
there is never at any time a closed segmented coelom, and in this respect
the Crustacea differ from all other Arthropods. The only clear instance in
which metamerically repeated mesodermal cavities have been seen in the
embryo Crustacean is that of Astacus; here Reichenbach[7] states that in
the abdomen segmental cavities are formed which subsequently break
down; but even in this instance no connexion has been shown to subsist
between these embryonic cavities and the reproductive and excretory
organs of the adult.
Since the connexion between the coelom and the excretory organs is
always a very close one throughout the animal kingdom, interest naturally
centres upon the renal organs in Crustacea, and it has been suggested
that these organs in Crustacea represent the sole remains, with the
possible exception of the gonads, of the coelom. Since, at any rate, a part
of the kidneys appears to be developed as a closed sac in the mesoderm,
and since they possess a possible segmental value, this suggestion is
plausible; but, on the other hand, since there are never more than two
pairs of kidneys, and since they are totally unconnected with the gonads
or with any other indication of a segmented coelom, the suggestion
remains purely hypothetical.
The renal organs of the Crustacea, excluding the Malpighian tubes
present in some Amphipods which open into the alimentary canal, and
resemble the Malpighian tubes of Insects, consist of two pairs—the
antennary gland, opening at the base of the second antenna, and the
maxillary gland, opening on the second maxilla. These two pairs of glands
rarely subsist together in the adult condition, though this is said to be the
case in Nebalia and possibly Mysis; the antennary glands are
characteristic of adult Malacostraca[8] and the larvae of the Entomostraca,
while the maxillary glands (“shell-glands”) are present in adult
Entomostraca and larval Malacostraca, that is to say, the one pair
replaces the other in the two great subdivisions of the Crustacea. The
shell-gland of the Entomostraca is a simple structure consisting of a
coiled tube opening to the exterior on the external branch of the second
maxilla, and ending blindly in a dilated vesicle, the end-sac. The
antennary gland of the Malacostraca is usually more complicated: these
complications have been studied especially by Weldon,[9] Allen, and
Marchal[10] in the Decapoda. In a number of forms we have a tube opening
to the exterior at the base of the second antenna, and expanding within to
form a spacious bladder into which the coiled tubular part of the kidney
opens, while at the extremity of this coiled portion is the vesicle called the
end-sac. This arrangement may be modified; thus in Palaemon Weldon
described the two glands as fusing together above and below the
oesophagus, the dorsal commissure expanding into a huge sac stretching
dorsally down the length of the body. This closed sac with excretory
functions thus comes to resemble a coelomic cavity, and the view that it is
really coelomic has indeed been upheld.
A modified form of this view is that of Vejdovský, who describes a
funnel-apparatus leading from the coiled tube into the end-sac of the
antennary gland of Amphipods; he regards the end-sac alone as
representing the coelom, while the funnel and coiled tube represent the
kidney opening into it.
Not very much is known of the development of these various structures.
Some authors have considered that both antennary and maxillary glands
are developed in the embryo from ectodermal inpushings, but the more
recent observations of Waite[11] on Homarus americanus indicate that the
antennary gland at any rate is a composite structure, formed by an
ectodermal ingrowth which meets a mesodermal strand, and from the
latter are produced the end-sac and perhaps the tubular excretory
portions of the gland with their derivatives.
With regard to the possible metameric repetition of the renal organs, it
is of interest to note that Metschnikoff, by feeding Mysis and Nebalia on
carmine, observed excretory glands of a simple character situated at the
bases of the thoracic limbs.
The alimentary canal of the Crustacea is a straight tube composed of
three parts—a mid-gut derived from the endoderm of the embryo, and a
fore- and hind-gut formed by ectodermal invaginations in the embryo
which push into and fuse with the endodermal canal. The regions of the
fore- and hind-gut can be recognised in the adult by the fact of their being
lined with the chitinous investment which is continued over the external
surface of the body forming the hard exoskeleton, while the mid-gut is
naked. The chitinous lining of fore- and hind-gut is shed whenever the
animal moults. In the Malacostraca, in which a complicated “gastric mill”
may be present, the chitinous lining of this part of the gut is thrown into
ridges bearing teeth, and this stomach in the crabs and lobsters reaches a
high degree of complication and materially assists the mastication of the
food. The gut is furnished with a number of secretory and metabolic
glands; the so-called liver, which is probably a hepatopancreas, opening
into the anterior end of the mid-gut, is directed forwards in most
Entomostraca and backwards in the Malacostraca, in the Decapoda
developing into a complicated branching organ which fills a large part of
the thorax. In the Decapoda peculiar vermiform caeca of doubtful
function are present, a pair of which open into the gut anteriorly where
the fore-gut passes into the mid-gut, and a single asymmetrically placed
caecum opens posteriorly into the alimentary tract where the mid-gut
passes into the hind-gut.
The disposition of these caeca, marking as they do the morphological
position of fore-, mid-, and hind-gut, is of peculiar interest owing to the
variations exhibited. From some unpublished drawings of Mr. E. H.
Schuster, which he kindly lent me, it appears that in certain Decapods,
e.g. Callianassa subterranea, the mid-gut between the anterior and
posterior caeca is very long; in Carcinus maenas it is of
considerable length; in Maia squinado it is greatly reduced, the caeca being
closely approximated; while in Galathea strigosa the caeca are greatly
reduced, and the mid-gut as a separate entity has almost disappeared.
The relation of these variations to the habits of the different crabs and to
their modes of development is unknown.
The reproductive organs usually make their appearance as a small
paired group of mesodermal cells in the thorax comparatively late in life;
and neither in their early development nor in the adult condition do they
show any clear signs of segmentation or any connexion with a coelomic
cavity. The sexes are usually separate, but hermaphroditism occurs
sporadically in many forms, and as a normal condition in some parasitic
groups (see pp. 105–107). The adult gonads are generally simple paired
tubes, from the walls of which the germ-cells are produced, and as these
grow and come to maturity they fill up the cavities of the tubes; special
nutrient cells are rarely differentiated, though in some cases (e.g.
Cladocera) a few ova nourish themselves by devouring their sister-cells
(see p. 44). The oviducts and vasa deferentia are formed as simple
outgrowths from the gonadial tubes, which acquire an opening to the
exterior; they are usually poorly supplied with accessory glands, the
epithelium of the canals often supplying albuminous secretions for
cementing the eggs together, while the lining of the vasa deferentia may
be instrumental in the formation of spermatophores for transferring large
packets of spermatozoa to the female. In the vast majority of Crustacea
copulation takes place, the male passing spermatophores or free
spermatozoa into special receptacles (spermathecae), or into the oviducts
of the female. The spermatophores are hollow chitinous structures in