Joint Distribution PDF
f(x, y) ≥ 0 for all (x, y), and Σ_x Σ_y f(x, y) = 1.   (1)

If this were not so, some event would end up having a negative probability, or the total probability would not be one.
Example 1: To illustrate, let X be the number of LCD televisions sold by an electronics store in a day, and let Y denote the number of local daily newspapers that carried an advertisement from the store on that day. The following table provides a possible joint density function:
Example 1: Joint density between the number of LCD TVs sold (X) and the number of newspapers carrying the store AD (Y)

                                  X = 0   X = 1   X = 2   X = 3   X = 4   Marginal density of Y
No. of local daily      Y = 0     0.25    0.10    0.05    0.02    0.01    0.43
newspapers carrying     Y = 1     0.05    0.10    0.15    0.06    0.04    0.40
store AD (Y)            Y = 2     0.01    0.02    0.03    0.06    0.05    0.17
Marginal density of X             0.31    0.22    0.23    0.14    0.10    1.00
Thus, there is a 10% chance that, on a given day, no local daily carries the store advertisement and the store sells 1 LCD television. The chance that at most 1 daily newspaper carries the store advertisement and the store sells more than 2 LCD televisions can be computed as: 0.02 + 0.01 + 0.06 + 0.04 = 0.13.
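As a quick sketch (in Python; not part of the original handout), the table computations above can be reproduced by re-entering the joint probabilities as a dictionary; note that the X = 1 column is the one implied by the stated row and column totals:

```python
# Joint density of Example 1 as a dict: (x, y) -> P(X = x, Y = y),
# where X = TVs sold and Y = newspapers carrying the store ad.
joint = {
    (0, 0): 0.25, (1, 0): 0.10, (2, 0): 0.05, (3, 0): 0.02, (4, 0): 0.01,
    (0, 1): 0.05, (1, 1): 0.10, (2, 1): 0.15, (3, 1): 0.06, (4, 1): 0.04,
    (0, 2): 0.01, (1, 2): 0.02, (2, 2): 0.03, (3, 2): 0.06, (4, 2): 0.05,
}

# Marginals are column/row totals of the joint probabilities.
marginal_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in range(5)}
marginal_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in range(3)}

# P(X = 1, Y = 0): one TV sold, no paper carries the ad.
p1 = joint[(1, 0)]

# P(X > 2, Y <= 1): more than 2 TVs sold, at most 1 paper carries the ad.
p2 = sum(p for (x, y), p in joint.items() if x > 2 and y <= 1)
```

Running this reproduces the two probabilities computed in the text (0.10 and 0.13).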
In the continuous case, the corresponding requirements are f(x, y) ≥ 0 and

∫∫ f(x, y) dy dx = 1,

where the double integral extends over all possible values of (x, y).
Example 2: Consider a young individual who is still living with her parents and has recently got a job; the only money she spends is on (a) entertainment (eating out, movies, etc.) and (b) shopping. Let X and Y denote the percentages of her salary she spends on the two above categories in a random month. She is careful not to overspend, and hence X + Y ≤ 1. She believes that, in a random month, (X, Y) is equally likely to be anywhere on the triangle shown below, i.e., a uniform distribution over this triangle describes the random pattern of percentages spent on the two categories.
Or, equivalently, in functional form, the joint density function is given by:

f(x, y) = 2   if x ≥ 0, y ≥ 0, x + y ≤ 1,
        = 0   otherwise.
So, in particular, the chance that she spends less than 25% of her salary on entertainment and less than 50% of her salary on shopping is:

P(X < 0.25, Y < 0.5) = ∫_0^0.25 ∫_0^0.5 2 dy dx = 0.25.
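This probability can be checked numerically; the sketch below (an illustration, not from the original text) approximates the double integral with a midpoint Riemann sum of the density over a fine grid:

```python
# Numerical check of P(X < 0.25, Y < 0.5) for the uniform density
# f(x, y) = 2 on the triangle x >= 0, y >= 0, x + y <= 1.
n = 400                       # grid points per axis
h = 1.0 / n                   # cell side length
p = 0.0
for i in range(n):
    x = (i + 0.5) * h         # midpoint of cell in the x direction
    for j in range(n):
        y = (j + 0.5) * h     # midpoint of cell in the y direction
        if x < 0.25 and y < 0.5 and x + y <= 1:
            p += 2 * h * h    # density times cell area
```

Since the rectangle [0, 0.25) × [0, 0.5) lies entirely inside the triangle, the sum comes out to 2 × 0.25 × 0.5 = 0.25, matching the integral.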
The density of X alone can be obtained from the joint density as:

f_X(x) = ∫ f(x, y) dy   (continuous case),   (2a)
f_X(x) = Σ_y f(x, y)    (discrete case).     (2b)
In the present context, such a density function is called the marginal density of X. Similarly, the marginal
density of Y is given by:
f_Y(y) = ∫ f(x, y) dx,  or  f_Y(y) = Σ_x f(x, y),
depending on whether the case is continuous or discrete. In the discrete case, as in Example 1 above, the univariate densities (probabilities) are obtained as the row/column totals of the probabilities associated with the joint distribution; these are usually reported in the margins --- hence the name marginal probability distributions (densities). At times, they are also called row marginals and column marginals, as would be obvious and appropriate in a given context.
In Example 2, the marginal density of Y can be obtained using (2a):

f_Y(y) = ∫_0^{1−y} 2 dx = 2(1 − y)   for 0 ≤ y ≤ 1,
       = 0                           otherwise.
Since the joint density is symmetric in the two arguments, you can conclude the marginal density of X is
also the same as the marginal density of Y.
Conditional Density:
Conditional densities (distributions) are the other forms of densities which are of interest in a
bivariate context. Thus, for example, the conditional density of X given Y=y is denoted and defined as
follows:
f_{X|Y=y}(x | y) = f(x, y) / f_Y(y).   (3)
When the variables are discrete, the above is consistent with the notion of conditional probability. In the setting of Example 1, the conditional density of the number of LCD televisions sold by the store on a random day, given that exactly 1 local daily newspaper is carrying an advertisement of the store, is given by:

values    0                1               2                3               4
density   0.05/0.4=0.125   0.10/0.4=0.25   0.15/0.4=0.375   0.06/0.4=0.15   0.04/0.4=0.1
In the context of Example 2, given that in a random month she spends 40% of her salary on entertainment, the percentage of her salary she spends on shopping would be uniformly distributed on (0, 0.6). Since the joint distribution is uniform, you can see this visually from the diagram given in Example 2; more generally/formally, this follows by computing the density to be:

f_{Y|X=0.4}(y | 0.4) = f(0.4, y) / f_X(0.4) = 2 / 1.2 = 1/0.6,   for 0 < y < 0.6.
Conditional densities have the usual properties of univariate densities. So you can compute the
expected value, variance (or any other characteristics like percentiles, IQR) and interpret them in usual
manner. Thus, in the context of Example 1, the expected number of LCD televisions sold on a day when
1 local newspaper is carrying an advertisement is given by
0*0.125+1*0.25+2*0.375+3*0.15+4*0.1=1.85.
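The conditional density and its expectation can be sketched in code (an illustration, not part of the original text), dividing the Y = 1 row of the Example 1 table by the marginal P(Y = 1), as in equation (3):

```python
# Row Y = 1 of the Example 1 joint table: x -> P(X = x, Y = 1).
joint_y1 = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.06, 4: 0.04}
p_y1 = sum(joint_y1.values())            # marginal P(Y = 1) = 0.40

# Conditional density f(x | Y = 1) = f(x, 1) / f_Y(1), as in (3).
cond = {x: p / p_y1 for x, p in joint_y1.items()}

# Conditional expectation E(X | Y = 1).
e_x_given_y1 = sum(x * p for x, p in cond.items())
```

The conditional probabilities come out to 0.125, 0.25, 0.375, 0.15, 0.1, and the conditional expectation to 1.85, matching the computation above.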
In the context of Example 2, among other properties, the inter-quartile range of the percentage of her salary she spends on shopping, given that she spends 40% of her salary on entertainment, can be found as:

0.45 − 0.15 = 0.3.
INDEPENDENT Random Variables:
If X and Y are independent random variables, then the conditional distribution of X given Y should be the (unconditional/marginal) distribution of X, i.e., we must have:

f_{X|Y=y}(x | y) = f_X(x),   for any x, y.

In view of (3), this is equivalent to:

f(x, y) = f_X(x) f_Y(y),   for any x, y.   (4)

Indeed, (4) is usually taken as the definition of X and Y being independent of each other.
From now onwards, for the remaining part of this section, we will write only the representations
corresponding to the continuous case, with the obvious understanding that if the variables are discrete,
then summation will replace integrals in the expressions provided.
The expected value of a function g(X, Y) of the two random variables is given by:

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dy dx.   (5)
In the above, we have taken the possible values of X and Y to be between minus infinity and infinity. If
in a given case, the ranges are finite, the density function would be zero outside this range, and
accordingly, the integrals need to be computed over the finite range only.
Thus, taking g ( x, y ) = xy or g(x,y)=x + y for example, we have:
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dy dx,   (6)

E(X + Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x + y) f(x, y) dy dx.   (7)
Of course, the mean and variance of X can be computed either from the marginal density of X or from
the joint density function of X and Y as could be convenient from case to case:
E(X) = ∫∫ x f(x, y) dy dx = ∫ x f_X(x) dx;   E(Y) = ∫∫ y f(x, y) dy dx = ∫ y f_Y(y) dy.   (8)
The covariance between X and Y is defined as:

Cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)],

where μ_X = E(X) and μ_Y = E(Y). Note the similarity of this definition with the variance of a random variable. If the variables are positively associated, higher (lower) than average values of X would (generally) tend to be associated with higher (lower) than average values of Y, leading to (X − μ_X)(Y − μ_Y) being positive. On the other hand, if the variables are negatively associated, higher (lower) than average values of X would tend to be associated with lower (higher) than average values of Y, more often than not resulting in (X − μ_X)(Y − μ_Y) being negative. Consequently, the covariance would be positive or negative, depending on whether the association pattern is positive or negative. An alternative equivalent form of this definition, quite often useful in evaluating the covariance, is:

σ_XY = E(XY) − μ_X μ_Y.   (9)
Unfortunately, the magnitude of the covariance does not indicate the strength of dependency; it depends on the values of the individual variances. The Cauchy-Schwarz inequality, a celebrated result in probability theory, guarantees that the absolute value of the covariance will always be at most the product of the standard deviations of the two random variables. Thus, to get an idea about the strength of dependency, one needs to consider the standardised form of the covariance; this is known as the correlation coefficient:
Correlation(X, Y) = ρ_XY = σ_XY / (σ_X σ_Y).   (10)
By the Cauchy-Schwarz inequality, the correlation coefficient always lies between -1 and 1. If it is
exactly equal to 1 or -1, the variables are said to be perfectly correlated.
On the other hand, if the correlation coefficient is zero, we say that the variables are uncorrelated. Let us now understand the similarity as well as the difference between uncorrelated variables and independent variables. If X and Y are independent of each other, you should be able to see from (4) and (6) that:

E(XY) = E(X) E(Y) = μ_X μ_Y  ⟹  σ_XY = 0  ⟹  ρ_XY = 0.
In other words, independent variables are necessarily uncorrelated. However, the converse is NOT true. To see this, consider a random variable X which has a uniform distribution on (−1, 1). Now consider Y to be nothing but X². You should be able to see that:

E(X) = 0, and so is E(XY) = E(X³) = 0  ⟹  σ_XY = 0  ⟹  ρ_XY = 0,

i.e., X and Y are uncorrelated. However, X and Y are obviously very much dependent on each other (you would know the exact value of Y if somebody told you the value of X). Because of this, we often say that correlation is (only) a measure of linear dependency; this would become clearer when we explore regression models later on.
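This uncorrelated-but-dependent phenomenon is easy to see numerically. The sketch below (an illustration, not from the original text) uses a symmetric grid of x-values on (−1, 1) as a stand-in for the uniform distribution, and sets Y = X²; the empirical covariance vanishes even though Y is a deterministic function of X:

```python
# A symmetric grid on (-1, 1) stands in for a uniform sample; Y = X^2.
n = 2001
xs = [-1 + 2 * i / (n - 1) for i in range(n)]   # symmetric around 0
ys = [x * x for x in xs]                        # Y is fully determined by X

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Empirical covariance: by the symmetry of the grid, the x and x^3 terms
# cancel pairwise, so this is (numerically) zero.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
```

Despite the zero covariance, knowing x pins down y exactly, which is precisely why correlation measures only linear dependency.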
As an illustration, let us compute the correlation coefficient in Example 2. Now,

μ_X = E(X) = ∫_0^1 2x(1 − x) dx = 1/3;   E(X²) = ∫_0^1 2x²(1 − x) dx = 1/6 = E(Y²),

and hence

σ_X² = 1/6 − 1/9 = 1/18 = σ_Y²,

using the symmetry of the joint density in its two arguments. Further,

E(XY) = ∫_0^1 ∫_0^{1−x} 2xy dy dx = 1/12  ⟹  Cov(X, Y) = σ_XY = 1/12 − (1/3)(1/3) = −1/36,

leading to:

ρ_XY = σ_XY / (σ_X σ_Y) = (−1/36) / (1/18) = −0.5.
Can you intuitively see why the correlation had to be negative in this example?
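The moments in this computation can be verified numerically; the sketch below (an illustration, not part of the original text) approximates E(X), E(X²) and E(XY) by midpoint Riemann sums of f(x, y) = 2 over the triangle:

```python
# Midpoint Riemann sums for the moments of Example 2:
# f(x, y) = 2 on the triangle x >= 0, y >= 0, x + y <= 1.
n = 400
h = 1.0 / n
ex = exx = exy = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        if x + y <= 1:
            w = 2 * h * h          # f(x, y) times cell area
            ex += x * w
            exx += x * x * w
            exy += x * y * w

var_x = exx - ex * ex              # sigma_X^2 (equals sigma_Y^2 by symmetry)
cov_xy = exy - ex * ex             # E(XY) - mu_X * mu_Y
rho = cov_xy / var_x               # correlation, since sigma_X = sigma_Y
```

Up to the discretization error of the grid, the sums reproduce μ_X ≈ 1/3, σ_X² ≈ 1/18, σ_XY ≈ −1/36 and ρ ≈ −0.5.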
The mean of a linear combination aX + bY of the two random variables is given by:

E[aX + bY] = a μ_X + b μ_Y.   (11)
You may note that (11) holds true irrespective of whether X and Y are independent or otherwise. On the
other hand, the variance is given by:
Var[aX + bY] = a² σ_X² + b² σ_Y² + 2ab σ_XY.   (12)
Naturally, the last term in (12) vanishes when X and Y are independent or even uncorrelated random
variables.
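Identities (11) and (12) hold exactly for empirical moments as well, so they can be sketched on any small data set (the numbers below are hypothetical, purely for illustration):

```python
# Check that the empirical mean and (population-style) variance of
# z = a*x + b*y satisfy the analogues of (11) and (12).
xs = [1.0, 2.0, 4.0, 3.0, 5.0]       # hypothetical data for X
ys = [2.0, 1.0, 3.0, 5.0, 4.0]       # hypothetical data for Y
a, b = 2.0, -3.0                     # arbitrary coefficients

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((s - mu) * (t - mv) for s, t in zip(u, v)) / len(u)

zs = [a * x + b * y for x, y in zip(xs, ys)]
lhs_mean = mean(zs)
rhs_mean = a * mean(xs) + b * mean(ys)               # as in (11)
lhs_var = var(zs)
rhs_var = (a * a * var(xs) + b * b * var(ys)
           + 2 * a * b * cov(xs, ys))                # as in (12)
```

Both identities hold exactly (up to floating-point rounding), whatever the data.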
Going back to the specific case given above, the investor may want to split the amount of investment into w1 and w2 for investing respectively into S1 and S2: stocks which presumably have high μ1 and μ2. As remarked earlier, this would typically result, to his disappointment, in high σ1 and σ2 as well. The return from his investment portfolio will be the random variable w1 X1 + w2 X2, having an expected return of

E(w1 X1 + w2 X2) = w1 μ1 + w2 μ2,   (13)
and a standard deviation of return of

{Var(w1 X1 + w2 X2)}^{1/2} = {w1² σ1² + w2² σ2² + 2 w1 w2 ρ σ1 σ2}^{1/2},   (14)

where ρ denotes the correlation between the returns from S1 and S2. Thus, the investor would want to
maximize (13) and minimize (14), through his selection of weights w1 and w2 , as well as stocks, S1 and S2.
Of course, there are constraints on the weights w1 and w2: they must be nonnegative and add up to 1 (or to a pre-specified amount). In an optimization course, you will learn how to achieve this optimization by trying to maximize the expected return of the portfolio subject to the standard deviation of the return of the portfolio being below σ*, a pre-specified acceptable level of uncertainty for the investor; or by minimizing the standard deviation of the return of the portfolio subject to the expected return of the portfolio being above μ*, a pre-specified acceptable level of return for the investor.
However, let us now discuss the direction of the possible solution from a probabilistic perspective. Observe that, other factors like the individual variability of returns remaining the same, (14) would be higher if ρ, the correlation between the returns, is positive, and lower if ρ is negative. What does it imply for the investor? He should be looking to split his investment into stocks which yield high expected returns with negative correlation between them. Generally, returns from stocks of the same sector exhibit positive correlation, while it may be possible to select high-yielding stocks from two different sectors. This is indeed the basic principle behind portfolio diversification.
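The diversification effect in (14) is easy to see with a small sketch (all numbers below are hypothetical): with equal weights and equal volatilities, the portfolio standard deviation shrinks as the correlation ρ between the two returns decreases.

```python
import math

sigma1 = sigma2 = 0.20          # hypothetical stock volatilities
w1 = w2 = 0.5                   # equal split of the investment

def portfolio_sd(rho):
    # Equation (14): sd = sqrt(w1^2 s1^2 + w2^2 s2^2 + 2 w1 w2 rho s1 s2).
    var = (w1 ** 2) * sigma1 ** 2 + (w2 ** 2) * sigma2 ** 2 \
          + 2 * w1 * w2 * rho * sigma1 * sigma2
    return math.sqrt(var)

sd_pos = portfolio_sd(0.8)      # same-sector stocks: high positive rho
sd_zero = portfolio_sd(0.0)     # uncorrelated sectors
sd_neg = portfolio_sd(-0.8)     # negatively correlated returns
```

The three values decrease monotonically, illustrating why the investor should look for negatively correlated (or at least weakly correlated) stocks.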
We have illustrated the above example with only 2 stocks. Obviously, the principle is valid more generally, and can be easily extended to any number of stocks. Thus, for example, if a particular investment portfolio has w1, w2, …, wk amounts being invested into stocks S1, S2, …, Sk, where the return from stock Si has an expected value μi and standard deviation σi, with the covariance between the returns from Si and Sj being σij, (13) and (14) can be generalized to obtain the expected return and variance of the portfolio as:
E(Σ_{i=1}^{k} wi Xi) = Σ_{i=1}^{k} wi μi,   Var(Σ_{i=1}^{k} wi Xi) = Σ_{i=1}^{k} wi² σi² + Σ_{i=1}^{k} Σ_{j≠i} wi wj σij.
If you are familiar with matrix algebra, you may find it convenient to express the above as products of vectors and matrices. At any rate, to get a closer understanding of the portfolio diversification principle (and an illustration of the computation of the expected value and variance of a linear combination of random variables), you may wish to carry out the optimization (along the lines indicated in the last paragraph) using the SOLVER routine in MS-EXCEL, for a bunch of stocks with the means and variance-covariance matrix of returns possibly estimated from past data.
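As a pure-Python stand-in for that SOLVER exercise (all inputs below are hypothetical, purely for illustration), a simple grid search over the weight w1 can minimize the portfolio standard deviation subject to the expected-return constraint:

```python
import math

# Hypothetical two-stock inputs.
mu = [0.12, 0.08]                         # expected returns
sigma = [0.25, 0.15]                      # standard deviations
rho = -0.3                                # correlation between the returns
cov12 = rho * sigma[0] * sigma[1]         # covariance sigma_12
mu_star = 0.09                            # required minimum expected return

# Grid search: w1 in {0, 0.001, ..., 1}, w2 = 1 - w1, both nonnegative.
best = None
for k in range(1001):
    w1 = k / 1000
    w2 = 1 - w1
    exp_ret = w1 * mu[0] + w2 * mu[1]     # equation (13)
    if exp_ret < mu_star:
        continue                          # violates the return constraint
    var = (w1 ** 2 * sigma[0] ** 2 + w2 ** 2 * sigma[1] ** 2
           + 2 * w1 * w2 * cov12)         # variance term of (14)
    sd = math.sqrt(var)
    if best is None or sd < best[0]:
        best = (sd, w1, w2)

sd_opt, w1_opt, w2_opt = best
```

With these inputs the constrained minimum lands near w1 ≈ 0.314; a real implementation would of course use a proper optimizer (SOLVER, or a nonlinear programming routine) rather than a grid.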
2. Soon after their marriage, Haquib and Waheeda bought a flat on Bannerghatta Road between IIMB and Bannerghatta National Park. Both of them are software engineers, but their offices are far apart; hence they have to drive different vehicles to their respective offices. However, they made sure that they would leave home exactly at 8:15 A.M., and almost regularly they were back home within an hour of each other. Waheeda receives an L mark in her office if she reaches after 9 A.M., while Haquib can avoid a similar L mark as long as he reaches before 9:10 A.M. The times taken, in minutes, to reach office from their home by Haquib (H) and by Waheeda (W) are random variables having the following joint probability density function (pdf):
f_{H,W}(h, w) = (h − 45)(50 − w)/5000   if …,
             = (50 − w)/500            if …
3. In the computer exercise assigned to you, apply the portfolio diversification principle to
determine your optimal portfolio.