R Lab - Probability Distributions
data <- data.frame(outcome = 0:5, probs = c(0.1, 0.2, 0.3, 0.2, 0.1, 0.1))
In this exercise, we will simulate normally distributed random data using the rnorm()
function. The simulated values are stored in the data vector, which you will then
visualize.
# simulating data
set.seed(11225)
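As a minimal sketch of this step, the lines below draw values with rnorm() and plot them
with hist(). The sample size, mean, and standard deviation (100, 0, 1) and the name
sim_data are assumptions chosen for illustration; the lab's own values are not shown here.

```r
# Illustrative only: n, mean, and sd are assumed, not taken from the lab
set.seed(11225)                                # make the draw reproducible
sim_data <- rnorm(n = 100, mean = 0, sd = 1)   # hypothetical parameters

# Histogram of the simulated values
hist(sim_data, main = "Simulated normal data", xlab = "Value")
```

Because the seed is fixed, rerunning these lines reproduces the same draws.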
help(dnorm)
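# calculate the expected value and assign it to the variable expected_score
# (assumes data is the outcome/probs data frame defined above)
expected_score <- sum(data$outcome * data$probs)
expected_score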
Summary statistics: Variance and the standard deviation
In addition to the mean, you will often want to know about the spread of a distribution.
The variance is a common measure of spread: it is the expected squared deviation of an
observation from the mean. To calculate it from a probability distribution, take the
squared difference between each outcome and the mean, multiply it by the probability of
that outcome, and sum the results. See the following formula:
$\mathrm{var}(X) = \sum_i (x_i - \bar{x})^2 \, P_i(x_i)$
If we want to turn that variance into the standard deviation, all we need to do is take
its square root. Let's go back to the probability mass function of the first exercise and
see if we can get the variance.
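One way to carry out this calculation is sketched below, using the outcome and probs
values from the first exercise. The names pmf, variance, and std_dev are chosen here for
illustration and are not prescribed by the lab.

```r
# Probability mass function from the first exercise
pmf <- data.frame(outcome = 0:5, probs = c(0.1, 0.2, 0.3, 0.2, 0.1, 0.1))

# Mean (expected value): sum of outcome * probability
expected_score <- sum(pmf$outcome * pmf$probs)

# Variance: probability-weighted sum of squared deviations from the mean
variance <- sum((pmf$outcome - expected_score)^2 * pmf$probs)

# Standard deviation: square root of the variance
std_dev <- sqrt(variance)

variance  # 2.01
std_dev   # approximately 1.418
```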
Look at the visualization; what is the probability that an observation from a normal
distribution is between 1 standard deviation below the mean and 2 standard deviations
above the mean?
0.815
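The value 0.815 is consistent with the 68-95-99.7 rule of thumb (0.34 + 0.475). As a
check, the same area can be computed with pnorm(); this sketch assumes a standard normal
distribution, which is sufficient because the question is stated in standard deviations.

```r
# Area under the standard normal curve between z = -1 and z = 2
prob <- pnorm(2) - pnorm(-1)
prob  # approximately 0.8186
```

The small difference from 0.815 arises because the rule of thumb uses rounded
percentages.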
In the visualization, we are given a blue area with a probability of 0.2. What we want to
know, however, is the value associated with the yellow dotted vertical line. This value is
the 0.2 quantile (the 20th percentile): it divides the curve into an area that contains the
lower 20% of the scores and an area that contains the rest of the scores. If our variable
is normally distributed, we can use the R function qnorm() to find this value. We specify
the probability as the first parameter, then the mean, and then the standard deviation,
for example, qnorm(0.2, mean = 25, sd = 5).
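The call from the text can be run as follows. The variable name q20 is an assumption
made for this sketch; the pnorm() call is added only to confirm that the returned value
cuts off the lower 20% of the distribution.

```r
# 0.2 quantile of a normal distribution with mean 25 and sd 5
q20 <- qnorm(0.2, mean = 25, sd = 5)
q20  # approximately 20.79

# Check: the probability below q20 should be 0.2
pnorm(q20, mean = 25, sd = 5)
```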
$Z_i = \frac{x_i - \bar{x}}{s_x}$
The Z-score represents how many standard deviations from the mean a value lies.
z_value = 2.6
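To illustrate the Z-score formula, here is a small sketch with made-up numbers. The
observations in x and the values of x_bar and s_x are hypothetical and are not taken from
the lab's data.

```r
# Hypothetical observations, mean, and standard deviation
x     <- c(18, 25, 32)
x_bar <- 25
s_x   <- 5

# Z-score: distance from the mean in units of standard deviations
z <- (x - x_bar) / s_x
z  # -1.4  0.0  1.4
```

A value of 32 therefore lies 1.4 standard deviations above the mean, and 18 lies 1.4
standard deviations below it.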
Other essential ingredients of a binomial distribution are a fixed number of trials,
which we call n, and the number of successes in which we are interested, which we call x.
Useful summary statistics for a binomial distribution are the same as for the normal
distribution: the mean and the standard deviation.
The mean is calculated by multiplying the number of trials n by the probability of a
success, denoted by p. The standard deviation of a binomial distribution is calculated by
the following formula: $\sqrt{n \cdot p \cdot (1 - p)}$.
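These two formulas can be sketched in R as follows. The values n = 25 and p = 0.2, and
the names binom_mean and binom_sd, are assumptions picked for illustration only.

```r
# Hypothetical binomial setting: 25 trials, success probability 0.2
n <- 25
p <- 0.2

# Mean: number of trials times probability of success
binom_mean <- n * p

# Standard deviation: square root of n * p * (1 - p)
binom_sd <- sqrt(n * p * (1 - p))

binom_mean  # 5
binom_sd    # 2
```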