Assign
Ans: Data are pieces of information that can be organized, manipulated, and stored.
Data is broadly classified into two categories: quantitative and qualitative.
In simple words, quantitative data is numerical, e.g. number of pens, number of cars,
heights, and weights. We can perform arithmetic calculations on it.
Qualitative data is non-numerical (categorical), e.g. gender, location, and rank.
We can measure data on four different scales: nominal, ordinal, interval, and
ratio.
Nominal scale data: It has no specific order, i.e. we cannot say which observation should
come first. Ex: gender, region.
Ordinal scale data: It has a meaningful order, but the gaps between values need not be equal.
Ex: rank, position, level.
Interval scale data: Differences between values are meaningful, but there is no true ‘zero’, so
zero does not mean the absence of the quantity. Ex: temperature in Celsius, clock time.
Ratio scale data: It has a true ‘zero’, so ratios between values can be compared. Ex: height, weight.
2) What are the measures of central tendency, and when should you use each? Discuss the mean,
median, and mode with examples and situations where each is appropriate?
Ans: Central tendency is nothing but representing the data with a single typical value.
Mean: The average of a dataset, i.e. the sum of all values divided by the number of values. It
uses every observation, so it represents the data well when the values are spread fairly evenly around it.
Usage: Use the mean when the data has no outliers, because the mean is very sensitive to outliers.
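Median: The middle-most observation/value of a dataset after arranging the data in order. With
an even number of observations, it is the average of the two middle values.
Usage: Use the median when the data is skewed or has outliers, because extreme values do not affect it much.
Mode: In simple words, the mode is the value/observation that is repeated most often in the
dataset.
Usage: Use the mode when we need the most common value in the dataset, especially for categorical data.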
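A small sketch of how the three measures behave, using Python's standard `statistics` module. The salary figures are hypothetical and chosen only to show the effect of one outlier.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 32, 35, 38, 40, 250]

mean_value = statistics.mean(salaries)      # sum / count, pulled upward by 250
median_value = statistics.median(salaries)  # middle value of the sorted data
mode_value = statistics.mode(salaries)      # most frequently repeated value

print(round(mean_value, 2))  # about 65.29
print(median_value)          # 35
print(mode_value)            # 32
```

Here the mean (about 65.29) is larger than six of the seven values, while the median (35) still describes a typical salary.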
3)Explain the concept of dispersion. How do variance and standard deviation measure the spread of
data?
Dispersion is a statistical term that describes the spread or variability of a set of data points. It
indicates how much the data points deviate from the mean (average) value.
Variance measures the average degree to which each point differs from the mean. It is
calculated by taking the average of the squared differences from the mean.
Standard Deviation is the square root of the variance and provides a measure of the
dispersion in the same units as the data itself.
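As a worked example of the two definitions above, the sketch below computes the population variance and standard deviation by hand and compares them with Python's `statistics` module. The dataset is hypothetical.

```python
import statistics

# Hypothetical dataset used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)                      # 5
squared_diffs = [(x - mean_value) ** 2 for x in data]   # squared differences from the mean
population_variance = sum(squared_diffs) / len(data)    # average of the squared differences
population_std = population_variance ** 0.5             # square root of the variance

print(population_variance)  # 4.0
print(population_std)       # 2.0

# The library versions give the same population values.
assert statistics.pvariance(data) == population_variance
assert statistics.pstdev(data) == population_std
```

Note that this divides by n (population variance). For a sample, the usual convention divides by n - 1, which is what `statistics.variance` and `statistics.stdev` do.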
4)What is a box plot, and what can it tell you about the distribution of data?
A box plot (or box-and-whisker plot) is a graphical representation that shows data distribution based on a
five-number summary: minimum, first quartile (25%), median (50%), third quartile (75%), and
maximum. It displays the central tendency, variability, and skewness of the data. Common box plot
features include:
Ans: Random sampling is a method of selecting a subset of individuals from a population to estimate
characteristics of the entire population. In simple random sampling, each member of the population has an equal
chance of being selected, which helps reduce selection bias and allows for more reliable and generalizable
inferences.
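A minimal sketch of simple random sampling with Python's `random` module. The population of 100 member IDs and the seed are assumptions made only for illustration.

```python
import random

# Hypothetical population: member IDs 1 to 100.
population = list(range(1, 101))

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(42)

# Draw 10 members without replacement; every member is equally likely to be chosen.
sample = rng.sample(population, k=10)

# The sample mean is used as an estimate of the population mean (50.5).
sample_mean = sum(sample) / len(sample)
print(sample, sample_mean)
```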
6)Explain the concept of skewness and its types. How does skewness affect the interpretation of
data?
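Skewness measures the asymmetry of a distribution. In a positively (right) skewed distribution the
longer tail is on the right and the mean is usually greater than the median. In a negatively (left)
skewed distribution the longer tail is on the left and the mean is usually less than the median.
Skewness affects data interpretation because it signals a departure from a symmetric distribution
such as the normal distribution. With high skewness the mean is pulled toward the long tail, so the
median is usually a better measure of the center.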
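One common way to put a number on skewness is the moment coefficient (third central moment divided by the cube of the standard deviation). The sketch below assumes that definition, and the two datasets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical datasets for illustration.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive value
print(skewness(symmetric))     # 0.0

# In the right-skewed data the mean (4.75) sits above the median (3).
print(statistics.mean(right_skewed), statistics.median(right_skewed))
```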
7)What is the interquartile range (IQR), and how is it used to detect outliers?
Ans: The IQR is the range between the first quartile (Q1) and third quartile (Q3). It measures the
middle 50% of a dataset.
IQR = Q3 − Q1
Outliers: Points that lie below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are considered outliers.
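The rule above can be sketched with `statistics.quantiles` from the standard library. The dataset is hypothetical, and the "inclusive" quartile method is one of several conventions, so other tools may give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical dataset; 40 is far from the rest of the values.
data = [7, 8, 9, 10, 11, 12, 13, 14, 15, 40]

# n=4 returns the three quartile cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)               # 9.25 13.75 4.5
print(lower_fence, upper_fence)  # 2.5 20.5
print(outliers)                  # [40]
```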
9)Explain the properties of the normal distribution and the empirical rule (68-95-99.7 rule).
Ans: The normal distribution is a bell-shaped curve where most data points cluster around the mean.
Properties include:
10)Provide a real-life example of a Poisson process and calculate the probability for a specific event .
Ans: A Poisson process models events that occur independently and at a constant average rate over a
fixed interval of time or space. For instance, the number of cars passing through a toll booth per hour.
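For the calculation part, the Poisson probability of seeing exactly k events is P(X = k) = e^(−λ) × λ^k / k!, where λ is the average number of events per interval. The sketch below assumes a hypothetical rate of 4 cars per hour, since no rate is given here, and asks for the probability of exactly 2 cars in an hour.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam per interval."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Assumed (hypothetical) average rate: 4 cars per hour.
average_cars_per_hour = 4

probability_two_cars = poisson_pmf(2, average_cars_per_hour)
print(round(probability_two_cars, 4))  # 0.1465
```

So, under this assumed rate, there is roughly a 14.65% chance that exactly 2 cars pass in a given hour.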
11)Explain what a random variable is and differentiate between discrete and continuous random
variables.
Ans: A random variable is a variable whose value depends on the outcome of a random phenomenon.
Discrete random variable: Takes on a countable number of distinct values, either finite or
countably infinite (e.g., number of heads in coin tosses).
Continuous random variable: Takes on any value within an interval, so its possible values are
uncountably infinite (e.g., weight, height).
12)Provide an example dataset, calculate both covariance and correlation, and interpret the results .
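A minimal sketch for this question, using a small hypothetical dataset (hours studied and exam scores, invented only for illustration). It uses the sample formulas, which divide by n − 1.

```python
# Hypothetical dataset: hours studied (x) and exam score out of 5 (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

n = len(hours)
mean_x = sum(hours) / n   # 3.0
mean_y = sum(scores) / n  # 4.0

# Sample covariance: average product of the deviations, dividing by n - 1.
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / (n - 1)

# Sample standard deviations of each variable.
std_x = (sum((x - mean_x) ** 2 for x in hours) / (n - 1)) ** 0.5
std_y = (sum((y - mean_y) ** 2 for y in scores) / (n - 1)) ** 0.5

# Pearson correlation: covariance scaled by both standard deviations.
correlation = covariance / (std_x * std_y)

print(covariance)             # 1.5
print(round(correlation, 4))  # 0.7746
```

Interpretation: the covariance is positive, so scores tend to rise as hours increase, but its size depends on the units. The correlation of about 0.77 is unit-free and lies between −1 and +1, so it indicates a fairly strong positive linear relationship in this made-up data.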