Skewness: Difference between revisions

Citations: - Updated a citation to a live link
AnomieBOT (talk | contribs)
m Dating maintenance tags: {{Citation needed}}
 
(19 intermediate revisions by 13 users not shown)
Line 9:
In [[probability theory]] and [[statistics]], '''skewness''' is a measure of the asymmetry of the [[probability distribution]] of a [[real number|real]]-valued [[random variable]] about its mean. The skewness value can be positive, zero, negative, or undefined.
 
For a [[unimodal]] distribution (a distribution with a single peak), negative skew commonly indicates that the ''tail'' is on the left side of the distribution, and positive skew indicates that the tail is on the right. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. For example, a zero value in skewness means that the tails on both sides of the mean balance out overall; this is the case for a symmetric distribution, but can also be true for an asymmetric distribution where one tail is long and thin, and the other is short but fat. Thus, the judgement on the symmetry of a given distribution by using only its skewness is risky; the distribution shape must be taken into account.
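
The point that zero skewness does not imply symmetry can be checked on a small example. The sketch below uses a hypothetical three-point distribution (the values and probabilities are chosen here for illustration and do not come from any cited source) and exact rational arithmetic, so the third central moment is exactly zero rather than zero up to rounding.

```python
from fractions import Fraction

def central_moment(dist, k):
    """k-th central moment of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** k * p for x, p in dist.items())

# Hypothetical three-point distribution, for illustration only.
dist = {-3: Fraction(1, 10), -1: Fraction(1, 2), 2: Fraction(2, 5)}

mean = sum(x * p for x, p in dist.items())
third = central_moment(dist, 3)  # numerator of the skewness

# The distribution is symmetric about its mean only if every value x
# has a mirror image 2*mean - x carrying the same probability.
is_symmetric = all(dist.get(2 * mean - x, 0) == p for x, p in dist.items())

print(mean, third, is_symmetric)
```

Here the mean and the third central moment are both zero, so the skewness is zero, yet the distribution has no mirror symmetry about its mean.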
 
== Introduction ==
Consider the two distributions in the figure just below. Within each graph, the values on the right side of the distribution taper differently from the values on the left side. These tapering sides are called ''tails'', and they provide a visual means to determine which of the two kinds of skewness a distribution has:
# ''{{visible anchor|negative skew}}'': The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be ''left-skewed'', ''left-tailed'', or ''skewed to the left'', despite the fact that the curve itself appears to be skewed or leaning to the right; ''left'' instead refers to the left tail being drawn out and, often, the mean being skewed to the left of a typical center of the data. A left-skewed distribution usually appears as a ''right-leaning'' curve.<ref name="cnx.org">{{Cite web |last=Illowsky |first=Barbara |last2=Dean |first2=Susan |date=2020-03-27 |title=2.6 Skewness and the Mean, Median, and Mode - Statistics |url=https://openstax.org/books/statistics/pages/2-6-skewness-and-the-mean-median-and-mode |access-date=2022-12-21 |website=[[OpenStax]] |language=en}}</ref>
# ''{{visible anchor|positive skew}}'': The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be ''right-skewed'', ''right-tailed'', or ''skewed to the right'', despite the fact that the curve itself appears to be skewed or leaning to the left; ''right'' instead refers to the right tail being drawn out and, often, the mean being skewed to the right of a typical center of the data. A right-skewed distribution usually appears as a ''left-leaning'' curve.<ref name="cnx.org" />
 
Line 24:
 
== Relationship of mean and median ==
The skewness is not directly related to the relationship between the mean and median: a distribution with negative skew can have its mean greater than or less than the median, and likewise for positive skew.<ref name="von Hippel 2005">{{cite journal |last=von Hippel |first=Paul T. |year=2005 |title=Mean, Median, and Skew: Correcting a Textbook Rule |url=http://www.amstat.org/publications/jse/v13n2/vonhippel.html |url-status=dead |journal=Journal of Statistics Education |volume=13 |issue=2 |archive-url=https://web.archive.org/web/20160220181456/http://www.amstat.org/publications/jse/v13n2/vonhippel.html |archive-date=2016-02-20}}</ref>
[[File:Relationship between mean and median under different skewness.png|thumb|434x434px|A general relationship of mean and median for differently skewed unimodal distributions.]]
In the older notion of [[nonparametric skew]], defined as <math>(\mu - \nu)/\sigma,</math> where <math>\mu</math> is the [[mean]], <math>\nu</math> is the [[median]], and <math>\sigma</math> is the [[standard deviation]], the skewness is defined in terms of this relationship: positive/right nonparametric skew means the mean is greater than (to the right of) the median, while negative/left nonparametric skew means the mean is less than (to the left of) the median. However, the modern definition of skewness and the traditional nonparametric definition do not always have the same sign: they agree for some families of distributions but differ for others, and conflating them is misleading.
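
The sign disagreement can be seen in a minimal sketch. The three-point distribution below is hypothetical and chosen only for illustration; the median is taken as the smallest value whose cumulative probability reaches 1/2, which is one common convention for discrete distributions and an assumption of this example.

```python
from fractions import Fraction
from math import sqrt

def summarize(dist):
    """Return (moment skewness, nonparametric skew) of {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    third = sum((x - mean) ** 3 * p for x, p in dist.items())
    # Median: smallest value whose cumulative probability reaches 1/2.
    cum = 0
    for x in sorted(dist):
        cum += dist[x]
        if cum >= Fraction(1, 2):
            median = x
            break
    sigma = sqrt(var)
    return float(third) / sigma ** 3, float(mean - median) / sigma

# Hypothetical distribution: a long right tail, but slightly more
# probability mass pulling the mean below the median.
dist = {-1: Fraction(7, 20), 0: Fraction(1, 2), 2: Fraction(3, 20)}

moment_skew, nonparametric_skew = summarize(dist)
print(moment_skew, nonparametric_skew)
```

For this distribution the mean is −1/20 and the median is 0, so the nonparametric skew is negative, while the third central moment, and hence the moment coefficient of skewness, is positive.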
 
Line 37:
 
===Fisher's moment coefficient of skewness===
The skewness <math>\gamma_1</math> of a random variable ''X'' is the third [[standardized moment]] <math>\tilde{\mu}_3</math>, defined as:<ref name="StanBrown1"/><ref name="FXSolver1"/>
 
:<math>
\gamma_1 := \tilde{\mu}_3 = \operatorname{E}\left[\left(\frac{X-\mu}{\sigma}\right)^3 \right]
= \frac{\mu_3}{\sigma^3}
= \frac{\operatorname{E}\left[(X-\mu)^3\right]}{( \operatorname{E}\left[ (X-\mu)^2 \right] )^{3/2}}
Line 48:
The skewness is also sometimes denoted Skew[''X''].
 
If ''σ'' is finite and ''μ'' is finite too, then skewness can be expressed in terms of the non-central moment E[''X''<sup>3</sup>] by expanding the previous formula:
:<math>
\begin{align}
Line 89:
</math>
 
where <math>\overline{x}</math> is the [[sample mean]], ''s'' is the [[Standard deviation#Corrected sample standard deviation|sample standard deviation]], ''m''<sub>2</sub> is the (biased) sample second central [[Moment (mathematics)|moment]], and ''m''<sub>3</sub> is the (biased) sample third central moment.<ref name=JG/> <math>g_1</math> is a [[Method of moments (statistics)|method of moments]] estimator.
 
Another common definition of the ''sample skewness'' is<ref name=JG/><ref name=Doane2011>Doane, David P., and Lori E. Seward. [http://jse.amstat.org/v19n2/doane.pdf "Measuring skewness: a forgotten statistic."] Journal of Statistics Education 19.2 (2011): 1-18. (Page 7)</ref>
Line 101:
where <math>k_3</math> is the unique symmetric unbiased estimator of the third [[cumulant]] and <math>k_2 = s^2</math> is the symmetric unbiased estimator of the second cumulant (i.e. the [[Variance#Population variance and sample variance|sample variance]]). This adjusted Fisher–Pearson standardized moment coefficient <math> G_1 </math> is the version found in [[Microsoft Excel|Excel]] and several statistical packages including [[Minitab]], [[SAS (software)|SAS]] and [[SPSS]].<ref name=Doane2011/>
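
As a minimal sketch of the two sample definitions, the functions below compute <math>g_1</math> from the biased central moments and obtain <math>G_1</math> through the standard rescaling <math>G_1 = g_1 \sqrt{n(n-1)}/(n-2)</math>, which is assumed here to be equivalent to the cumulant form above. The function names and the sample data are hypothetical.

```python
from math import sqrt

def sample_skewness_g1(xs):
    """Method-of-moments skewness g1 = m3 / m2**1.5 (biased central moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def sample_skewness_G1(xs):
    """Adjusted Fisher-Pearson coefficient G1, via rescaling of g1."""
    n = len(xs)
    if n < 3:
        raise ValueError("G1 requires at least three observations")
    return sqrt(n * (n - 1)) / (n - 2) * sample_skewness_g1(xs)

data = [1, 2, 3, 4, 10]  # illustrative sample with one large value
g1 = sample_skewness_g1(data)
G1 = sample_skewness_G1(data)
print(g1, G1)
```

For small samples the rescaling factor is noticeably larger than 1, so <math>G_1</math> exceeds <math>g_1</math> in absolute value; the two converge as ''n'' grows.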
 
Under the assumption that the underlying random variable <math>X</math> is normally distributed, it can be shown that all three ratios <math>b_1</math>, <math>g_1</math> and <math>G_1</math> are unbiased and [[Consistent estimator|consistent]] estimators of the population skewness <math>\gamma_1=0</math>, with <math>\sqrt{n} b_1 \mathrel{\xrightarrow{d}} N(0, 6)</math>, i.e., their distributions converge to a normal distribution with mean 0 and variance 6 ([[Ronald Fisher|Fisher]], 1930).<ref name=JG/> The variance of the sample skewness is thus approximately <math>6/n</math> for sufficiently large samples. More precisely, in a random sample of size ''n'' from a normal distribution,<ref name=Duncan1997>Duncan Cramer (1997) Fundamental Statistics for Social Research. Routledge. {{isbn|9780415172042}} (p 85)</ref><ref>Kendall, M.G.; Stuart, A. (1969) ''The Advanced Theory of Statistics, Volume 1: Distribution Theory, 3rd Edition'', Griffin. {{isbn|0-85264-141-9}} (Ex 12.9)</ref>
 
: <math> \operatorname{var}(G_1)= \frac{6n ( n - 1 )}{ ( n - 2 )( n + 1 )( n + 3 ) } .</math>
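
The exact variance can be compared numerically with the large-sample approximation <math>6/n</math>. The helper name below is hypothetical, and the formula applies only under the normality assumption stated above.

```python
def var_G1_normal(n):
    """Variance of G1 for a random sample of size n from a normal distribution."""
    if n < 3:
        raise ValueError("the formula requires n >= 3")
    return 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))

# Compare the exact value with the approximation 6/n.
for n in (10, 100, 1000):
    print(n, var_G1_normal(n), 6 / n)
```

The ratio of the exact variance to <math>6/n</math> approaches 1 as ''n'' increases, but for small samples the approximation overstates the variance.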
Line 119:
With pronounced skewness, standard statistical inference procedures such as a [[confidence interval]] for a mean will be not only incorrect, in the sense that the true coverage level will differ from the nominal (e.g., 95%) level, but they will also result in unequal error probabilities on each side.
 
Skewness can be used to obtain approximate probabilities and quantiles of distributions (such as [[value at risk]] in finance) via the [[Cornish–Fisher expansion]].
 
Many models assume a normal distribution, that is, that the data are symmetric about the mean. The normal distribution has a skewness of zero. In reality, however, data points may not be perfectly symmetric, so an understanding of the skewness of the dataset indicates whether deviations from the mean are going to be positive or negative.
Line 126:
 
==Other measures of skewness==
[[Image:Comparison mean median mode.svg|thumb|upright=1.35|Comparison of [[mean]], [[median]] and [[mode (statistics)|mode]] of two [[log-normal distribution]]s with the same medians and different skewnesses.]]
 
Other measures of skewness have been used, including simpler calculations suggested by [[Karl Pearson]]<ref name="Capistrano1">{{cite web |url=http://www.stat.upd.edu.ph/s114%20cnotes%20fcapistrano/Chapter%2010.pdf |title=Archived copy |access-date=2010-04-09 |url-status=dead |archive-url=https://web.archive.org/web/20100705025706/http://www.stat.upd.edu.ph/s114%20cnotes%20fcapistrano/Chapter%2010.pdf |archive-date=5 July 2010}}</ref> (not to be confused with Pearson's moment coefficient of skewness, see above). These other measures are:
Line 141:
 
Which is a simple multiple of the [[nonparametric skew]].
 
It is worth noting that, since skewness is not related to an order relationship between mode, mean and median, the sign of these coefficients does not give information about the type of skewness (left/right).
 
===Quantile-based measures===
Line 149 ⟶ 147:
:<math>\frac{\frac{{{Q}(3/4)}+{{Q}(1/4)}}{2}-{{Q}(1/2)}}{\frac{{{Q}(3/4)}-{{Q}(1/4)}}{2}}
=\frac{{{Q}(3/4)}+{{Q}(1/4)}-2{{Q}(1/2)}}{{{Q}(3/4)}-{{Q}(1/4)}},</math>
where ''Q'' is the [[quantile function]] (i.e., the inverse of the [[cumulative distribution function]]). The numerator is the difference between the average of the upper and lower quartiles (a measure of location) and the median (another measure of location), while the denominator is the [[semi-interquartile range]] <math>({Q}(3/4)-{Q}(1/4))/2</math>, which for symmetric distributions is equal to the [[Average absolute deviation|MAD]] measure of [[statistical dispersion|dispersion]].{{citation needed|date=May 2024}}
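
A sample version of this quartile-based measure can be sketched as follows. The choice of quantile convention is an assumption of the example: it uses the "inclusive" method of Python's <code>statistics.quantiles</code>, and other conventions can give slightly different values on small samples. The function name and data are hypothetical.

```python
from statistics import quantiles

def quartile_skewness(xs):
    """(Q3 + Q1 - 2*Q2) / (Q3 - Q1) using sample quartiles."""
    # Assumed convention: "inclusive" quartiles (linear interpolation
    # between order statistics, treating the data as the full population).
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    if q3 == q1:
        raise ValueError("interquartile range is zero")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

sample = [1, 2, 2, 3, 3, 4, 6, 9, 15]  # illustrative right-skewed sample
print(quartile_skewness(sample))
```

Because only the quartiles enter the formula, the most extreme observations (such as the value 15 here) do not affect the result unless they move a quartile.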
 
Other names for this measure are Galton's measure of skewness,<ref name=Johnson1994>{{harvp|Johnson, NL|Kotz, S|Balakrishnan, N|1994}} p. 3 and p. 40</ref> the Yule–Kendall index<ref name=Wilks1995>Wilks DS (1995) ''Statistical Methods in the Atmospheric Sciences'', p 27. Academic Press. {{isbn|0-12-751965-3}}</ref> and the quartile skewness,<ref>{{Cite web|url=http://mathworld.wolfram.com/Skewness.html|title=Skewness|last=Weisstein|first=Eric W.|website=mathworld.wolfram.com|language=en|access-date=2019-11-21}}</ref>
Line 167 ⟶ 165:
Groeneveld and Meeden have suggested, as an alternative measure of skewness,<ref name=Groeneveld1984 />
 
: <math> \operatorname{skew}(X) = \frac{( \mu - \nu ) }{ \operatorname E( | X - \nu | ) }, </math>
 
where ''μ'' is the mean, ''ν'' is the median, |...| is the [[absolute value]], and ''E''() is the expectation operator. This is closely related in form to [[Skewness#Pearson.27s second skewness coefficient .28median skewness.29|Pearson's second skewness coefficient]].
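
A plug-in sample analogue of this measure replaces the mean, the median and the expectation with their sample counterparts. The sketch below is an illustration under that assumption, not an estimator taken from the cited paper; the function name and data are hypothetical.

```python
from statistics import mean, median

def groeneveld_meeden_skew(xs):
    """Sample analogue of (mu - nu) / E|X - nu|."""
    nu = median(xs)
    mean_abs_dev = mean([abs(x - nu) for x in xs])  # mean absolute deviation from the median
    if mean_abs_dev == 0:
        raise ValueError("all observations equal the median")
    return (mean(xs) - nu) / mean_abs_dev

data = [1, 2, 3, 4, 10]  # illustrative sample: mean 4, median 3
gm = groeneveld_meeden_skew(data)
print(gm)
```

The numerator is the same as in Pearson's second skewness coefficient; only the scale in the denominator differs, which is why the two measures are described as closely related in form.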
 
===L-moments===
Line 217 ⟶ 215:
{{refbegin}}
* {{cite book|author1=Johnson, NL|author2=Kotz, S|author3=Balakrishnan, N|year=1994|title=Continuous Univariate Distributions|volume=1|edition=2|publisher=Wiley|isbn=0-471-58495-9}}
* {{cite journal | last1 = MacGillivray | first1 = HL | year = 1992 | title = Shape properties of the g- and h- and Johnson families | journal = Communications in Statistics - Theory and Methods | volume = 21 | issue = 5| pages = 1244–1250 | doi = 10.1080/03610929208830842 }}
* Premaratne, G., Bera, A. K. (2001). Adjusting the Tests for Skewness and Kurtosis for Distributional Misspecifications. Working Paper Number 01-0116, University of Illinois. Forthcoming in Comm in Statistics, Simulation and Computation. 2016 1–15
* Premaratne, G., Bera, A. K. (2000). Modeling Asymmetry and Excess Kurtosis in Stock Return Data. Office of Research Working Paper Number 00-0123, University of Illinois.
* [https://ssrn.com/abstract=2590356 Skewness Measures for the Weibull Distribution]