Welch's t-test

From Wikipedia, the free encyclopedia

In statistics, Welch's t-test, or unequal variances t-test, is a two-sample location test used to test the (null) hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, and is an adaptation of Student's t-test[1] that is more reliable when the two samples have unequal variances and possibly unequal sample sizes.[2][3] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch's t-test has been less popular than Student's t-test[2] and may be less familiar to readers, a more informative name is "Welch's unequal variances t-test" — or "unequal variances t-test" for brevity.[3]

YouTube Encyclopedic

  • Welch (Unpooled Variance) t Tests and Confidence Intervals: Introduction
  • Welch's t-test (unequal variances) - SPSS
  • Welch's Two-Sample t-Test - Intro to Data Science
  • Welch's t-Test vs Student's t-Test - M2S15 [2020-04-01]
  • Excel - Independent samples Welch t test (via data analysis)

Transcription

Let's take a look at an introduction to Welch (unpooled) t tests and confidence intervals. In this video I will look at some of the underlying concepts, and I work through an example in another video. Here's a quick example to start. Here are boxplots of lead levels in the blood of random samples of Cairo traffic officers and officers in the suburbs, and it appears as though the Cairo traffic officers tend to have greater lead concentration in their blood when compared with the officers from the suburbs. But is this observed difference a significant difference? And can we estimate the difference in population means with a confidence interval?

We often wish to test if there is a significant difference between the groups. In other words, is there strong evidence that the population mean of group 1 differs from the population mean of group 2? We also very often wish to estimate the difference in population means, mu_1 minus mu_2, with a confidence interval. The Welch t procedures are very similar to the pooled-variance t procedures. They are similar in spirit, and they will help us answer the same questions. If you're very comfortable with the pooled-variance t procedures, then quite a bit of this video is going to be review. But unlike the pooled-variance t procedures, the Welch procedures do not require the assumption of equal population variances, and they have a different standard error and degrees of freedom.

Here are the assumptions of the Welch t procedure; in other words, what we require in order for the procedures to be valid. We require independent simple random samples from the populations of interest (the methods also work for randomized experiments), and we need normally distributed populations, though this normality assumption is not very important if we have large sample sizes.
Recall that the pooled-variance t procedure had the third assumption that the population variances are equal; the Welch procedure does not, so the Welch procedure is valid in a wider variety of situations. The downside is that the Welch procedure is only an approximate procedure, not an exact one, but that distinction has little practical importance.

To do any statistical inference calculations we need the standard error of the difference in sample means, so the first step in the calculations is finding that standard error. This standard error of the difference in sample means estimates the true standard deviation of the sampling distribution of the difference in sample means. I've written a W in the subscript here to denote that this is the standard error for the Welch procedure, and that it is different from the standard error of the pooled-variance procedure: here we do not pool the variances together.

Here's the appropriate formula for a confidence interval for the difference in population means. We take our estimator of mu_1 minus mu_2, the difference in sample means, and we add and subtract the margin of error. The margin of error is made up of two parts: the standard error of our estimator and the t value that we've discussed previously, which we get from the t distribution. But we need the appropriate degrees of freedom, and what are they here? It turns out that they are given by an ugly formula called the Welch-Satterthwaite approximation. People don't typically calculate this by hand, so it's best to use statistical software to get it. We often want to test the null hypothesis that the population means are equal. This is a natural test that comes up very frequently in practice.
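The standard error, Welch-Satterthwaite degrees of freedom, and confidence interval described above can be sketched from summary statistics. The numbers below are hypothetical (not the values from the video), and SciPy's t distribution is used to supply the critical value:

```python
import math
from scipy import stats

def welch_ci(mean1, s1, n1, mean2, s2, n2, conf=0.95):
    """Welch confidence interval for mu_1 - mu_2 from summary statistics."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # Welch (unpooled) standard error
    # Welch-Satterthwaite approximate degrees of freedom
    df = se**4 / ((s1**2 / n1)**2 / (n1 - 1) + (s2**2 / n2)**2 / (n2 - 1))
    tcrit = stats.t.ppf((1 + conf) / 2, df)  # critical t value for the interval
    diff = mean1 - mean2
    moe = tcrit * se                         # margin of error
    return diff - moe, diff + moe, df

# hypothetical summary statistics for two groups
lo, hi, df = welch_ci(mean1=31.4, s1=7.5, n1=39, mean2=25.1, s2=6.2, n2=35)
```

The returned interval is centered at the difference in sample means, and the degrees of freedom fall between the smaller group's n - 1 and n1 + n2 - 2, as the Welch-Satterthwaite formula guarantees.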
We may be interested in testing whether Cairo traffic officers have a greater blood lead level than officers in the suburbs, or whether a new marketing campaign is more effective than one previously in use. In these cases we test the null hypothesis that the two groups have the same population mean, and see how much evidence we have against it. We test this null hypothesis against one of three possible alternative hypotheses. As per usual, it's best to choose the two-sided alternative unless we have some strong reason to be interested in only one side.

Here's our null hypothesis and test statistic. The null hypothesis can be written another way, as the hypothesis that mu_1 - mu_2 is equal to 0. We then construct our test statistic in the usual way: we take our estimator of this quantity, subtract the hypothesized value of zero, and divide by the standard error of the estimator. Usually we want to test the null hypothesis that this difference is zero, and so we simply forget about that term, but if we wanted to test that the difference in population means is equal to some other value, we would subtract that value in the numerator. If the null hypothesis is true, and the assumptions hold, this test statistic will have approximately a t distribution with degrees of freedom given by that somewhat ugly degrees of freedom formula.

I use the p-value approach, so after calculating the test statistic we need to find the p-value and make our conclusion. Let's work through how we'd go about doing that. I've plotted a t distribution here. The exact shape of the t distribution depends on the degrees of freedom, but let this curve represent the t distribution with the appropriate degrees of freedom. Suppose we get our samples, calculate our test statistic, and find that the t value is -1.5. We draw that -1.5 on our curve, and it's right around here somewhere. What would the p-value be here?
Well, our alternative hypothesis is that mu_1 is greater than mu_2, and note that we are computing X_1 bar minus X_2 bar in the numerator of our test statistic, so values far out in the right tail of the distribution give us evidence against the null hypothesis and in favor of this alternative. The p-value is therefore the probability, under the null hypothesis, of getting the value we got in our sample or something even farther to the right; in other words, the area to the right of the observed value of our test statistic.

What if we changed the alternative hypothesis to mu_1 less than mu_2? Suppose again that we got our samples and found the same test statistic of -1.5, which again falls right around there. Since we are computing X_1 bar minus X_2 bar, values far out in the left tail of the distribution give evidence against the null hypothesis and in favor of this alternative. So the p-value is the probability, under the null hypothesis, of getting the t value we observed or something even farther to the left; in other words, the area to the left of the observed value of our test statistic.

What about a two-sided alternative hypothesis? Here, values of the test statistic far out in either the right tail or the left tail of the distribution give strong evidence against the null hypothesis. Suppose again that we got a test statistic value of -1.5. Our p-value is the probability, under the null hypothesis, of getting this value or something even more extreme: -1.5 or something even farther left, or, on the other side, 1.5 or something even farther right. These two areas. Another way of looking at it is that the p-value is double the area in the tail of the distribution beyond the observed value of our test statistic.
Once we have the p-value, we draw a conclusion in the usual ways. A very small p-value gives very strong evidence against the null hypothesis and in favor of the alternative hypothesis. And if we have a given significance level alpha, we reject the null hypothesis in favor of the alternative if the p-value is less than or equal to alpha. I work through an example of a Welch t test and confidence interval in another video.
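The three p-value cases the transcript walks through for an observed t of -1.5 can be computed directly. The degrees of freedom (20) are an arbitrary choice for illustration:

```python
from scipy import stats

t_obs, df = -1.5, 20  # df of 20 is an arbitrary illustrative value

# H1: mu_1 > mu_2 -- area to the right of the observed statistic
p_right = stats.t.sf(t_obs, df)
# H1: mu_1 < mu_2 -- area to the left of the observed statistic
p_left = stats.t.cdf(t_obs, df)
# H1: mu_1 != mu_2 -- double the tail area beyond |t_obs|
p_two = 2 * stats.t.sf(abs(t_obs), df)
```

By the symmetry of the t distribution, the two-sided p-value is exactly double the left-tail area here, and the left- and right-tail p-values sum to one.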

Assumptions

Student's t-test assumes that the sample means being compared for two populations are normally distributed, and that the populations have equal variances. Welch's t-test is designed for unequal population variances, but the assumption of normality is maintained.[1] Welch's t-test is an approximate solution to the Behrens–Fisher problem.

Calculations

Welch's t-test defines the statistic t by the following formula:

    t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}

where \bar{X}_i and s_{\bar{X}_i} = s_i/\sqrt{N_i} are the i-th sample mean and its standard error, with s_i denoting the corrected sample standard deviation and N_i the sample size. Unlike in Student's t-test, the denominator is not based on a pooled variance estimate.
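As a sketch of this calculation (with made-up data), the statistic can be computed directly from the two samples without pooling the variances:

```python
import math

# two hypothetical samples with visibly unequal spread
a = [27.1, 22.0, 20.9, 26.0, 24.0, 25.0, 23.0]
b = [15.0, 26.0, 18.0, 23.0, 30.0, 10.0, 20.0]

def welch_t(x, y):
    """Welch t statistic: each variance is divided by its own sample size."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # corrected sample variance
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)  # no pooled estimate

t = welch_t(a, b)
```

Swapping the two samples flips the sign of the statistic, since the numerator is the difference of the means.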

The degrees of freedom \nu associated with this variance estimate is approximated using the Welch–Satterthwaite equation:[4]

    \nu \approx \frac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^2}{\dfrac{s_1^4}{N_1^2\,\nu_1} + \dfrac{s_2^4}{N_2^2\,\nu_2}}

This expression can be simplified when N_1 = N_2 = N:

    \nu \approx \frac{(N-1)\left(s_1^2 + s_2^2\right)^2}{s_1^4 + s_2^4}

Here, \nu_i = N_i - 1 is the degrees of freedom associated with the i-th variance estimate.
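The equal-sample-size simplification can be checked numerically against the full Welch–Satterthwaite formula; the variances and sample size below are arbitrary illustrations:

```python
def welch_satterthwaite(v1, n1, v2, n2):
    """Welch-Satterthwaite df from sample variances (s_i^2) and sizes."""
    num = (v1 / n1 + v2 / n2) ** 2
    den = (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    return num / den

# equal sample sizes: compare the full formula with the simplified form
v1, v2, n = 4.0, 9.0, 12
nu_full = welch_satterthwaite(v1, n, v2, n)
nu_simplified = (n - 1) * (v1 + v2) ** 2 / (v1 ** 2 + v2 ** 2)
```

With N_1 = N_2 both expressions agree exactly, and the result always lies between n - 1 (one sample dominating) and 2(n - 1) (equal variances).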

The statistic t is approximately distributed as Student's t with \nu degrees of freedom, since the squared standard error in its denominator is approximated by a scaled chi-squared distribution. The approximation is better when both N_1 and N_2 are larger than 5.[5][6]

Statistical test

Once t and \nu have been computed, these statistics can be used with the t-distribution to test one of two possible null hypotheses:

  • that the two population means are equal, in which case a two-tailed test is applied; or
  • that one of the population means is greater than or equal to the other, in which case a one-tailed test is applied.

The approximate degrees of freedom are real numbers and are used as such in statistics-oriented software, whereas in spreadsheets they are rounded down to the nearest integer.
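A small illustration of that difference, using an arbitrary fractional df and t value: rounding the degrees of freedom down, as spreadsheets do, gives a slightly heavier-tailed reference distribution and hence a slightly larger p-value:

```python
import math
from scipy import stats

df_exact = 19.16                 # a fractional Welch df, as the formula produces
df_floor = math.floor(df_exact)  # spreadsheet-style rounding down
t_obs = 2.1                      # arbitrary observed statistic

p_exact = 2 * stats.t.sf(t_obs, df_exact)  # statistics software: fractional df
p_floor = 2 * stats.t.sf(t_obs, df_floor)  # spreadsheet: integer df
```

The discrepancy is small in practice, but always conservative: fewer degrees of freedom means heavier tails and a larger p-value.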

Advantages and limitations

Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal for unequal variances and unequal sample sizes under normality. Furthermore, the power of Welch's t-test comes close to that of Student's t-test even when the population variances are equal and the sample sizes are balanced.[2] Welch's t-test can be generalized to more than two samples,[7] a generalization that is more robust than one-way analysis of variance (ANOVA).

It is not recommended to pre-test for equal variances and then choose between Student's t-test and Welch's t-test.[8] Rather, Welch's t-test can be applied directly, without any substantial disadvantage relative to Student's t-test, as noted above. Welch's t-test remains robust for skewed distributions and large sample sizes.[9] Reliability decreases for skewed distributions and smaller samples, where one could possibly perform Welch's t-test.[10]

Software implementations

Language/Program — Function — Documentation

  • LibreOffice: TTEST(Data1; Data2; Mode; Type) [11]
  • MATLAB: ttest2(data1, data2, 'Vartype', 'unequal') [12]
  • Microsoft Excel pre-2010: TTEST(array1, array2, tails, type) [13]
  • Microsoft Excel 2010 and later: T.TEST(array1, array2, tails, type) [14]
  • Minitab: accessed through the menu [15]
  • Origin: results of the Welch t-test are automatically output in the result sheet when conducting a two-sample t-test (Statistics: Hypothesis Testing: Two-Sample t-Test) [16]
  • SAS: default output from proc ttest (labeled "Satterthwaite")
  • Python (via the third-party library SciPy): scipy.stats.ttest_ind(a, b, equal_var=False) [17]
  • R: t.test(data1, data2, var.equal = FALSE) [18]
  • Haskell: Statistics.Test.StudentT.welchTTest SamplesDiffer data1 data2 [19]
  • JMP: Oneway( Y( YColumn ), X( XColumn ), Unequal Variances( 1 ) ); [20]
  • Julia: UnequalVarianceTTest(data1, data2) [21]
  • Stata: ttest varname1 == varname2, welch [22]
  • Google Sheets: TTEST(range1, range2, tails, type) [23]
  • GraphPad Prism: a choice on the t test dialog
  • IBM SPSS Statistics: an option in the menu [24][25]
  • GNU Octave: welch_test(x, y) [26]
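As a usage sketch of the SciPy entry above (the sample data are made up), passing equal_var=False selects Welch's t-test rather than Student's:

```python
from scipy import stats

# hypothetical measurements for two independent groups
sample_a = [19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 22.0]
sample_b = [28.2, 26.6, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7]

# equal_var=False requests the Welch (unpooled) version of the test
res = stats.ttest_ind(sample_a, sample_b, equal_var=False)
```

The result object exposes the t statistic and the two-sided p-value; here the statistic is negative because the second sample has the larger mean.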

References

  1. ^ a b Welch, B. L. (1947). "The generalization of "Student's" problem when several different population variances are involved". Biometrika. 34 (1–2): 28–35. doi:10.1093/biomet/34.1-2.28. MR 0019277. PMID 20287819.
  2. ^ a b c Ruxton, G. D. (2006). "The unequal variance t-test is an underused alternative to Student's t-test and the Mann–Whitney U test". Behavioral Ecology. 17 (4): 688–690. doi:10.1093/beheco/ark016.
  3. ^ a b Derrick, B; Toher, D; White, P (2016). "Why Welch's test is Type I error robust" (PDF). The Quantitative Methods for Psychology. 12 (1): 30–38. doi:10.20982/tqmp.12.1.p030.
  4. ^ 7.3.1. Do two processes have the same mean?, Engineering Statistics Handbook, NIST. (Online source accessed 2021-07-30.)
  5. ^ Allwood, Michael (2008). "The Satterthwaite Formula for Degrees of Freedom in the Two-Sample t-Test" (PDF). p. 6.
  6. ^ Yates; Moore; Starnes (2008). The Practice of Statistics (3rd ed.). New York: W.H. Freeman and Company. p. 792. ISBN 9780716773092.
  7. ^ Welch, B. L. (1951). "On the Comparison of Several Mean Values: An Alternative Approach". Biometrika. 38 (3/4): 330–336. doi:10.2307/2332579. JSTOR 2332579.
  8. ^ Zimmerman, D. W. (2004). "A note on preliminary tests of equality of variances". British Journal of Mathematical and Statistical Psychology. 57 (Pt 1): 173–181. doi:10.1348/000711004849222. PMID 15171807.
  9. ^ Fagerland, M. W. (2012). "t-tests, non-parametric tests, and large studies—a paradox of statistical practice?". BMC Medical Research Methodology. 12: 78. doi:10.1186/1471-2288-12-78. PMC 3445820. PMID 22697476.
  10. ^ Fagerland, M. W.; Sandvik, L. (2009). "Performance of five two-sample location tests for skewed distributions with unequal variances". Contemporary Clinical Trials. 30 (5): 490–496. doi:10.1016/j.cct.2009.06.007. PMID 19577012.
  11. ^ "Statistical Functions Part Five - LibreOffice Help".
  12. ^ "Two-sample t-test - MATLAB ttest2 - MathWorks United Kingdom".
  13. ^ "TTEST - Excel - Microsoft Office". office.microsoft.com. Archived from the original on 2010-06-13.
  14. ^ "T.TEST function".
  15. ^ Overview for 2-Sample t - Minitab: — official documentation for Minitab version 18. Accessed 2020-09-19.
  16. ^ "Help Online - Quick Help - FAQ-314 Does Origin supports Welch's t-test?". www.originlab.com. Retrieved 2023-11-09.
  17. ^ "Scipy.stats.ttest_ind — SciPy v1.7.1 Manual".
  18. ^ "R: Student's t-Test".
  19. ^ "Statistics.Test.StudentT".
  20. ^ "Index of /Support/Help".
  21. ^ "Welcome to Read the Docs — HypothesisTests.jl latest documentation".
  22. ^ "Stata 17 help for ttest".
  23. ^ "T.TEST - Docs Editors Help".
  24. ^ Jeremy Miles: Unequal variances t-test or U Mann-Whitney test?, Accessed 2014-04-11
  25. ^ One-Sample Test — Official documentation for SPSS Statistics version 24. Accessed 2019-01-22.
  26. ^ "Function Reference: Welch_test".
This page was last edited on 13 June 2024, at 04:59
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses.