8.7 Two-sample t-test

The two-sample t-test

What if at least one sample size is small and we don’t know the population variances?
If both $X$ and $Y$ are normally distributed and independent,

\[ \small{ T=\frac{\bar{X}-\bar{Y}-(\mu_X-\mu_Y)}{\sqrt{\frac{S_X^2}{m}+\frac{S_Y^2}{n}}}\text{ approximately follows a $t$-distribution} } \]

\[ \small{ \text{with degrees of freedom }v=\frac{\big(\frac{s_X^2}{m}+\frac{s_Y^2}{n}\big)^2}{\frac{(s_X^2/m)^2}{m-1}+\frac{(s_Y^2/n)^2}{n-1}}\text{, rounded down to the nearest integer.} } \]

We can use it as the test statistic.

We follow the same test procedure as before.

\[ \begin{aligned} &H_0: \mu_X=\mu_Y \; \text{ (or, } \mu_X-\mu_Y=0 ) \\ (1)\; &H_a: \mu_X \neq \mu_Y \;\text{ (or, } \mu_X-\mu_Y\neq0)\;\;\;\;\color{gray}{\rightarrow\text{two-tailed}}\\ (2)\; &H_a: \mu_X < \mu_Y \;\text{ (or, } \mu_X-\mu_Y < 0) \;\;\;\;\color{gray}{\rightarrow\text{one-tailed}}\\ (3)\; &H_a: \mu_X > \mu_Y \;\text{ (or, } \mu_X-\mu_Y > 0) \;\;\;\;\color{gray}{\rightarrow\text{one-tailed}}\\ \end{aligned} \]

Effects of a fusion treatment on material strength

Group	Sample size	Sample mean	Sample SD
No fusion ($X$)	10	2,902.8	277.3
Fused ($Y$)	8	3,108.1	205.9

Does the treatment increase the material strength?

\[ \small{ H_0: \mu_X-\mu_Y=0,\;\;\;\;H_a: \mu_X-\mu_Y< 0 } \]

\[ \small{ t=\frac{\bar{x}-\bar{y}-0}{\sqrt{\frac{s_X^2}{m}+\frac{s_Y^2}{n}}}=\frac{2902.8-3108.1}{\sqrt{\frac{277.3^2}{10}+\frac{205.9^2}{8}}}\approx -1.801 } \]

\[ \small{ v=\frac{\big(\frac{s_X^2}{m}+\frac{s_Y^2}{n}\big)^2}{\frac{(s_X^2/m)^2}{m-1}+\frac{(s_Y^2/n)^2}{n-1}}=\frac{\big(\frac{277.3^2}{10}+\frac{205.9^2}{8}\big)^2}{\frac{(277.3^2/10)^2}{10-1}+\frac{(205.9^2/8)^2}{8-1}}\approx 15.94 } \]

\[ \small{ \begin{aligned} &p\text{-value}\\ =\;&\text{P}(T_{v=15}< -1.801) \\ =\;&0.046 \end{aligned} } \]
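The arithmetic in this example can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library; the function name `welch_t` and its parameter names are hypothetical choices made for illustration, and the degrees of freedom use the form with squared terms in the denominator, which is the form that reproduces $v\approx 15.94$.

```python
import math


def welch_t(mean_x, sd_x, m, mean_y, sd_y, n, delta0=0.0):
    """Unpooled two-sample t statistic and its approximate degrees of freedom.

    Works from summary statistics only: sample means, sample standard
    deviations, and sample sizes. delta0 is the hypothesized mu_X - mu_Y.
    """
    vx = sd_x ** 2 / m  # s_X^2 / m
    vy = sd_y ** 2 / n  # s_Y^2 / n
    t = (mean_x - mean_y - delta0) / math.sqrt(vx + vy)
    v = (vx + vy) ** 2 / (vx ** 2 / (m - 1) + vy ** 2 / (n - 1))
    return t, v


# Summary statistics from the fusion example (X: no fusion, Y: fused).
t_stat, v = welch_t(2902.8, 277.3, 10, 3108.1, 205.9, 8)
df = math.floor(v)  # round down to the nearest integer

print(f"t = {t_stat:.3f}, v = {v:.2f}, df used = {df}")
```

Rounding $v$ down rather than to the nearest integer is the conservative choice: fewer degrees of freedom give heavier tails and therefore a slightly larger p-value.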

Pooled t procedure

\[ \small{ T=\frac{\bar{X}-\bar{Y}-(\mu_X-\mu_Y)}{\sqrt{\frac{S_X^2}{m}+\frac{S_Y^2}{n}}},\;\;\;\; v=\frac{\big(\frac{S_X^2}{m}+\frac{S_Y^2}{n}\big)^2}{\frac{(S_X^2/m)^2}{m-1}+\frac{(S_Y^2/n)^2}{n-1}} } \]

If it’s reasonable to assume $X$ and $Y$ have (unknown but) equal variance (i.e., $\sigma_X^2=\sigma_Y^2$), we can simplify it to

\[ \small{ S_{\text{pooled}}^2=\frac{m-1}{m+n-2}S_X^2+\frac{n-1}{m+n-2}S_Y^2 } \]

\[ \small{ T=\frac{\bar{X}-\bar{Y}-(\mu_X-\mu_Y)}{S_{\text{pooled}}\sqrt{\frac{1}{m}+\frac{1}{n}}}\;\text{ follows a $t$-distribution } } \]

\[ \small{ \text{with $m+n-2$ degrees of freedom.} } \]
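To make the pooled formulas concrete, here is a minimal sketch in standard-library Python. The function name `pooled_t` is hypothetical, and applying it to the fusion summary statistics is purely illustrative: the notes do not claim that the equal-variance assumption holds for those data.

```python
import math


def pooled_t(mean_x, sd_x, m, mean_y, sd_y, n, delta0=0.0):
    """Pooled two-sample t statistic, assuming equal population variances."""
    df = m + n - 2
    # Weighted average of the two sample variances, weights (m-1) and (n-1).
    s2_pooled = ((m - 1) * sd_x ** 2 + (n - 1) * sd_y ** 2) / df
    se = math.sqrt(s2_pooled) * math.sqrt(1 / m + 1 / n)
    t = (mean_x - mean_y - delta0) / se
    return t, df, s2_pooled


# Illustration only: reuse the fusion summary statistics.
t_pooled, df_pooled, s2_pooled = pooled_t(2902.8, 277.3, 10, 3108.1, 205.9, 8)

print(f"pooled variance = {s2_pooled:.2f}")
print(f"t = {t_pooled:.3f} with {df_pooled} degrees of freedom")
```

The pooled procedure gains a simpler, integer degrees of freedom, but it is only appropriate when the equal-variance assumption is reasonable; otherwise the unpooled statistic is the safer choice.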

Paired t-test

So far, we assumed $X_1, X_2, \cdots, X_m$ and $Y_1, Y_2, \cdots, Y_n$ are all independent.
- A group of people who eat fast food, and another group of people who don’t
In many situations, there’s only one set of $n$ individuals or objects, and we make two observations on each.
- Patient blood pressure before/after taking a drug
- Student test scores before/after a tutoring program

Assumptions

The data consists of $n$ independently selected pairs

\[ (X_1, Y_1), (X_2, Y_2), \cdots, (X_n, Y_n) \]

Let $D$ denote the difference between the first and second observations within a pair.

\[ D_i = X_i-Y_i, \;\;\;\; i=1, 2, \cdots, n \]

\[ \mu_D=\text{E}[X-Y]=\mu_X-\mu_Y \]

Then we can do the one-sample test on the difference.


\[ \begin{aligned} &H_0: \mu_D=0 \\ (1)\; &H_a: \mu_D \neq 0\;\;\;\;\color{gray}{\rightarrow\text{two-tailed}} \\ (2)\; &H_a: \mu_D>0\;\;\;\;\color{gray}{\rightarrow\text{one-tailed}} \\ (3)\; &H_a: \mu_D < 0\;\;\;\;\color{gray}{\rightarrow\text{one-tailed}} \\ \end{aligned} \]

\[ \text{The test statistic:}\;\;\;\;t=\frac{\bar{d}-0}{s_D/\sqrt{n}} \]

\[ \text{with $n-1$ degrees of freedom.} \]

Student test scores before/after a tutoring program

Student	Before	After	Difference
1	83	88	5
2	83	73	-10
3	67	83	16
4	72	65	-7
5	83	92	9
6	80	83	3
7	94	95	1
8	82	77	-5
9	74	89	15
10	74	86	12

Is the tutoring program effective?

\[ \text{Difference: }[5, -10, 16, -7, 9, 3, 1, -5, 15, 12] \]

Let $\mu_D$ denote the true average difference in scores (after minus before), so a positive $\mu_D$ means the program raises scores.

\[ H_0: \mu_D=0, \;\;\;\;H_a: \mu_D > 0 \]

\[ n=10, \;\;\;\; \bar{d}=3.9, \;\;\;\; s_D=9.207 \]

\[ t=\frac{\bar{d}-0}{s_D/\sqrt{n}}=\frac{3.9}{9.207/\sqrt{10}}\approx 1.340 \]

\[ \text{with $n-1=9$ degrees of freedom.} \]

\[ \small{ \begin{aligned} &p\text{-value}\\ =\;&\text{P}(T_{v=9} > 1.340) \\ \approx\;& 0.107 \end{aligned} } \]
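The paired computation can be reproduced directly from the table. The sketch below uses only the Python standard library and takes each difference as after minus before, matching the Difference column. The helpers `t_pdf` and `t_upper_tail` are hypothetical names introduced here; the tail probability is obtained by numerically integrating the $t$ density with Simpson's rule, which is an illustrative stand-in for a statistical table or library routine.

```python
import math
import statistics

before = [83, 83, 67, 72, 83, 80, 94, 82, 74, 74]
after = [88, 73, 83, 65, 92, 83, 95, 77, 89, 86]

# Differences taken as after minus before, as in the table.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


def t_upper_tail(t, v, steps=2000):
    """P(T_v > t), via composite Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3  # probability mass between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area


p_value = t_upper_tail(t_stat, df)  # upper tail, because H_a is mu_D > 0

print(f"d_bar = {d_bar:.1f}, s_D = {s_d:.3f}, t = {t_stat:.3f}, p = {p_value:.3f}")
```

The upper tail is used because the alternative is $\mu_D>0$; for a two-tailed alternative the same tail area would be doubled.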
