7.2 Confidence interval

Point estimate

Based on 80 samples, the point estimate of the battery life is $\hat{\theta}=11.4$ hours.
A point estimate alone gives no information about how precise the estimate is.
Oftentimes, we are interested in constructing a so-called confidence interval (CI).
Roughly speaking, it is an interval that contains the parameter with a specified high probability (e.g., 95%).
- The battery life has a 95% CI of [10.2,12.7] hours.

Constructing a confidence interval

First, we need to specify a desired high probability.
- This high probability is called the confidence level.
- Common choices include 95% and 99%
- $(1-\alpha)$ is often used to indicate the confidence level

\[95\% \; \rightarrow \alpha=0.05\]

\[99\% \; \rightarrow \alpha=0.01\]

Constructing a confidence interval

Once we have specified the confidence level, say 95%, we aim to have a confidence interval with
- a lower bound estimator $\hat{\Theta}_n^{-}$
- an upper bound estimator $\hat{\Theta}_n^{+}$
so that

\[\text{P}\bigg(\hat{\Theta}_n^{-} \leq \theta \leq \hat{\Theta}_n^{+}\bigg) = 95\%\]

Procedures for constructing a CI

Suppose $X_1, X_2, \cdots, X_n$ are i.i.d. normal, with unknown mean $\theta$ and known variance $\sigma^2$.

\[ \hat{\Theta}_n=\frac{X_1+X_2+\cdots+X_n}{n} \]

\[ \hat{\Theta}_n \sim \text{N}\bigg(\theta, \frac{\sigma^2}{n}\bigg)\;\;\rightarrow\;\;\frac{\hat{\Theta}_n-\theta}{\sigma/\sqrt{n}} \sim \text{N}(0, 1) \]

Procedures for constructing a CI

Let’s construct a 95% CI, that is, $\alpha=0.05$.

\[ \begin{aligned} \Phi(\color{red}{?})&=0.025 \\ \Phi(\color{red}{?})&=1-0.025=0.975 \\ \end{aligned} \]

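The value $1.96$, which satisfies $\Phi(1.96)=0.975$ (and, by symmetry, $\Phi(-1.96)=0.025$), is called the critical value of $z$ corresponding to an upper-tail probability of $0.025$.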

We can check the standard normal table (next slide).
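As an alternative to the table lookup, the critical value can be computed numerically. The following is a minimal sketch, assuming Python 3.8 or later and the standard-library class `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Upper critical value: the z with Phi(z) = 1 - alpha/2 = 0.975
z_upper = std_normal.inv_cdf(1 - alpha / 2)
# Lower critical value: the z with Phi(z) = alpha/2 = 0.025
z_lower = std_normal.inv_cdf(alpha / 2)

print(round(z_lower, 2), round(z_upper, 2))  # -1.96 1.96
```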

We want

\[ \text{P}\bigg(-1.96 < \frac{\hat{\Theta}_n-\theta}{\sigma/\sqrt{n}} < 1.96\bigg) = 0.95 \]

Multiply through by $\sigma/\sqrt{n}$

\[ \small{ \text{P}\bigg(-1.96 \cdot \frac{\sigma}{\sqrt{n}} < \hat{\Theta}_n-\theta < 1.96 \cdot \frac{\sigma}{\sqrt{n}}\bigg) = 0.95 } \]

Subtract $\hat{\Theta}_n$ from each term

\[ \small{ \text{P}\bigg(-\hat{\Theta}_n-1.96 \cdot \frac{\sigma}{\sqrt{n}} < -\theta < -\hat{\Theta}_n + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\bigg) = 0.95 } \]

Multiply through by $-1$

\[ \small{ \text{P}\bigg(\hat{\Theta}_n + 1.96 \cdot \frac{\sigma}{\sqrt{n}} > \theta > \hat{\Theta}_n - 1.96 \cdot \frac{\sigma}{\sqrt{n}}\bigg) = 0.95 } \]

Rearrange the sides.

\[ \text{P}\bigg(\hat{\Theta}_n - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \theta < \hat{\Theta}_n + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\bigg) = 0.95 \]

This means that

\[ \bigg[\hat{\Theta}_n - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\;\; \hat{\Theta}_n + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\bigg] \]

is a 95% confidence interval.

More compactly, it can be written as $\hat{\Theta}_n \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}$

More generally, for a confidence level of $1-\alpha$,

let $z_{\frac{\alpha}{2}}$ be such that $\Phi(z_{\frac{\alpha}{2}})=1-\frac{\alpha}{2}$.

\[ \small{ \text{P}\bigg(\hat{\Theta}_n - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} < \theta < \hat{\Theta}_n + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\bigg) = 1-\alpha } \]

\[ \small{ \bigg[\hat{\Theta}_n - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\;\; \hat{\Theta}_n + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\bigg]}, \;\;\text{or }\;\; \hat{\Theta}_n \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n} } \]

is a $(1-\alpha)$ confidence interval.
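The general formula can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `z_interval` is made up for this example, and the value `sigma=5.0` hours is a hypothetical known standard deviation (the battery example does not state one). Only the point estimate 11.4 and the sample size 80 come from the example above.

```python
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """CI for the mean of normal data with KNOWN standard deviation sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    half_width = z * sigma / n ** 0.5         # z_{alpha/2} * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# sigma = 5.0 is an assumed value, used only for illustration
lower, upper = z_interval(sample_mean=11.4, sigma=5.0, n=80, confidence=0.95)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```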

Interpretation

Assume we have calculated the 95% confidence interval for an unknown parameter $\theta$ as $[10.21,12.69]$.
What does this interval mean?
Does it mean that the true parameter $\theta$ lies in this CI $[10.21,12.69]$ with a probability of 95%?
No. $\theta$ is an unknown but fixed parameter. The CI that we obtained, $[10.21,12.69]$, is a fixed interval.

Suppose we construct a CI many times with the same procedure.
That is, each time we
- obtain an independent collection of $n$ observations
- construct a 95% CI based on the $n$ observations
We expect that about 95% of these CIs will include $\theta$.
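This repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes normal data with a known standard deviation; the true mean `theta=11.4`, `sigma=5.0`, the number of repetitions, and the function name `coverage` are all hypothetical choices made for the illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(theta, sigma, n, confidence, repetitions, seed=0):
    """Fraction of simulated CIs (known sigma) that contain the true mean theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        # Draw an independent collection of n observations
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        center = fmean(sample)
        # Count the interval if it includes the fixed parameter theta
        if center - half_width <= theta <= center + half_width:
            hits += 1
    return hits / repetitions

rate = coverage(theta=11.4, sigma=5.0, n=80, confidence=0.95, repetitions=2000)
print(f"fraction of intervals containing theta: {rate:.3f}")
```

The printed fraction should land close to 0.95. Each individual interval either contains $\theta$ or it does not; the 95% describes the procedure, not any single interval.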

The CI is a random interval centered at the point estimator $\hat{\Theta}_n$.
The half-width of the CI is $z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$.
The width of the CI indicates how precise the estimate is.

\[ \hat{\Theta}_n \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

\[ \text{90\% CI: }\;\;\;\; \hat{\Theta}_n \pm 1.645 \frac{\sigma}{\sqrt{n}} \]

\[ \text{95\% CI: }\;\;\;\; \hat{\Theta}_n \pm 1.96 \frac{\sigma}{\sqrt{n}} \]

\[ \text{99\% CI: }\;\;\;\; \hat{\Theta}_n \pm 2.58 \frac{\sigma}{\sqrt{n}} \]

\[ \hat{\Theta}_n \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \]

The half-width is proportional to $\sigma$ (population standard deviation), the inherent variability of the thing we are trying to measure.

\[ \hat{\Theta}_n \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \]

The half-width decreases as sample size $n$ increases.
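The dependence of the half-width on the confidence level, $\sigma$, and $n$ can be tabulated directly. This is a small sketch; `sigma = 5.0` and the sample sizes are hypothetical values chosen for illustration, and `half_width` is a helper name introduced here.

```python
from statistics import NormalDist

def half_width(sigma, n, confidence):
    """Half-width z_{alpha/2} * sigma / sqrt(n) of the known-sigma CI."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / n ** 0.5

sigma = 5.0  # assumed population standard deviation (illustrative)
for n in (20, 80, 320):
    widths = [half_width(sigma, n, c) for c in (0.90, 0.95, 0.99)]
    print(n, [round(w, 3) for w in widths])
```

Because the half-width scales with $1/\sqrt{n}$, quadrupling the sample size (20 to 80 to 320) halves it, while a higher confidence level widens the interval.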

Sample size calculation

We often have a desired precision (e.g., $\pm 0.5$ hours) and a confidence level (e.g., 99%) for a confidence interval.
Question: how many data points should we collect?

\[ \text{Half width:}\;\;d=z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}}, \;\;\;\;\text{Solve for $n$.} \]

The sample size needed for a desired half-width of $d$ is

\[ n=\bigg(z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{d}\bigg)^2 \]
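Since $n$ must be an integer, the result is rounded up in practice. A minimal sketch of the calculation follows, using the $\pm 0.5$ hours and 99% from the example; the value `sigma=2.0` hours is an assumption made for illustration, and `required_sample_size` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, d, confidence):
    """Smallest n whose known-sigma CI has half-width at most d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / d) ** 2)  # round up to a whole observation

# sigma = 2.0 is an assumed value; d = 0.5 and 99% follow the example above
n_needed = required_sample_size(sigma=2.0, d=0.5, confidence=0.99)
print(n_needed)  # 107
```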

Suppose $X_1, X_2, \cdots, X_n$ are i.i.d. normal, with unknown mean $\theta$ and known variance $\sigma^2$.

\[ \small{ \text{Sample mean estimator:}\;\;\hat{\Theta}_n=\frac{X_1+X_2+\cdots+X_n}{n} } \]

\[ \small{ \text{var}\big(\hat{\Theta}_n\big)=\frac{\sigma^2}{n}, \;\;\;\;\sqrt{\text{var}\big(\hat{\Theta}_n\big)}=\frac{\sigma}{\sqrt{n}} } \]

The standard deviation of $\hat{\Theta}_n$ is called the standard error.

\[ \hat{\Theta}_n \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \]

The CI above can be expressed in words as

\[ \text{(point estimate)} \pm \text{($z$ critical value)} \cdot \text{(standard error)} \]

Suppose $X_1, X_2, \cdots, X_n$ are i.i.d. (not necessarily normally distributed), with unknown mean $\theta$ and known variance $\sigma^2$.

\[ \small{\text{Sample mean estimator:}\;\;\hat{\Theta}_n=\frac{X_1+X_2+\cdots+X_n}{n}} \]

If $n$ is sufficiently large, what does the CLT tell us?

\[ \small{ \begin{aligned} \hat{\Theta}_n \;&\text{ is approximately } \text{N}\bigg(\theta, \frac{\sigma^2}{n}\bigg) \\ \frac{\hat{\Theta}_n-\theta}{\sigma/\sqrt{n}} \;&\text{ is approximately } \text{N}(0, 1) \\ \end{aligned} } \]

Procedures for constructing a CI

We can proceed exactly as in the previous case.

\[ \text{P}\bigg(-z_{\frac{\alpha}{2}} < \frac{\hat{\Theta}_n-\theta}{\sigma/\sqrt{n}} < z_{\frac{\alpha}{2}}\bigg) \approx 1-\alpha \;\;\;\color{gray}{\rightarrow \text{by CLT}} \]

\[ \text{P}\bigg(\hat{\Theta}_n - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} < \theta < \hat{\Theta}_n + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\bigg) \approx 1-\alpha \]
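To see the approximation at work on non-normal data, the sketch below simulates the interval for exponential observations with rate 1, for which the mean and standard deviation are both 1. The choice of distribution, `n=50`, the number of repetitions, and the name `clt_coverage` are assumptions made for this illustration.

```python
import random
from statistics import NormalDist, fmean

def clt_coverage(n, confidence, repetitions, seed=0):
    """Coverage of the z-interval when the data are Exponential(rate=1)."""
    rng = random.Random(seed)
    theta, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        center = fmean(sample)
        if center - half_width < theta < center + half_width:
            hits += 1
    return hits / repetitions

approx_rate = clt_coverage(n=50, confidence=0.95, repetitions=2000)
print(f"coverage with skewed data: {approx_rate:.3f}")
```

The coverage is only approximately 95% here, which is what the $\approx$ sign in the formulas above expresses.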

What if we don’t know $\sigma^2$?

Suppose $X_1, X_2, \cdots, X_n$ are i.i.d. with unknown mean $\theta$ and unknown variance $\sigma^2$.

\[ \small{ \text{Sample mean estimator:}\;\;\hat{\Theta}_n=\frac{X_1+X_2+\cdots+X_n}{n} } \]

We can estimate the population variance $\sigma^2$ with

\[ \small{ \text{Sample variance estimator:}\;\;\hat{S}_n^2=\frac{\sum\big(X_i - \hat{\Theta}_n\big)^2}{n-1} } \]

\[ \small{ \frac{\hat{\Theta}_n-\theta}{\sigma/\sqrt{n}} \sim \text{N}(0, 1), \; \text{if $X_i$ is normally distributed.} } \]

\[ \small{ \frac{\hat{\Theta}_n-\theta}{\sigma/\sqrt{n}} \text{ is approximately } \text{N}(0, 1), \; \text{if $X_i$ is not normal, but $n$ is large.} } \]

Once we replace $\sigma$ with $\hat{S}_n$, we have

\[ \small{ \frac{\hat{\Theta}_n-\theta}{\hat{S}_n/\sqrt{n}} } \]

\[ \small{ \frac{\hat{\Theta}_n-\theta}{\hat{S}_n/\sqrt{n}} \text{ is approximately } \text{N}(0, 1), \; \text{if $n$ is sufficiently large.} } \]

If $n$ is sufficiently large,

\[ \small{ \text{P}\bigg(-z_{\frac{\alpha}{2}} < \frac{\hat{\Theta}_n-\theta}{\hat{S}_n/\sqrt{n}} < z_{\frac{\alpha}{2}}\bigg) \approx 1-\alpha } \]

As a rule of thumb, $n>40$ is usually sufficient.

\[ \small{ \hat{\Theta}_n \pm z_{\frac{\alpha}{2}} \frac{\hat{S}_n}{\sqrt{n}} } \]

is a CI for $\theta$ with confidence level approximately $1-\alpha$.

\[ \small{ \text{(point estimate)} \pm \text{($z$ critical value)} \cdot \text{(estimated standard error)} } \]
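A sketch of the large-sample interval with an estimated standard error is given below. The data are synthetic (80 values drawn uniformly between 5 and 18, not real battery measurements), and the names `large_sample_ci` and `battery_hours` are hypothetical. `statistics.stdev` uses the $n-1$ divisor, matching $\hat{S}_n$.

```python
import random
from statistics import NormalDist, fmean, stdev

def large_sample_ci(data, confidence=0.95):
    """Approximate CI for the mean when sigma is unknown and n is large."""
    n = len(data)
    center = fmean(data)                       # sample mean
    s = stdev(data)                            # sample standard deviation (n - 1 divisor)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / n ** 0.5              # z * estimated standard error
    return center - half_width, center + half_width

rng = random.Random(1)
battery_hours = [rng.uniform(5.0, 18.0) for _ in range(80)]  # synthetic data
ci_low, ci_high = large_sample_ci(battery_hours)
print(f"approximate 95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```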

What if the sample size $n$ is small?

Suppose $X_1, X_2, \cdots, X_n$ are i.i.d. normal with unknown mean $\theta$ and unknown variance $\sigma^2$.

\[ \small{ \text{Sample mean estimator:}\;\;\hat{\Theta}_n=\frac{X_1+X_2+\cdots+X_n}{n} } \]

\[ \small{ T_n=\frac{\hat{\Theta}_n-\theta}{\hat{S}_n/\sqrt{n}} } \]

has a probability distribution called the t-distribution with $v=n-1$ degrees of freedom.

\[ \small{ t\text{-distribution:}\;\;\;T_n=\frac{\hat{\Theta}_n-\theta}{\hat{S}_n/\sqrt{n}} } \]

t-distribution

Symmetric
Bell-shaped
More spread out than $N(0, 1)$
Approaches $N(0, 1)$ as $v\rightarrow +\infty$
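These properties can be observed by simulating $T_n$ from normal samples and counting how often $|T_n|$ exceeds 1.96, an event that has probability about 0.05 under $N(0,1)$. This is a simulation sketch using only the standard library; the sample sizes, repetition count, and the name `tail_fraction` are illustrative assumptions.

```python
import random
from statistics import fmean, stdev

def tail_fraction(n, cutoff, repetitions, seed=0):
    """Fraction of simulated T_n statistics with |T_n| > cutoff (true mean 0)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # T_n = (sample mean - theta) / (S_n / sqrt(n)), with theta = 0
        t_stat = fmean(sample) / (stdev(sample) / n ** 0.5)
        if abs(t_stat) > cutoff:
            exceed += 1
    return exceed / repetitions

small_n = tail_fraction(n=5, cutoff=1.96, repetitions=5000)
large_n = tail_fraction(n=100, cutoff=1.96, repetitions=5000)
print(f"n=5: {small_n:.3f}   n=100: {large_n:.3f}")
```

With $n=5$ the fraction is well above 0.05 (heavier tails), while with $n=100$ it is close to 0.05, in line with the t-distribution approaching $N(0,1)$ as the degrees of freedom grow.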


For small $n$, we use the t-distribution to construct the CI.

\[ \small{ \text{P}\bigg(-t_{\frac{\alpha}{2}, n-1} < \frac{\hat{\Theta}_n-\theta}{\hat{S}_n/\sqrt{n}} < t_{\frac{\alpha}{2}, n-1}\bigg) = 1-\alpha } \]

\[ \small{ \text{P}\bigg(\hat{\Theta}_n - t_{\frac{\alpha}{2}, n-1} \frac{\hat{S}_n}{\sqrt{n}} < \theta < \hat{\Theta}_n + t_{\frac{\alpha}{2}, n-1} \frac{\hat{S}_n}{\sqrt{n}}\bigg) = 1-\alpha } \]

\[ \small{ \text{Or, more compactly, }\hat{\Theta}_n \pm t_{\frac{\alpha}{2}, n-1} \frac{\hat{S}_n}{\sqrt{n}}\;\;\text{is a } (1-\alpha) \text{ CI.} } \]

Critical values for t-distribution

Table in next slide.

Exercise

An object is weighed using a scale.
The scale reports the true weight plus a random error that is normally distributed with mean $\mu=0$ and unknown variance.
Assume that the errors are independent.
The object was weighed eight times.

\[[0.555, 0.540, 0.636, 0.644, 0.492, 0.567, 0.556, 0.607]\]
Construct a 95% confidence interval for the weight.

\[ \small{ \bigg[\hat{\Theta}_n - t_{\frac{\alpha}{2}, n-1} \frac{\hat{S}_n}{\sqrt{n}}, \;\;\;\hat{\Theta}_n + t_{\frac{\alpha}{2}, n-1} \frac{\hat{S}_n}{\sqrt{n}}\bigg] } \]

\[ \small{ \begin{aligned} \text{Sample mean:}&\; \hat{\Theta}_n=\frac{1}{n}\sum_{i=1}^n X_i=0.5746 \\ \text{Sample variance:}&\; \hat{S}_n^2=\frac{1}{n-1}\sum_{i=1}^n \big(X_i - \hat{\Theta}_n\big)^2=0.002635 \\ \text{Critical value:}&\; t_{\frac{\alpha}{2}, n-1}=t_{0.025, 7}=2.365 \\ \end{aligned} } \]

\[ \small{ \bigg[\hat{\Theta}_n - t_{\frac{\alpha}{2}, n-1} \frac{\hat{S}_n}{\sqrt{n}}, \;\;\;\hat{\Theta}_n + t_{\frac{\alpha}{2}, n-1} \frac{\hat{S}_n}{\sqrt{n}}\bigg]=[0.532, 0.618] } \]
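The arithmetic of the exercise can be checked with a short script. This sketch uses only the standard library, so the critical value $t_{0.025, 7}=2.365$ is entered by hand from the t table rather than computed; the variable names are illustrative.

```python
from statistics import fmean, stdev

weights = [0.555, 0.540, 0.636, 0.644, 0.492, 0.567, 0.556, 0.607]
n = len(weights)
t_crit = 2.365  # t_{0.025, 7}, taken from the t table (not computed here)

mean = fmean(weights)               # sample mean
s = stdev(weights)                  # sample standard deviation (n - 1 divisor)
half_width = t_crit * s / n ** 0.5  # t critical value * estimated standard error

lower, upper = mean - half_width, mean + half_width
print(round(lower, 3), round(upper, 3))  # 0.532 0.618
```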
