8.2 p-value

In some sense the p-value offers a first defense line against being fooled by randomness, separating signal from noise.

Yoav Benjamini, 2016

Roughly speaking, p-values tell us how surprising the observed data is, assuming there is no effect.
The p-value is defined as the conditional probability of getting a test statistic as extreme as, or more extreme than, the one observed, given that the null hypothesis is true.

\[ p\text{-value} \stackrel{\text{def}}{=} \text{P}(\text{test statistic as extreme or more extreme than observed} \mid \text{$H_0$ is true}) \]

The p-value is not the probability that the null hypothesis is true given the data.

\[ p\text{-value} \neq \text{P}(\text{$H_0$ is true} \mid \text{data}) \]

We suspect a coin is biased towards heads.

\[ \small{ \begin{aligned} &\text{$H_0$: the coin is fair ($p=0.5$).} \\ &\text{$H_a$: the coin is biased towards heads ($p>0.5$).} \\ \end{aligned} } \]

We tossed the coin 10 times and got 10 heads.

\[ \small{ \begin{aligned} \text{$p$-value}&=\text{P}(\text{get 10 heads $\mid$ $H_0$ is true}) \\ &={10 \choose 10}0.5^{10} \approx 0.001 \\ \end{aligned} } \]

Assuming $H_0$ is true, it is highly unlikely to get 10 heads.

Since we did get 10 heads, the data provide strong evidence against $H_0$.
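The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name `upper_tail_p_value` is hypothetical and chosen here for illustration.

```python
from math import comb

def upper_tail_p_value(heads: int, tosses: int, p_null: float = 0.5) -> float:
    """One-sided p-value P(X >= heads) for X ~ Binomial(tosses, p_null)."""
    return sum(
        comb(tosses, k) * p_null**k * (1 - p_null) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )

# 10 heads in 10 tosses: only one outcome is "as extreme or more extreme".
p_value = upper_tail_p_value(heads=10, tosses=10)
print(round(p_value, 4))  # 0.001
```

The sum runs from the observed count up to the number of tosses because the alternative hypothesis is one-sided ($p>0.5$).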

\[ \small{ \begin{aligned} &\text{$H_0$: the coin is fair ($p=0.5$).} \\ &\text{$H_a$: the coin is biased towards heads ($p>0.5$).} \\ \end{aligned} } \]

We tossed the coin 10 times and got 9 heads.

\[ \small{ \begin{aligned} \text{$p$-value}=\;&\text{P}(\text{get 9 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{as extreme}} \\ &\text{P}(\text{get 10 heads $\mid$ $H_0$ is true}) \;\;\;\color{gray}{\leftarrow\text{more extreme}} \\ =\;&{10 \choose 9}0.5^{10}+{10 \choose 10}0.5^{10} \approx 0.011 \\ \end{aligned} } \]

Assuming $H_0$ is true, it is unlikely to get 9 or more heads.

Since we did get 9 heads, the data provide evidence against $H_0$.

\[ \small{ \begin{aligned} &\text{$H_0$: the coin is fair ($p=0.5$).} \\ &\text{$H_a$: the coin is biased towards heads ($p>0.5$).} \\ \end{aligned} } \]

We tossed the coin 10 times and got 8 heads.

\[ \small{ \begin{aligned} p\text{-value}=\;&\text{P}(\text{got 8 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{as extreme}} \\ &\text{P}(\text{got 9 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{more extreme}} \\ &\text{P}(\text{got 10 heads $\mid$ $H_0$ is true}) \;\;\;\color{gray}{\leftarrow\text{more extreme}} \\ =\;&{10 \choose 8}0.5^{10}+{10 \choose 9}0.5^{10}+{10 \choose 10}0.5^{10} \approx 0.055 \\ \end{aligned} } \]

Assuming $H_0$ is true, it is unlikely (?) to get 8 or more heads.

We tossed the coin 10 times and got 7 heads.

\[ \small{ \begin{aligned} p\text{-value}=\;&\text{P}(\text{got 7 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{as extreme}} \\ &\text{P}(\text{got 8 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{more extreme}} \\ &\text{P}(\text{got 9 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{more extreme}} \\ &\text{P}(\text{got 10 heads $\mid$ $H_0$ is true}) \;\;\;\color{gray}{\leftarrow\text{more extreme}} \\ =\;&{10 \choose 7}0.5^{10}+{10 \choose 8}0.5^{10}+{10 \choose 9}0.5^{10}+{10 \choose 10}0.5^{10} \\ \approx\;&0.172 \\ \end{aligned} } \]

Assuming $H_0$ is true, it is not unlikely to get 7 or more heads.

Getting 7 heads is not enough evidence to reject $H_0$.

We tossed the coin 10 times and got 6 heads.

\[ \small{ \begin{aligned} p\text{-value}=\;&\text{P}(\text{got 6 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{as extreme}} \\ &\text{P}(\text{got 7 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{more extreme}} \\ &\text{P}(\text{got 8 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{more extreme}} \\ &\text{P}(\text{got 9 heads $\mid$ $H_0$ is true})\;+ \;\color{gray}{\leftarrow\text{more extreme}} \\ &\text{P}(\text{got 10 heads $\mid$ $H_0$ is true}) \;\;\;\color{gray}{\leftarrow\text{more extreme}} \\ =\;&{10 \choose 6}0.5^{10}+{10 \choose 7}0.5^{10}+\cdots+{10 \choose 10}0.5^{10} \\ \approx\;&0.377 \\ \end{aligned} } \]

Assuming $H_0$ is true, it is not unlikely to get 6 or more heads. Do not reject $H_0$.
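The five one-sided examples (6 to 10 heads in 10 tosses) can be tabulated together. The sketch below uses exact fractions so that rounding only happens when printing; the function name `upper_tail` and the constant `TOSSES` are hypothetical names introduced for this illustration.

```python
from fractions import Fraction
from math import comb

TOSSES = 10

def upper_tail(heads: int, tosses: int = TOSSES) -> Fraction:
    # Under H0 (p = 0.5) every sequence of tosses has probability (1/2)^tosses,
    # so the tail probability is a count of sequences divided by 2^tosses.
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return Fraction(favourable, 2**tosses)

p_values = {heads: upper_tail(heads) for heads in range(6, 11)}
for heads, p in sorted(p_values.items(), reverse=True):
    print(f"{heads:>2} heads: p-value = {p} ~ {float(p):.3f}")
```

Each step down in the number of heads adds one more term to the tail sum, which is why the p-value grows from about 0.001 to about 0.377.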

Decision rule based on p-value

First we specify a significance level $\alpha$ (the desired tolerance for type I error probability).

Then,

\[ \small{ \begin{aligned} &\text{Reject $H_0$ if $p\text{-value} < \alpha$}, \\ \\ &\text{Fail to reject $H_0$ if $p\text{-value} \geq \alpha$}. \\ \end{aligned} } \]
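As a rough sketch, the decision rule can be written as a small function and applied to the rounded one-sided p-values from the coin examples. The function name `decide` is hypothetical, and the default $\alpha = 0.05$ is an assumption (a common convention, not a requirement).

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Rounded one-sided p-values from the coin examples (10 tosses each).
examples = {10: 0.001, 9: 0.011, 8: 0.055, 7: 0.172, 6: 0.377}
decisions = {heads: decide(p) for heads, p in examples.items()}
for heads, decision in decisions.items():
    print(f"{heads} heads: {decision}")
```

With this choice of $\alpha$, 9 or 10 heads lead to rejection, while 8 heads (p-value about 0.055) falls just short.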

We suspect a coin is biased (towards either heads or tails).

\[ \small{ \begin{aligned} &\text{$H_0$: the coin is fair ($p=0.5$).} \\ &\text{$H_a$: the coin is biased ($p \neq 0.5$).} \\ \end{aligned} } \]

We tossed the coin 10 times and got 9 heads.

\[ \small{ \begin{aligned} \text{$p$-value} =\;&\text{P}(\text{got 9 heads $\mid$ $H_0$ is true})+ \;\color{gray}{\leftarrow\text{as extreme}} \\ &\text{P}(\text{got 9}\color{red}{\text{ tails }} \text{$\mid$ $H_0$ is true})+ \;\color{gray}{\leftarrow\text{as extreme}} \\ &\text{P}(\text{got 10 heads $\mid$ $H_0$ is true})+ \;\color{gray}{\leftarrow\text{more extreme}} \\ &\text{P}(\text{got 10}\color{red}{\text{ tails }} \text{$\mid$ $H_0$ is true}) \;\color{gray}{\leftarrow\text{more extreme}} \\ =\;&2\left[{10 \choose 9}0.5^{10}+{10 \choose 10}0.5^{10}\right] \approx 0.021 \\ \end{aligned} } \]

\[ \text{$p$-value} \approx 0.021 \]

Assuming $H_0$ is true, it is unlikely to get 9 or more heads, or 9 or more tails.

Since we did get 9 heads, the data provide evidence against $H_0$.
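A two-sided p-value for this example can be sketched as follows. The function name `two_sided_p_value` is hypothetical, and the approach of counting outcomes at least as far from the expected count as the observed one relies on the symmetry of the fair-coin null; it is not a general recipe for other null values of $p$.

```python
from math import comb

def two_sided_p_value(heads: int, tosses: int) -> float:
    """Two-sided p-value for H0: p = 0.5, counting both tails.

    Valid as written only for a fair-coin null, where the binomial
    distribution is symmetric around tosses / 2.
    """
    distance = abs(heads - tosses / 2)
    # Outcomes at least as far from the expected count as the observed one.
    extreme = [k for k in range(tosses + 1) if abs(k - tosses / 2) >= distance]
    return sum(comb(tosses, k) for k in extreme) / 2**tosses

p_value = two_sided_p_value(heads=9, tosses=10)
print(round(p_value, 4))  # 0.0215
```

For 9 heads in 10 tosses the extreme outcomes are 0, 1, 9 and 10 heads, giving exactly $22/1024$, twice the one-sided value.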

What does it mean if p-value > 0.05?

A p-value > 0.05 doesn’t mean the null hypothesis is true.
It simply means that we do not have enough evidence to reject it.
There may be an effect that more data would reveal.
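To illustrate how more data can reveal an effect, the sketch below computes the one-sided p-value for the same observed proportion of heads (70%) at three sample sizes. The sample sizes 50 and 100 are hypothetical experiments, not results from the text, and `upper_tail_p_value` is a hypothetical helper name.

```python
from math import comb

def upper_tail_p_value(heads: int, tosses: int) -> float:
    """One-sided p-value P(X >= heads) under H0: p = 0.5."""
    return sum(comb(tosses, k) for k in range(heads, tosses + 1)) / 2**tosses

# Hypothetical experiments that all observe 70% heads.
results = {n: upper_tail_p_value(heads=7 * n // 10, tosses=n) for n in (10, 50, 100)}
for n, p in results.items():
    print(f"{n:>3} tosses, {7 * n // 10:>2} heads: p-value = {p:.5f}")
```

With 10 tosses, 70% heads gives a p-value of about 0.172, which is not enough to reject $H_0$; the same proportion in the larger hypothetical samples gives a much smaller p-value.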

Absence of evidence is not evidence of absence.