Power and Sample size for Proportion Data | Statistical Consulting Group

A common task in applied statistics is determining the sample size needed to detect a statistically significant effect. Given the intended power, we can calculate the required sample size; given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few terms.

Error Types

	Truth: H₀	Truth: H₁
Test negative (don’t reject H₀)	True negative	False negative (β)
Test positive (reject H₀)	False positive (α)	True positive (Power = 1 − β)

$\alpha$ = False Positive Rate (Type I error rate).
This is the probability of rejecting the null hypothesis $H_0$, given that $H_0$ is true.
$\beta$ = False Negative Rate (Type II error rate).
This is the probability of failing to reject $H_0$, given that the alternative hypothesis $H_1$ is true.
Power is the complement of $\beta$: Power $= 1 - \beta$ is the probability of rejecting $H_0$ given that $H_0$ is false (that is, given that $H_1$ is true).
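To make these definitions concrete, the following Python sketch estimates $\alpha$ and power by simulation: it draws many samples under $H_0$ and under $H_1$ and records how often $H_0$ is rejected. It assumes the one-sided normal-approximation test developed below (reject when $Z_{obs} > Z_{\alpha}$). The helper names `rejects_h0` and `rejection_rate`, the values $n = 153$, $p_0 = 0.5$, $p_1 = 0.6$, $\alpha = 0.05$, the seed, and the 5,000 replications are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist


def rejects_h0(successes: int, n: int, p0: float, alpha: float) -> bool:
    """One-sided z-test of H0: p = p0 against H1: p > p0 (normal approximation)."""
    p_hat = successes / n
    z_obs = (p_hat - p0) / ((p0 * (1 - p0) / n) ** 0.5)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    return z_obs > z_alpha


def rejection_rate(p_true: float, n: int, p0: float, alpha: float,
                   reps: int, rng: random.Random) -> float:
    """Fraction of simulated samples (success probability p_true) that reject H0."""
    rejections = 0
    for _ in range(reps):
        # Count successes among n Bernoulli(p_true) draws.
        successes = sum(rng.random() < p_true for _ in range(n))
        if rejects_h0(successes, n, p0, alpha):
            rejections += 1
    return rejections / reps


rng = random.Random(2024)
n, p0, p1, alpha = 153, 0.5, 0.6, 0.05
alpha_hat = rejection_rate(p0, n, p0, alpha, 5000, rng)  # H0 true: estimates alpha
power_hat = rejection_rate(p1, n, p0, alpha, 5000, rng)  # H1 true: estimates power = 1 - beta
print(f"estimated alpha = {alpha_hat:.3f}, estimated power = {power_hat:.3f}")
```

Because the data are discrete and the test relies on a normal approximation, the simulated rejection rate under $H_0$ will be close to, but not exactly, the nominal $\alpha$.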

Using these error rates, we can estimate the sample size needed to detect an effect of a given size and so support the alternative hypothesis. The calculation for proportion data differs from the one for normally distributed data because the variance of a proportion, $p(1-p)$, is a function of the proportion itself rather than a separate parameter independent of the mean.
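As a small illustration of this point, the Python sketch below prints the variance $p(1-p)/n$ and the standard error of the sample proportion for several values of $p$. The helper `standard_error` and the sample size $n = 100$ are hypothetical. Unlike the normal case, the spread changes with $p$, peaking at $p = 0.5$ and shrinking toward 0 and 1.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


n = 100
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    # The variance depends on p itself, so it cannot be fixed independently of the mean.
    print(f"p = {p:.1f}: Var = {p * (1 - p) / n:.4f}, SE = {standard_error(p, n):.4f}")
```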

Sample Size Calculation

Case 1: One-Sided Test
Given $\alpha$ and $\beta$, and independent observations $x_1,\ldots,x_n \sim B(p)$ (Bernoulli with success probability $p$), we test:

$\begin{align*} H_0:\hspace{1cm}p &= p_0\\ H_1:\hspace{1cm}p &= p_1 > p_0\\ \end{align*}$

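In this calculation we use $p_1 > p_0$. The direction is not important, only that the alternative lies on one side of $p_0$: repeating the derivation for $p_1 < p_0$ (rejecting when $Z_{obs} < -Z_{\alpha}$) gives the same formula for $n$, because the difference $p_0 - p_1$ enters it squared. Because each $x_i$ follows a Bernoulli distribution, the sample mean $\phat = \xbar$ is a good estimator for $p$, and for large $n$: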

$\begin{align*} \phat \sim N(p,\frac{p(1-p)}{n}) \text{ when $n$ is large}\\ Z = \frac{\phat - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \sim N(0,1) \text{ Under $H_0$}\\ \end{align*}$

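Remember that in a one-sided test with $p_1 > p_0$, we reject $H_0$ if $Z_{obs} > Z_{\alpha}$, where $Z_{\alpha}$ is the upper-tail critical value of the standard normal distribution, $P(Z > Z_{\alpha}) = \alpha$.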
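The rejection rule can be sketched in Python as follows. The function name `one_sided_z_test` and the data (60 successes in 100 trials, tested against $p_0 = 0.5$) are hypothetical; `statistics.NormalDist().inv_cdf` from the standard library supplies $Z_{\alpha}$ as the $1 - \alpha$ quantile of the standard normal.

```python
from statistics import NormalDist


def one_sided_z_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Normal-approximation test of H0: p = p0 vs H1: p > p0.

    Returns (z_obs, z_alpha, reject), where z_alpha satisfies P(Z > z_alpha) = alpha.
    """
    p_hat = successes / n                      # sample proportion, the mean of the 0/1 data
    se0 = (p0 * (1 - p0) / n) ** 0.5           # standard error computed under H0
    z_obs = (p_hat - p0) / se0
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    return z_obs, z_alpha, z_obs > z_alpha


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
z_obs, z_alpha, reject = one_sided_z_test(60, 100, 0.5)
print(f"Z_obs = {z_obs:.3f}, Z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```

With these inputs $Z_{obs} = 2.000$ exceeds $Z_{\alpha} \approx 1.645$, so $H_0$ is rejected at $\alpha = 0.05$.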

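$\begin{align*} \alpha &= P(\text{Type I Error})\\ &= P\left(\frac{\phat - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} > Z_{\alpha} \mid H_0\text{ is true}\right)\\ \beta &= P(\text{Type II Error})\\ &= P\left(\frac{\phat - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} < Z_{\alpha} \mid H_1\text{ is true}\right)\\ \text{Under }H_1:\hspace{1cm}&\frac{\phat - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} \sim N(0,1)\\ \text{So, }\beta &= P\left(\phat < p_0 + Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{n}} \mid H_1\right)\\ &= P\left(\frac{\phat - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} < \frac{p_0 - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} + Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}\right)\\ &= P\left(Z < \frac{p_0 - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} + Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}\right)\\ \text{Setting this equal to }\beta:\hspace{1cm} Z_{\beta} &= \frac{(p_0 - p_1)\sqrt{n}}{\sqrt{p_1(1-p_1)}} + Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}\\ \text{Thus, }n &= p_1(1-p_1)\left[\frac{Z_{\beta} - Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}}{p_0 - p_1}\right]^2 \end{align*}$
Here $Z_{\beta}$ is the lower-tail quantile of the standard normal distribution, $P(Z < Z_{\beta}) = \beta$, so $Z_{\beta} < 0$ whenever $\beta < 0.5$; this differs from $Z_{\alpha}$, which is an upper-tail value.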
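The one-sided formula translates directly into code. This is a minimal Python sketch with illustrative function names, not a reference implementation. It assumes the sign convention under which the formula's minus sign is correct: `z_alpha` is the upper-tail critical value and `z_beta` is the lower-tail $\beta$ quantile. Since $n$ must be a whole number, the result is rounded up. The companion `power_one_sided` runs the calculation in reverse, going from a chosen $n$ to the approximate power $1 - \beta$, as mentioned in the introduction.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def sample_size_one_sided(p0: float, p1: float, alpha: float, beta: float) -> int:
    """Smallest integer n satisfying the one-sided sample-size formula.

    z_alpha is the upper-tail critical value, P(Z > z_alpha) = alpha;
    z_beta is the lower-tail quantile, P(Z < z_beta) = beta (negative for beta < 0.5).
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(beta)
    q0, q1 = p0 * (1 - p0), p1 * (1 - p1)
    n = q1 * ((z_beta - z_alpha * math.sqrt(q0 / q1)) / (p0 - p1)) ** 2
    return math.ceil(n)


def power_one_sided(p0: float, p1: float, alpha: float, n: int) -> float:
    """Approximate power 1 - beta of the one-sided test at sample size n."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    q0, q1 = p0 * (1 - p0), p1 * (1 - p1)
    # beta = P(Z < (p0 - p1) sqrt(n) / sqrt(q1) + z_alpha sqrt(q0 / q1)) when p1 > p0;
    # using |p1 - p0| gives the same answer for an alternative below p0.
    z = (abs(p1 - p0) * math.sqrt(n) - z_alpha * math.sqrt(q0)) / math.sqrt(q1)
    return _STD_NORMAL.cdf(z)


# Hypothetical inputs: p0 = 0.5, p1 = 0.6, alpha = 0.05, 80% power (beta = 0.20).
n = sample_size_one_sided(0.5, 0.6, alpha=0.05, beta=0.20)
print(n, round(power_one_sided(0.5, 0.6, 0.05, n), 3))
```

For these inputs the formula gives $n \approx 152.5$, so 153 observations, and the power at $n = 153$ is just above 0.80. Using $p_1 = 0.4$ instead returns the same $n$, consistent with the direction of the alternative not mattering.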
Case 2: Two-Sided Test
In the two-sided test, we reject if $|Z_{obs}| > Z_{\frac{\alpha}{2}}$. The calculation for the two-sided test closely follows the one-sided test; we replace $Z_{\alpha}$ with $Z_{\frac{\alpha}{2}}$ to reflect that we allow rejection on both sides of the null hypothesis, and we neglect the small probability of rejecting in the tail opposite $p_1$. The formula for the sample size is thus:

$\begin{equation*} n = p_1(1-p_1)\left[\frac{Z_{\beta} - Z_{\frac{\alpha}{2}}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}}{p_0 - p_1}\right]^2 \end{equation*}$

All else remains the same, including the definition of $Z_{\beta}$.
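The two-sided formula differs only in the critical value, so a Python sketch needs just one change: it uses $Z_{\frac{\alpha}{2}}$, the $1 - \alpha/2$ quantile. The function name and inputs are illustrative, and `z_beta` is assumed to be the lower-tail $\beta$ quantile, the reading under which the formula's minus sign is correct.

```python
import math
from statistics import NormalDist


def sample_size_two_sided(p0: float, p1: float, alpha: float, beta: float) -> int:
    """Two-sided sample size: the one-sided formula with Z_alpha replaced by Z_{alpha/2}.

    z_half_alpha satisfies P(Z > z_half_alpha) = alpha / 2; z_beta satisfies P(Z < z_beta) = beta.
    The probability of rejecting in the tail opposite p1 is ignored, as in the formula.
    """
    std_normal = NormalDist()
    z_half_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(beta)
    q0, q1 = p0 * (1 - p0), p1 * (1 - p1)
    n = q1 * ((z_beta - z_half_alpha * math.sqrt(q0 / q1)) / (p0 - p1)) ** 2
    return math.ceil(n)  # round up to a whole number of observations


# Hypothetical inputs: p0 = 0.5, p1 = 0.6, alpha = 0.05, 80% power (beta = 0.20).
print(sample_size_two_sided(0.5, 0.6, alpha=0.05, beta=0.20))
```

For these inputs it returns 194, whereas the one-sided formula with the same inputs gives 153, so allowing rejection on both sides costs about 40 extra observations here.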
