Power and Sample Size for Proportion Data

A common task in applied statistics is determining the sample size necessary to detect statistically significant results. Given the intended power, we can calculate the required sample size; given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few things.

Error Types

                              Truth
                       H0                  H1
    Test
    Negative           True Negative       False Negative
    (Don't Reject)                         (β)
    Positive           False Positive      True Positive
    (Reject)           (α)                 (Power = 1 – β)
  • \alpha = False Positive Rate.
    This is the chance of rejecting the null hypothesis H_0, given that the null hypothesis is true.
  • \beta = False Negative Rate.
    This is the chance of failing to reject the null hypothesis, given that the alternative hypothesis is true.
  • Power is the complement of \beta, the false negative rate. The power of the test is the chance of rejecting the null hypothesis, given that the null hypothesis is false (i.e., given that the alternative hypothesis is true).

Using these error types, we can estimate the sample size necessary to achieve significant results in support of our alternative hypothesis. The calculation of power and sample size differs somewhat from that for normally distributed data, because for proportion data the variance is a function of the proportion, rather than being independent of the mean.
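As a concrete illustration of these error rates, the sketch below estimates \alpha and power by Monte Carlo for the one-sided proportion z-test developed in the next section. The function names and the particular choices n = 200, p_0 = 0.5, p_1 = 0.6 are ours, not from the text; this is a minimal simulation, not a substitute for the formulas that follow.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def reject(successes, n, p0, z_alpha=1.645):
    """One-sided z-test: reject H0 if the observed Z exceeds z_alpha (alpha = 0.05)."""
    p_hat = successes / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    return z > z_alpha

def rejection_rate(p_true, p0, n, trials=5000):
    """Fraction of simulated samples in which H0: p = p0 is rejected."""
    count = 0
    for _ in range(trials):
        successes = sum(random.random() < p_true for _ in range(n))
        if reject(successes, n, p0):
            count += 1
    return count / trials

# Under H0 (p_true = p0), the rejection rate estimates alpha.
alpha_hat = rejection_rate(p_true=0.5, p0=0.5, n=200)
# Under H1 (p_true = p1 > p0), the rejection rate estimates power = 1 - beta.
power_hat = rejection_rate(p_true=0.6, p0=0.5, n=200)
```

With n = 200, the rejection rate under H_0 lands near the nominal 5%, while the rejection rate under p_1 = 0.6 gives the test's power against that alternative.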

Sample Size Calculation

  • Case 1: One-Sided Test
    Given \alpha, \beta,
    Given x_1,\ldots,x_n \sim B(p)

        \begin{align*} H_0:\hspace{1cm}p &= p_0\\ H_1:\hspace{1cm}p &= p_1 > p_0\\ \end{align*}

    In this calculation we’re using p_1 > p_0. We will show later that the direction is not important, merely that we’re only considering values on one side of p_0. Because x follows a Bernoulli distribution, \xbar is a good estimator for p.

        \begin{align*} \phat \sim N(p,\frac{p(1-p)}{n}) \text{  when $n$ is large}\\ Z = \frac{\phat - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \sim N(0,1) \text{ Under $H_0$}\\ \end{align*}

    Remember that in a one-sided test with p_1 > p_0, we’re going to reject if Z_{obs} > Z_{\alpha}.

        \begin{align*} \alpha &= P(\text{Type 1 Error})\\ &= P\left(\frac{\phat - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} > Z_{\alpha}\right)\\ \beta &= P(\text{Type 2 Error})\\ &= P\left(\frac{\phat - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} < Z_{\alpha} \,\middle|\, H_1\text{ is true}\right)\\ \text{Under }H_1\hspace{1cm}&\frac{\phat - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} \sim N(0,1)\\ \text{So, }\beta &= P\left(\phat < p_0 + Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{n}} \,\middle|\, H_1\right)\\ &= P\left(\frac{\phat - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} < \frac{p_0 - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} + Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}\right)\\ &= P\left(Z < \frac{p_0 - p_1}{\sqrt{\frac{p_1(1-p_1)}{n}}} + Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}\right)\\ \text{Thus, }n &= p_1(1-p_1)\left[\frac{Z_{\beta} - Z_{\alpha}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}}{p_0 - p_1}\right]^2 \end{align*}

  • Case 2: Two-Sided Test
    In the two-sided test, we reject if |Z_{obs}| > Z_{\frac{\alpha}{2}}. The calculation for the two-sided test follows the one-sided test very closely; we simply change Z_{\alpha} to Z_{\frac{\alpha}{2}} to reflect that we’re allowing values on both sides of the null hypothesis. The formula for sample size is thus:

        \begin{equation*} n = p_1(1-p_1)\left[\frac{Z_{\beta} - Z_{\frac{\alpha}{2}}\sqrt{\frac{p_0(1-p_0)}{p_1(1-p_1)}}}{p_0 - p_1}\right]^2 \end{equation*}

    All else remains the same.
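The one- and two-sided cases can be folded into a single helper. The sketch below is our own (the function name and the defaults \alpha = 0.05, \beta = 0.20 are assumptions, not from the text); it uses the algebraically equivalent form n = [Z_{\alpha}\sqrt{p_0(1-p_0)} + Z_{\beta}\sqrt{p_1(1-p_1)}]^2 / (p_1 - p_0)^2, with both Z values taken as upper-tail quantiles, which squares to the same quantity as the expressions above.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library, Python 3.8+

def sample_size(p0, p1, alpha=0.05, beta=0.20, two_sided=False):
    """Required n for a one-sample proportion z-test (normal approximation).

    Uses the equivalent form
        n = [Z_a*sqrt(p0(1-p0)) + Z_b*sqrt(p1(1-p1))]^2 / (p1 - p0)^2,
    where Z_a is the upper alpha quantile (alpha/2 if two-sided) and
    Z_b is the upper beta quantile.
    """
    inv = NormalDist().inv_cdf
    z_a = inv(1 - (alpha / 2 if two_sided else alpha))
    z_b = inv(1 - beta)
    numerator = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)  # round up to a whole observation

n_one = sample_size(0.5, 0.6)                  # one-sided test
n_two = sample_size(0.5, 0.6, two_sided=True)  # two-sided test
```

For example, detecting p_1 = 0.6 against p_0 = 0.5 at \alpha = 0.05 with 80% power requires 153 observations one-sided and 194 two-sided; the two-sided test always demands more data for the same power, since Z_{\frac{\alpha}{2}} > Z_{\alpha}.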
