Power analysis is the process of determining the sample size needed for a research study. Technically, power is the probability of detecting a "true" effect when it exists. Many students think that there is a simple formula for determining sample size for every research situation. In reality, many research situations are so complex that they almost defy rational power analysis. In most cases, power analysis involves making a number of simplifying assumptions to keep the problem tractable, and then running the analysis numerous times with different variations to cover all of the contingencies.
In this unit we will illustrate the power analysis process for logistic regression using a simple model with a single continuous predictor. We will follow up this example with a multiple logistic regression model with five predictors.
A small and very exclusive liberal arts college wishes to do a quantitative analysis of their admission process. Currently, the college uses an admissions committee made up of administrators, faculty and students to admit 70 freshmen each school year. The admissions committee looks at grades, test scores and essays in deciding which students to admit. They do not, however, use any quantitative methods to justify their academic judgment.
The registrar wishes to validate the admissions process using a logistic regression model. The first model the registrar wants to try uses the Verbal SAT score to predict the binary response variable (1 = admitted, 0 = not admitted). The registrar expects that only 8% of students (probability = .08) who score at the mean will be admitted. He would like to test whether students who score one standard deviation above the mean are 15 percentage points more likely to be admitted, that is, whether the probability rises from .08 to .08 + .15 = .23.
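The registrar's two probabilities can also be read as a slope on the log odds scale. The short Python sketch below (our own illustration, not part of powerlog) converts p1 = .08 and p2 = .23 into the implied log odds ratio and odds ratio per standard deviation of Verbal SAT. The helper name logit is ours.

```python
import math


def logit(p: float) -> float:
    """Return the log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


p1 = 0.08  # probability of admission at the mean Verbal SAT (from the text)
p2 = 0.23  # probability one standard deviation above the mean (from the text)

# Difference in log odds for a one-SD increase in the predictor,
# i.e. the logistic regression slope for a standardized predictor.
log_odds_ratio = logit(p2) - logit(p1)
odds_ratio = math.exp(log_odds_ratio)

print(f"log odds ratio per SD: {log_odds_ratio:.3f}")
print(f"odds ratio per SD:     {odds_ratio:.2f}")
```

In other words, the hypothesized effect amounts to the odds of admission a little more than tripling for each standard deviation of Verbal SAT.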
We will use the user-written Stata program powerlog to do the power analysis for simple logistic regression. The program can be located by typing findit powerlog (see "How can I use the findit command to search for programs and get additional help?" for more information about using findit). The powerlog program needs the following information in order to do the power analysis: 1) the probability of being admitted when scoring at the mean of the Verbal SAT (p1 = .08), 2) the probability of being admitted when scoring one standard deviation above the mean on the Verbal SAT (p2 = .08 + .15 = .23), and 3) the alpha level (alpha = .05 for this example). Since there is only one predictor variable in the model, we will leave the rsq option at zero. We include the help option to display an explanation of the terms in the command.
    powerlog, p1(.08) p2(.23) alpha(.05) help

    Logistic regression power analysis
    One-tailed test: alpha=.05  p1=.08  p2=.23  p2-p1=.15  rsq=0

       power      n
        0.60     73
        0.65     81
        0.70     89
        0.75     98
        0.80    109
        0.85    123
        0.90    141

    Explanation of terms

    p1  -- the probability that the response variable equals 1
           when the predictor is at the mean
    p2  -- the probability that the response variable equals 1
           when the predictor is one standard deviation above the mean
    rsq -- the squared multiple correlation between the predictor
           variable and all other variables in the model
The output indicates that 73 observations would be needed for a power of .6 and 109 observations for a power of .8. These numbers are probably the bare minimum, since logistic regression uses maximum likelihood estimation, which many researchers believe requires fairly large sample sizes.
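The text does not say which formula powerlog uses. As a rough cross-check, the Python sketch below assumes the approximation usually attributed to Hsieh (1989) for a single continuous predictor, with a one-tailed test and an optional division by 1 - rsq. The function name hsieh_n and its arguments are our own, and treating this as powerlog's method is an assumption, not something the source confirms.

```python
import math
from statistics import NormalDist


def hsieh_n(p1, p2, alpha=0.05, power=0.80, rsq=0.0):
    """Approximate n for a one-tailed test of one continuous predictor.

    ASSUMPTION: a Hsieh (1989)-style approximation; not confirmed to be
    the formula implemented by powerlog.
    p1  -- P(y = 1) at the mean of the predictor
    p2  -- P(y = 1) one standard deviation above the mean
    rsq -- squared multiple correlation with the other predictors
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    # Log odds ratio for a one-SD increase in the predictor.
    beta = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    b2 = beta ** 2
    shrink = math.exp(-b2 / 4)
    delta = (1 + (1 + b2) * math.exp(5 * b2 / 4)) / (1 + shrink)
    n = (z(1 - alpha) + shrink * z(power)) ** 2 * (1 + 2 * p1 * delta) / (p1 * b2)
    # Inflate for correlation with other predictors (no change when rsq = 0).
    return n / (1 - rsq)


for target_power in (0.60, 0.80, 0.90):
    n = hsieh_n(0.08, 0.23, alpha=0.05, power=target_power)
    print(f"power {target_power:.2f}: n is about {round(n)}")
```

Under these assumptions the results should land close to the values in the table above. Treat the sketch as a way to explore how alpha, power, and the two probabilities interact, not as a replacement for powerlog.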
The registrar is in luck because he has data on 192 students (70 admitted and 122 not admitted). But what if the registrar wanted to use an alpha level of .01? How many observations would be needed?
    powerlog, p1(.08) p2(.23) alpha(.01)

    Logistic regression power analysis
    One-tailed test: alpha=.01  p1=.08  p2=.23  p2-p1=.15  rsq=0

       power      n
        0.60    139
        0.65    149
        0.70    160
        0.75    172
        0.80    187
        0.85    204
        0.90    228

The registrar's luck is still holding: with alpha set to .01, the logistic regression would need 187 observations for a power of .8, and he has data on 192 students.
Let's go back to alpha = .05 and see what happens if we increase the effect size. In t-tests and ANOVAs, effect size is given in terms of mean differences and standard deviations. In logistic regression effect size can be stated in terms of the probability at the mean of the predictor and the probability at the mean plus one standard deviation. In the first model the probability at the mean was .08 and at the mean plus one standard deviation was .23. To increase the effect size to .2 we leave p1 at .08 and increase p2 to .28.
    powerlog, p1(.08) p2(.28) alpha(.05)

    Logistic regression power analysis
    One-tailed test: alpha=.05  p1=.08  p2=.28  p2-p1=.2  rsq=0

       power      n
        0.60    117
        0.65    127
        0.70    138
        0.75    151
        0.80    165
        0.85    183
        0.90    206

These results may be different from what you were expecting: even though the effect is larger, the required sample sizes have actually gone up. The reason for this is the nonlinear nature of logistic regression.
The sample size needed for an effect size of .2 also depends on the value of p1. Let's shift p1 up to .4, keep the effect size at .2 (so that p2 is .6), and rerun the previous analysis.
    powerlog, p1(.4) p2(.6) alpha(.05)

    Logistic regression power analysis
    One-tailed test: alpha=.05  p1=.4  p2=.6  p2-p1=.2  rsq=0

       power      n
        0.60     40
        0.65     45
        0.70     51
        0.75     57
        0.80     65
        0.85     74
        0.90     87
Since this effect is centered on a probability of .5, i.e., it is in the middle of the probability distribution, a much smaller sample size is needed than for detecting the same size change in the tails of the probability curve. Again, this is due to the nonlinear nature of the model in terms of probability.
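To see what "nonlinear in terms of probability" means in practice, the Python sketch below (our own illustration, using a hypothetical slope of 1 on the log odds scale) applies the same one-unit shift in log odds at three different starting probabilities and reports how much the probability changes each time.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Probability corresponding to log odds x."""
    return 1.0 / (1.0 + math.exp(-x))


step = 1.0  # hypothetical shift in log odds, chosen only for illustration

for p_start in (0.08, 0.20, 0.50):
    p_end = inv_logit(logit(p_start) + step)
    change = p_end - p_start
    print(f"start {p_start:.2f} -> end {p_end:.3f} (change {change:.3f})")
```

The same shift in log odds moves the probability much further near .5 than near .08, which is why a given change in probability cannot be interpreted without knowing where on the curve it starts.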
Now, let's see what happens when we use multiple predictors in the model while still focusing on the effect of Verbal SAT. Let's say that there are five predictors in the model; we will need to specify the squared multiple correlation of Verbal SAT with the other four predictors. We will run two analyses, setting rsq, the squared multiple correlation, to .2 and then to .4. We will stick with the original p1 = .08, p2 = .23 and alpha of .05.
    powerlog, p1(.08) p2(.23) alpha(.05) rsq(.2)

    Logistic regression power analysis
    One-tailed test: alpha=.05  p1=.08  p2=.23  p2-p1=.15  rsq=.2

       power      n
        0.60     92
        0.65    101
        0.70    111
        0.75    123
        0.80    137
        0.85    154
        0.90    176

    powerlog, p1(.08) p2(.23) alpha(.05) rsq(.4)

    Logistic regression power analysis
    One-tailed test: alpha=.05  p1=.08  p2=.23  p2-p1=.15  rsq=.4

       power      n
        0.60    122
        0.65    135
        0.70    148
        0.75    164
        0.80    182
        0.85    205
        0.90    235
As you can see, as the R-squared with the other predictors goes up, the number of observations needed also goes up.
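The pattern in these tables appears consistent with dividing the single-predictor sample size by 1 - rsq, a common variance-inflation adjustment. That is an observation about the numbers, not documented powerlog behavior, so the Python sketch below should be read as a rule of thumb. The function name inflate_for_rsq is ours, and the rsq = 0 values are copied from the first table.

```python
def inflate_for_rsq(n_single: float, rsq: float) -> float:
    """Scale a single-predictor sample size by 1 / (1 - rsq).

    ASSUMPTION: this adjustment is inferred from the tables in the text,
    not taken from powerlog's documentation.
    """
    if not 0.0 <= rsq < 1.0:
        raise ValueError("rsq must be in the interval [0, 1)")
    return n_single / (1.0 - rsq)


# Sample sizes reported in the text for rsq = 0 (p1 = .08, p2 = .23, alpha = .05).
n_rsq0 = {0.60: 73, 0.80: 109, 0.90: 141}

for power, n in n_rsq0.items():
    for rsq in (0.2, 0.4):
        adjusted = inflate_for_rsq(n, rsq)
        print(f"power {power:.2f}, rsq {rsq:.1f}: about {adjusted:.1f}")
```

The adjusted values fall within about one observation of the tabled ones. The small differences presumably arise because the rsq = 0 values used here were already rounded.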
The power analysis for logistic regression looks, on the surface, to be relatively straightforward. However, when you get into it, you might find that it can be difficult to come up with reasonable and meaningful estimates of the two probabilities that are needed. One of the best ways to come up with these probabilities is through a pilot study that closely mimics your research design. The standard suggestion to obtain values from the literature may not work as well with logistic regression because published articles do not always give enough information to determine the probability at both the mean and at one standard deviation above the mean. Many researchers think that all that is necessary is to know the change in probability. However, since logistic regression is a nonlinear model, knowledge of the first probability is necessary, since the sample size for a .1 change in probability starting at .2 is larger than that for a .1 change starting at .5.
It is also necessary to reiterate that the sample sizes generated by powerlog should be considered a lower bound. Although the sample sizes provided are valid for hypothesis testing with a specified power, methodologists are not in complete agreement as to how big sample sizes need to be to obtain stable estimates. Long (1997) suggests that sample sizes of less than 100 should be avoided and that 500 observations should be adequate for almost any situation. However, this leaves a relatively large gap between 100 and 500. If powerlog gives a sample size of less than 100, you might want to increase it to at least 100, just to be safe. Even if powerlog suggests an N of, say, 110, you might want to use a larger sample if you believe that your data might be problematic in any way.