This is something that I see students struggling with time and time again. Most students start by proposing a sample size based on convenience, with no real consideration of anything other than the pragmatics of managing this number. Inevitably, they will be faced with this question (usually in the ethics application process):
“Why have you chosen that sample size?”
Before we attempt to answer this, we need to know why this is an important question.
Quite simply, not including enough participants could lead to a type-II error. A type-II error occurs when something that does exist is not observed, also known as a false-negative. Basically, there is not enough data to have sufficient confidence that there is a real difference, or correlation. It would therefore be a waste of time for both the researcher and the participants to undertake a study where the results are unlikely to be meaningful.
On the other hand, recruiting more participants than necessary will likely consume resources that other research may depend on. There is also a potential ethical concern with regard to recruiting more participants than necessary. Participation in research is often voluntary, or rewarded with a small gratuity. Participants seldom volunteer for research to get rich (except in the case of early-phase clinical trials, of course); more often, participants volunteer their time in the belief that their contribution will make a difference.
If a study only needs about 30 participants to achieve its aim, but 100 participants are included, then the time and goodwill of 70 participants has been wasted. Furthermore, experimental research designs often carry a small risk of potential harm.
The pragmatic and ethical strategy, therefore, is to recruit enough participants to ensure that the aim of the study is achieved, thereby not wasting anyone’s time, while also ensuring that the smallest possible number of participants are required to give their time and expose themselves to potential risk.
Where possible, a power calculation should be used to inform the sample size. This is often considered the gold standard, and demonstrates a calculated and considered rationale behind the proposed sample size. Fortunately, G*Power is a free programme that can help to easily and quickly estimate a sample size.
However, there are a few things that we must already know, or at least be able to estimate, to conduct a power calculation (see Input Parameters in the screenshot). For a t-test, where differences between two groups of data are to be analysed and identified, these are:
1. The expected effect size.
The effect size is simply the magnitude of change. As this is seldom known (the study has yet to be conducted), estimates may be drawn from earlier similar studies. It is usually expressed as Cohen’s d, and can be crudely estimated by dividing the difference between the two group means by the average standard deviation. Fortunately, G*Power includes a little calculator to do this for you.
2. The alpha value.
The alpha value, or the probability of a type-I error (seeing something that doesn’t exist; a false-positive), is usually set to 0.05, as this represents a generally accepted level of significance (i.e., most researchers consider being 95% confident that the effect is real to be acceptable). Whether this is true for your intentions is up to you; however, if you are in doubt, use 0.05 as the default.
3. The power.
The power is the probability of avoiding a type-II error (failing to see something that does exist; a false-negative). Setting this to 0.8 indicates that you wish to be 80% certain that any null result is actually null.
4. The allocation ratio.
Setting this to 1 is a safe option unless there are valid reasons not to; for example, if the perceived risk or cost of an intervention is very high, then a smaller intervention group may be acceptable.
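As a rough illustration of how these four inputs combine, the sketch below crudely estimates Cohen’s d as described above and then applies the standard normal-approximation formula for a two-group t-test. This is not G*Power itself: the means and standard deviations are made-up examples, and the normal approximation slightly underestimates the exact t-based figure G*Power reports.

```python
from math import ceil
from statistics import NormalDist

def cohens_d(mean_a, mean_b, sd_a, sd_b):
    """Crude Cohen's d: group difference divided by the average standard deviation."""
    return abs(mean_a - mean_b) / ((sd_a + sd_b) / 2)

def n_per_group(d, alpha=0.05, power=0.80, ratio=1.0):
    """Approximate per-group n for a two-sided, two-sample t-test
    (normal approximation; assumes ratio = 1, i.e. equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Hypothetical groups giving a medium effect (d = 0.5):
d = cohens_d(mean_a=25.0, mean_b=22.5, sd_a=5.0, sd_b=5.0)
print(d, n_per_group(d))  # d = 0.5, roughly 63 per group
```

Note how quickly the required sample grows as the expected effect shrinks: halving d roughly quadruples the number of participants needed per group.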
However, sometimes it can be difficult to estimate the effect size of your intervention if there is little or no research in your area. If you can’t make an estimate based on similar studies then this method might not be suitable.
Precedent from previous studies

Quite simply, if previous similar studies have successfully observed outcomes, then there is a sound argument that the sample sizes used in these studies were sufficient and provide a good precedent for future studies. This is a well-accepted approach to justifying a sample size.
The logic is similar to the power calculation method, in that the previous studies were clearly sufficiently powered to detect a difference, as they did indeed find one (there would be no value in basing a sample size on a study which did not find a significant outcome, as it cannot be ruled out that there was a type-II error, or false-negative). If you expect that your study will produce an effect of the same or higher magnitude, then there is no need for a sample size greater than what has been previously reported. However, if you are unsure, then it is wise to slightly increase the sample to ensure that your study remains sufficiently powered should your effect size be smaller.
On-the-fly calculations
This is a much less frequently used method for identifying a sample size; however, don’t be deterred by the name, as it is more objective than might otherwise be assumed.
This method is not simply about continuing to recruit until significance is achieved, as that could be considered data manipulation. Instead, a more objective approach is to use the confidence interval as an indication of whether a sufficient number of participants has been included to be representative of the population (or at least, of where the mean of that population might actually lie). For example, a measure with a very diverse range in the population (for example, income, which can vary considerably from zero to hundreds of thousands) will require a large number of participants if a difference of 30% is to be detected between two groups. However, a measure with a very narrow range in the population (for example, height, which will rarely vary by more than 50%) will require fewer participants to detect the same 30% difference between groups.
As you may not yet know how much your outcome measure will vary in your sample, and therefore cannot possibly estimate an effect size (which is affected by variability), an elegant solution is to use the confidence interval as a threshold indicating that you have ‘enough’ participants.
Will Hopkins published this approach in 1997.
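The idea can be sketched as a simple stopping rule: keep recruiting until the confidence interval around your sample mean is acceptably narrow. This is a minimal normal-approximation illustration of that logic, not Hopkins’ published method, and the data and threshold below are made up.

```python
from statistics import stdev, NormalDist

def ci_half_width(data, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    return z * stdev(data) / len(data) ** 0.5

def enough_participants(data, acceptable_half_width):
    """Recruiting can stop once the CI on the mean is acceptably narrow."""
    return ci_half_width(data) <= acceptable_half_width

# A narrow-spread measure (height, cm) reaches a tight CI with few participants:
heights_cm = [168, 172, 175, 169, 171, 174, 170, 173]
print(round(ci_half_width(heights_cm), 2))                 # CI is mean ± ~1.7 cm
print(enough_participants(heights_cm, acceptable_half_width=2.0))
```

A high-variability measure such as income would need far more observations before its interval shrank below any comparable relative threshold, which is exactly the point made above.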
When all else fails and there are no fancy equations or previous research to rely on, your only reasonable option is to rationalise a “best guess” that fits within a realistic and practical range between enough and not too many. Sometimes this comes down to an argument for what is practical in terms of time frames and resources.
If necessary, this can be framed as a pilot study, as a more informed sample size can be calculated after the study is complete, once an effect size can be estimated. In this case, be prepared to accept the possibility of being under-powered, and that there may have been a type-II error (false-negative).
Regardless of your approach, the important thing is that your sample size is carefully considered and reasonable. Show that this factor has received considerable attention and is not a mindless guess.
Also, remember to include an estimate of attrition (a.k.a. “drop-outs”).
The longer the duration, and the more work required by the participants, the greater the attrition is likely to be.
If in doubt, 10% is a good rule of thumb that is generally accepted for small and simple longitudinal studies.