Section 4 Techniques and Tools
4.09 Random or probability sampling
If a statistically representative survey is required from which inferences can be made about the wider population, the survey must be based on a random (probability) sample. See Section 3 for a fuller explanation.
In a sampling sense, random does not mean arbitrary, which is the popular use of the term. A random sample is selected in a systematic way that gives every member of the research population a known, non-zero chance (probability) of selection.
Random sampling is almost never used in qualitative research based on interviews and participant observation. Most qualitative research is concerned with generalising findings to theory rather than to populations, so quantitative concerns about representativeness and random sampling are less relevant to qualitative researchers. Statistical representativeness is most likely to be needed in a large-scale general satisfaction survey, a housing needs survey or an organisation-wide survey.
Sampling frames
Selecting a random sample requires a sampling frame: a list of all the members of the population. Examples include addresses from property files, the Electoral Register, the Council Tax register, a specially compiled list such as all recent service users, or a list developed through multi-stage cluster sampling. Many large-scale general population surveys select a probability sample from the Postcode Address File (PAF). Whilst not strictly a sampling frame in itself (it lists postal delivery points rather than households or people), the PAF has good coverage and is reliable. It can be purchased, or samples drawn from it can be commissioned through agencies for use in general surveys.
In some surveys the individual to interview must be selected through a further procedure based on chance, not convenience. This applies where the survey is based on a multi-stage cluster sample or on a sample drawn from the PAF, or where the views of a random adult member of the household are required rather than those of the pre-defined tenant or head of household. The procedure could be based on birthdays (for example, interviewing the adult with the next birthday) or on a Kish Grid. The Kish Grid or a similar procedure should also be used to select a flat at an address where there is more than one dwelling.
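The birthday procedure can be sketched as follows. This is a minimal illustration of the "next birthday" rule only, not an implementation of a Kish Grid. The function names are hypothetical. Two further choices are assumptions: a birthday on the interview day counts as the next one, and 29 February is treated as 28 February in non-leap years.

```python
from datetime import date


def days_until_birthday(month: int, day: int, today: date) -> int:
    """Days from `today` to the next occurrence of a birthday (0 if it is today)."""

    def occurrence(year: int) -> date:
        try:
            return date(year, month, day)
        except ValueError:
            # Assumption: a 29 February birthday counts as 28 February in non-leap years.
            return date(year, 2, 28)

    upcoming = occurrence(today.year)
    if upcoming < today:
        upcoming = occurrence(today.year + 1)
    return (upcoming - today).days


def select_next_birthday(adults: dict[str, tuple[int, int]], today: date) -> str:
    """Return the adult whose birthday comes next; `adults` maps name -> (month, day).

    Ties (shared birthdays) go to the first adult listed; a real protocol
    would need its own tie-break rule.
    """
    return min(adults, key=lambda name: days_until_birthday(*adults[name], today))


# Hypothetical household of three adults, visited on 1 June 2008.
household = {"Adult A": (3, 15), "Adult B": (11, 2), "Adult C": (7, 30)}
print(select_next_birthday(household, date(2008, 6, 1)))  # Adult C
```

The interviewer records only birth months and days. The rule then leaves no discretion about which adult is interviewed.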
It is important to remember that random sampling from an inadequate sampling frame will not produce a sample that accurately reflects the population. To be adequate, a sampling frame should be up to date at the time the sample is selected. It should include all the population groups that the research is interested in. Where there is likely to be systematic under-enumeration of particular groups of people, the list should not be used for random sample selection. For example, certain sampling frames systematically exclude particular service users, such as those who are not tenants, homeless people, those not registered to vote, young people and so on. Sampling frames should also be examined for inherent bias. There may be an underlying ordering in the frame that means a certain type of person or property occurs at regular intervals throughout the listing. Such frames should be re-ordered before use, or simple random sampling should be used instead of systematic sampling.
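The risk of periodic ordering can be illustrated with a hypothetical property file in which every fourth entry is a flat. The sketch assumes a systematic sample taken as every k-th entry from a random start. With an interval of 4, the sample contains either all flats or none. Shuffling the frame first removes the problem. The function and data are illustrative only.

```python
import random
from typing import Optional


def systematic_sample(frame: list, n: int, start: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> list:
    """Take every k-th unit of `frame` (k = len(frame) // n) from a start in [0, k)."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n  # sampling interval
    if start is None:
        start = (rng or random.Random()).randrange(k)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical property file of 400 units, ordered so that every 4th entry is a flat.
frame = ["flat" if i % 4 == 0 else "house" for i in range(400)]

# Flats are 25 per cent of the frame, yet with an interval of 4 each possible
# systematic sample contains either 100 flats or none.
print([systematic_sample(frame, 100, start=s).count("flat") for s in range(4)])
# [100, 0, 0, 0]

# Shuffling the frame first breaks the periodic ordering.
shuffled = frame[:]
random.Random(2008).shuffle(shuffled)
print(systematic_sample(shuffled, 100, rng=random.Random(1)).count("flat"))
```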
Types of random samples
There are four main types of random or probability samples:
• simple random samples
• systematic samples
• stratified random samples
• multi-stage clustered samples.
Details of when to use these samples and how to obtain them can be found elsewhere.
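As a rough illustration of how three of these designs differ, the sketch below draws three kinds of sample from a hypothetical frame of numbered properties grouped into estates. These are a simple random sample, a proportionate stratified sample and a two-stage cluster sample. The function names are assumptions for the example, as are the rounding rule for allocating the sample across strata and the equal cluster sizes.

```python
import random
from collections import defaultdict
from typing import Callable


def simple_random_sample(frame: list, n: int, rng: random.Random) -> list:
    """Each unit has the same known, non-zero chance of selection, n / N."""
    return rng.sample(frame, n)


def proportionate_stratified_sample(frame: list, stratum_of: Callable, n: int,
                                    rng: random.Random) -> list:
    """Allocate n across strata in proportion to their size (rounded, so the
    total can differ slightly from n), then draw a simple random sample in each."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        n_h = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, n_h))
    return sample


def two_stage_cluster_sample(clusters: dict, n_clusters: int, n_per_cluster: int,
                             rng: random.Random) -> list:
    """Stage 1: simple random sample of clusters (e.g. estates).
    Stage 2: simple random sample of units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        sample.extend(rng.sample(units, min(n_per_cluster, len(units))))
    return sample


rng = random.Random(42)
frame = list(range(1000))  # hypothetical property IDs; 0-299 are flats
srs = simple_random_sample(frame, 100, rng)
strat = proportionate_stratified_sample(
    frame, lambda unit: "flat" if unit < 300 else "house", 100, rng)
estates = {f"estate_{e}": list(range(e * 100, e * 100 + 100)) for e in range(10)}
clustered = two_stage_cluster_sample(estates, 3, 20, rng)
print(len(srs), sum(unit < 300 for unit in strat), len(clustered))  # 100 30 60
```

Here the estates are equal in size, so every property has the same 6 per cent chance of selection (3 in 10 estates, then 20 in 100 properties). Where clusters differ in size, selection probabilities are still known but unequal. In that case, either clusters are selected with probability proportional to size or the analysis must be weighted.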
Sample size
How big a sample should be depends on how accurate it needs to be. It also depends on how far diversity in the population, on variables important to the research, needs to be taken into account. Increasing the size of the sample is likely (though not guaranteed) to increase the accuracy with which it reflects the whole population. However, sampling error falls only in proportion to the square root of the sample size, so quadrupling a sample roughly halves the error. Once a sample goes over approximately 1,000 people (or whatever unit is being sampled), the gains in accuracy become small. The cost of obtaining a larger sample is then rarely justified by the extra accuracy achieved. So, although it seems counter-intuitive, it is the absolute size of the sample that is important, not whether the sample is a certain percentage of the overall population.
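The diminishing returns can be seen by computing the sampling error for a proportion at different sample sizes. The sketch uses the usual formula for a simple random sample at the 95 per cent confidence level (z = 1.96) and assumes the least favourable 50:50 split. The optional finite population correction shows why the fraction of the population sampled matters far less than the absolute number of responses. The function name is illustrative.

```python
import math
from typing import Optional


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96,
                    population: Optional[int] = None) -> float:
    """Half-width of the confidence interval for a proportion from a simple
    random sample of n, optionally with the finite population correction
    sqrt((N - n) / (N - 1))."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


# Each quadrupling of the sample only halves the sampling error.
for n in (250, 500, 1000, 2000, 4000):
    print(n, f"{margin_of_error(n):.1%}")  # 6.2%, 4.4%, 3.1%, 2.2%, 1.5%

# The same 1,000 responses drawn from 10,000 households (a 10 per cent sample)
# or from 1,000,000 households (0.1 per cent) give almost the same precision.
print(f"{margin_of_error(1000, population=10_000):.1%}")     # 2.9%
print(f"{margin_of_error(1000, population=1_000_000):.1%}")  # 3.1%
```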
Another factor in decisions about sample size is that the sample should be big enough to capture the anticipated diversity of responses on key variables in the research. For example, it may be anticipated that 70 per cent of respondents will say they are satisfied and 30 per cent will say they are dissatisfied. These ‘splits’ in responses are difficult to predict in advance across a range of variables. It is therefore best to play safe by basing the sample size on the key variables for which the greatest diversity of responses is likely, that is, those closest to a 50:50 split.
Figure 4.3 shows the sample sizes needed to accommodate the anticipated response splits on key variables. The first column gives the sampling error, in percentage points, at the 95 per cent confidence level. Sampling error is the likely difference between an estimate from the sample and the true value in the population from which the sample is drawn. The 95 per cent confidence level means that, if the sampling were repeated many times, about 95 in every 100 samples would give a range that contains the true population value. For example, if 82 per cent of a sample of 1100 were satisfied with services, we could be 95 per cent confident that between 79 and 85 per cent of the population from which the sample was drawn are satisfied (that is, 82 per cent plus or minus 3 per cent). The figure of 3 per cent is taken from Figure 4.3 for a sample of 1100. The second column assumes that the population is roughly evenly divided (a 50:50 split) on the key characteristics of the study. This is the cautious choice, because for a given sample size the sampling error is largest when opinion is evenly split. The third column, which assumes a 30:70 split, shows that smaller samples are needed where the population is more homogeneous.
Figure 4.3 Sample sizes required for various sampling errors at the 95 per cent confidence level (simple random sampling)

Sampling error (%) | Sample size (50/50 split) | Sample size (30/70 split)
1.0 | 10000 | 8400
1.5 | 4500 | -
2.0 | 2500 | 2100
2.5 | 1600 | -
3.0 | 1100 | 933
3.5 | 816 | -
4.0 | 625 | 525
4.5 | 494 | -
5.0 | 400 | 336
5.5 | 330 | -
6.0 | 277 | 233
6.5 | 237 | -
7.0 | 204 | 171
7.5 | 178 | -
8.0 | 156 | 131
8.5 | 138 | -
9.0 | 123 | 104
9.5 | 110 | -
10.0 | 100 | 84
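The figures in Figure 4.3 are consistent with the standard formula for the sample size needed to estimate a proportion p to within ±e from a simple random sample: n = z² p(1 - p) / e². The values appear to have been calculated with z = 2 rather than the more exact 1.96 and then rounded. On that assumption, the sketch reproduces the 50/50 column to within rounding, and for a 30/70 split it gives, for example, 933 at 3 per cent. The function name is hypothetical.

```python
def required_sample_size(error_pct: float, p: float = 0.5, z: float = 1.96) -> float:
    """Responses needed to estimate a proportion p to within +/- error_pct
    percentage points from a simple random sample: n = z^2 * p * (1 - p) / e^2."""
    e = error_pct / 100
    return z ** 2 * p * (1 - p) / e ** 2


# Using z = 2 to match the table's apparent convention.
for error in (1.0, 3.0, 5.0, 10.0):
    print(error,
          round(required_sample_size(error, p=0.5, z=2)),
          round(required_sample_size(error, p=0.3, z=2)))
# 1.0 10000 8400
# 3.0 1111 933
# 5.0 400 336
# 10.0 100 84
```

With z = 1.96 the required sizes are about 4 per cent smaller, for example 9604 rather than 10000 for a 1 per cent error with a 50:50 split. The same formula explains why the 3 per cent quoted for a sample of 1100 is a cautious figure. For an observed 82 per cent, z = 2 gives a sampling error of about 2.3 percentage points, because p(1 - p) is largest at 50:50.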
How to tell if the sample is biased
Sample sizes in the discussion above refer to the number of responses actually obtained, not the number of people selected. Non-response reduces the achieved sample size. If some members of the research population are less likely to respond than others, it may also bias the sample.
The response rate is an important statistic because it can indicate whether the achieved sample is biased. For example, if 500 people were selected and only 50 responded, the response rate is 10 per cent. So few responses are unlikely to reflect the full diversity of opinion in the population, and the achieved sample is likely to be biased.
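The arithmetic behind this example, and the reason a low response rate matters, can be sketched as below. The second function assumes, purely for illustration, that satisfied and dissatisfied tenants respond at different rates; the satisfaction and response figures are hypothetical. It shows that bias arises when respondents differ from non-respondents. A low response rate makes this more likely but does not by itself prove it. The function names are illustrative.

```python
def response_rate(achieved: int, selected: int, ineligible: int = 0) -> float:
    """Achieved responses as a share of the eligible selected sample.

    Excluding ineligible cases (e.g. empty properties) is a common
    convention, but definitions vary between surveys.
    """
    return achieved / (selected - ineligible)


def respondent_share(true_share: float, rate_if_yes: float, rate_if_no: float) -> float:
    """Share answering 'yes' among respondents when the 'yes' and 'no'
    groups respond at different rates."""
    yes = true_share * rate_if_yes
    no = (1 - true_share) * rate_if_no
    return yes / (yes + no)


print(f"{response_rate(50, 500):.0%}")              # 10%

# Hypothetical: 60 per cent of tenants are satisfied, but satisfied tenants
# respond at 15 per cent and dissatisfied tenants at 5 per cent.
print(f"{respondent_share(0.60, 0.15, 0.05):.0%}")  # 82%

# If both groups respond at the same low rate, the estimate is not biased.
print(f"{respondent_share(0.60, 0.10, 0.10):.0%}")  # 60%
```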
Response rates can only be calculated where there is a sampling frame listing everyone with a chance of being selected for interview. Many sampling techniques that are not strictly probability methods do not use a sampling frame. The sample is therefore not chosen at random, and it cannot be known what proportion of those selected have not responded.
Where a response rate cannot be calculated, the achieved sample should be compared against known characteristics of the population, such as area, property type, age or gender. This shows whether some groups are over- or under-represented in the final sample.
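A simple comparison of this kind might look like the sketch below. The age bands, counts and population shares are hypothetical. The 5 percentage point tolerance for flagging a group is an arbitrary assumption; a more formal alternative would be a chi-square goodness-of-fit test. The function name is illustrative.

```python
def representation_check(sample_counts: dict[str, int],
                         population_shares: dict[str, float],
                         tolerance: float = 0.05) -> dict[str, str]:
    """Label each group 'over', 'under' or 'ok' by comparing its share of the
    achieved sample with its known share of the population."""
    total = sum(sample_counts.values())
    result = {}
    for group, pop_share in population_shares.items():
        diff = sample_counts.get(group, 0) / total - pop_share
        if diff > tolerance:
            result[group] = "over"
        elif diff < -tolerance:
            result[group] = "under"
        else:
            result[group] = "ok"
    return result


# Hypothetical achieved sample of 400 tenants and known population profile.
achieved = {"16-34": 40, "35-64": 200, "65+": 160}
population = {"16-34": 0.25, "35-64": 0.50, "65+": 0.25}
print(representation_check(achieved, population))
# {'16-34': 'under', '35-64': 'ok', '65+': 'over'}
```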
Details of response rates should be given wherever possible. Any under- or over-representation should be acknowledged, and bias should be corrected by weighting the responses in the analysis. Weighting adjusts the sample so that its profile on key variables matches that of the population. It does this by statistically increasing or decreasing the influence of cases with particular characteristics. As a rule, only stable, independent variables that do not change rapidly over time should be used for weighting. For example, variables based on the known distribution of properties or on tenure are likely to be more stable and independent than variables based on service users’ opinions.
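Weighting on a single stable variable can be sketched as follows, using cell weighting (post-stratification). Each group's weight is its population share divided by its share of the achieved sample. The property-type figures are hypothetical. Weighting on several variables at once would normally need an iterative method such as raking, which is not shown. The function names are illustrative.

```python
def cell_weights(sample_counts: dict[str, int],
                 population_shares: dict[str, float]) -> dict[str, float]:
    """Weight for each group = population share / achieved sample share.
    The weighted cases then sum to the original sample size."""
    total = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / total) for g in sample_counts}


def weighted_proportion(sample_counts: dict[str, int], yes_counts: dict[str, int],
                        weights: dict[str, float]) -> float:
    """Weighted share answering 'yes' across all groups."""
    weighted_yes = sum(weights[g] * yes_counts[g] for g in sample_counts)
    weighted_all = sum(weights[g] * sample_counts[g] for g in sample_counts)
    return weighted_yes / weighted_all


# Hypothetical survey: houses are 60 per cent of the stock but 75 per cent of responses.
sample = {"house": 300, "flat": 100}
satisfied = {"house": 210, "flat": 90}
stock = {"house": 0.60, "flat": 0.40}

w = cell_weights(sample, stock)  # house 0.8, flat 1.6
print(f"unweighted: {sum(satisfied.values()) / sum(sample.values()):.0%}")  # 75%
print(f"weighted:   {weighted_proportion(sample, satisfied, w):.0%}")       # 78%
```

Property type is used here because, as noted above, it is a stable characteristic that does not depend on respondents' opinions.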
Practice point
• Make sure that the technical details of sampling are included in reports provided by contractors, perhaps in a technical annex. This should cover sampling strategy, size, response rates, sampling errors and confidence levels. This is important information in assessing the quality of the achieved probability sample and hence the representativeness of the survey.
Random or probability sampling: checklist
√ If statistical representativeness is the goal of the survey, ensure a random or probability sample is used.
√ Ensure that the sampling frame used is adequate.
√ Check that the size of sample is appropriate for the purposes of the survey and type of analysis of sub-groups required.
√ Ensure technical information about sampling has been provided.
√ Consider whether the achieved sample is biased and whether the response rate is typical of the type of survey.
√ Check for evidence of under or over representation of certain sub-groups or areas and whether this has been corrected in the analysis by the use of weights.

