REDUCING MEASUREMENT ERROR IN INFORMAL SECTOR SURVEYS
Mr. Zia ABBASI
Regional Director
Australian Bureau of Statistics
1 The purpose of this paper is to discuss some of the
causes of measurement error and suggest possible ways to minimise or eliminate these
errors in informal sector surveys. There are a number of possible
causes of measurement error, ranging from the reputation and legislative
backing of the national statistical agency through to errors associated with the
survey vehicle and associated processes and procedures. This
paper focuses on measurement errors that are due to inadequate survey design
and collection processes.
Causes of measurement error
2 In principle, every operation of a
survey is a potential source of measurement error. Some examples of causes of
measurement error are non-response, badly designed questionnaires,
respondent bias and processing errors. The sections that follow
discuss the different causes of measurement errors.
3 Measurement errors can be grouped into
two main types: systematic errors and random errors. Systematic error (called
bias) makes survey results unrepresentative of the target population by
distorting the survey estimates in one direction. For example, if the target
population is the entire population in a country but the sampling frame is just
the urban population, then the survey results will not be representative of the
target population due to systematic bias in the sampling frame. On the other
hand, random error can distort the results on any given occasion but
tends to balance out on average. Some of the types of measurement error are outlined
below:
Failure to identify the target population
4 Failure to identify the target
population can arise from the use of an inadequate sampling frame, imprecise
definition of concepts, and poor coverage rules. Problems can also
arise if the target population and survey population do not match very well.
Failure to identify and adequately capture the target population can
be a significant problem for informal sector surveys. While establishment and population
censuses allow for the identification of the target population, it is important
to ensure that the sample is selected as soon as possible after the census is taken
so as to improve the coverage of the survey population.
Non-response bias
5 Non-respondents may differ from
respondents in relation to the attributes/variables being measured.
Non-response can be total (where none of the questions were answered) or
partial (where some questions may be unanswered owing to
memory problems, inability to answer, etc.). To improve response rates, care should be taken in training
interviewers, assuring the respondent of confidentiality, motivating him or her to cooperate, and
revisiting or calling back if the respondent has been previously unavailable.
'Call backs' are successful in reducing non-response but can be expensive. It is also important to
ensure that the person who has the information required can be contacted by the
interviewer, that the data required are available, and that an adequate follow-up
strategy is in place.
Questionnaire design
6 The content and wording of the
questionnaire may be misleading and the layout of the questionnaire
may make it difficult to accurately record responses. Questions should not be
misleading or ambiguous, and should be directly relevant to the objectives of
the survey. In order to reduce measurement error relating to questionnaire
design, it is important to ensure that the questionnaire:
· can be completed in a reasonable amount of time;
· can be properly administered by the interviewer;
· uses language that is readily understood by both the
interviewer and the respondent; and
· can be easily processed.
7 In designing questionnaires and
training interviewers for informal sector surveys, where there is a
strong potential for inaccurate information being provided by respondents,
consideration should be given to the use of random question sequencing,
derived or imputed results, and the use of partial questionnaires. The random
question sequencing approach involves the interviewer asking the survey
respondent a number of questions about the relevant data items (e.g.
input costs and quantities, output prices and output units sold) in a random
order. The interviewer would use a deck of questionnaire cards. The cards would
be shuffled and then the interviewer would ask a series
of questions out of sequence, record each answer and then reassemble the
questions in the right sequence to get the final response (e.g. profit or value
added information) as a derived result. Another approach to
consider, where particular responding businesses form a reasonably
homogeneous group operating with similar cost structures and market conditions,
is aggregating results from sample measures of inputs and outputs. This
approach involves using separate but representative random samples of
businesses to collect information about different data items. The data are then
brought together to produce imputed aggregate level estimates.
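The random question sequencing approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data item names, the respondent's figures, and the simple profit formula (revenue minus input costs) are all hypothetical and are not taken from any actual questionnaire.

```python
import random

# Hypothetical data items, listed in their logical (questionnaire) order.
QUESTION_ORDER = ["input_quantity", "input_unit_cost", "units_sold", "unit_price"]


def shuffled_cards(items, seed=None):
    """Return the question cards in a random asking order."""
    rng = random.Random(seed)
    cards = list(items)
    rng.shuffle(cards)
    return cards


def reassemble(answers_as_asked, logical_order):
    """Put the recorded answers back into the logical sequence."""
    recorded = dict(answers_as_asked)
    return [(item, recorded[item]) for item in logical_order]


def derived_profit(record):
    """Derive profit from the reassembled answers instead of asking for it directly."""
    r = dict(record)
    revenue = r["units_sold"] * r["unit_price"]
    costs = r["input_quantity"] * r["input_unit_cost"]
    return revenue - costs


# Made-up answers from one respondent, for illustration only.
respondent_answers = {"input_quantity": 40, "input_unit_cost": 5,
                      "units_sold": 100, "unit_price": 3}

asking_order = shuffled_cards(QUESTION_ORDER, seed=1)          # shuffle the deck
answers_as_asked = [(q, respondent_answers[q]) for q in asking_order]
record = reassemble(answers_as_asked, QUESTION_ORDER)          # restore the sequence
profit = derived_profit(record)                                # 100 * 3 - 40 * 5 = 100
```

The point of the design is that the respondent is never asked for the sensitive figure itself; it is computed afterwards from answers that were collected out of sequence.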
Interviewer bias
8 The respondent's answers to questions can be
influenced by the interviewer's behaviour, choice of clothes, sex, accent and
prompting when a respondent does not understand a question. A bias may
also be introduced if interviewers receive poor training, as this
may have an effect on the way they prompt for, or record, the answers. The best
way to minimise interviewer bias is through effective training and by ensuring
manageable workloads.
9 Training can be provided in the form
of manuals, formal training courses on questionnaire content and
interviewing techniques, and on-the-job training in the field. Topics that
should be covered in interviewer training include: the purpose of the
survey; the scope and coverage of the survey; a general outline of the survey design
and sampling approach being used; the questionnaire; interviewing techniques
and recording answers; ways to avoid or reduce non-response; how best to
maintain respondent co-operation; field practice; quality assurance and editing
of data; planning workloads; and administrative arrangements.
Respondent bias
10 Refusals and inability to answer
questions, memory biases and inaccurate information will lead to a
bias in the estimates. An increasing level of respondent burden
(due to the number of times a person is included in surveys) can also make it
difficult to get the potential respondent to participate in a survey. When
designing a survey it should be remembered that uppermost in the
respondent's mind will be protecting their own personal privacy,
integrity and interests. Also, the way the respondent interprets the
questionnaire and the wording of the answer the respondent gives can cause
inaccuracies to enter the survey data. Careful questionnaire design,
effective training of interviewers and adequate survey testing can
overcome these problems to some extent.
Processing errors
11 There are four stages in the processing
of the data where errors may occur: data grooming, data capture, editing and
estimation. Data grooming involves preliminary checking before entering the
data onto the processing system in the capture stage. Inadequate checking and
quality management at this stage can introduce data loss (where data are not
entered into the system) and data duplication (where the same data are entered
into the system more than once). Inappropriate edit checks and inaccurate
weights in the estimation procedure can also introduce errors to the data at
the editing and estimation stage. To minimise these errors, processing staff
should be given adequate training and realistic workloads. Training
material for processing staff should cover similar topics to those for
interview staff, but with greater emphasis on editing techniques and quality
assurance practices.
12 There are five main editing checks that
should be considered: structure checks, range edits, sequencing
checks, checks for duplication and omissions, and logic edits. Structure checks
are undertaken to ensure that all the information sought has been provided.
This involves checking that all documents for a record are together and
correctly labelled. Range edits are used to ensure that only the possible codes
for each question are used and that no codes outside the valid
range have been entered. Sequencing checks involve ensuring that
all those who should have answered a question (because they gave a particular
answer to an earlier question) have done so and that respondents who should not
have answered the question did not do so. Duplication and omission checks
ensure that the specific data reported by a respondent have not been recorded
more than once and that data reported have not been omitted. Logic edits involve specifying,
in advance of data collection, checks that the answers within a record are consistent
with one another. An example of a logic edit would be that
males cannot report that they are pregnant.
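The five kinds of edit check can be illustrated with a short Python sketch. The record layout, field names and code lists below are assumptions made for the example only; a real processing system would take them from the survey's own data dictionary.

```python
# Assumed record layout and code lists (hypothetical, for illustration only).
REQUIRED_FIELDS = ("id", "sex", "employed", "hours_worked", "pregnant")
VALID_CODES = {"sex": {"M", "F"}, "employed": {"Y", "N"}, "pregnant": {"Y", "N"}}


def edit_failures(records, expected_ids):
    """Return (record_id, check_name) for every edit that fails."""
    failures = []
    seen_ids = set()
    for rec in records:
        rid = rec.get("id")

        # Duplication check: the same record must not be captured twice.
        if rid in seen_ids:
            failures.append((rid, "duplication"))
        seen_ids.add(rid)

        # Structure check: all the information sought has been provided.
        if any(field not in rec for field in REQUIRED_FIELDS):
            failures.append((rid, "structure"))
            continue  # the remaining edits need a complete record

        # Range edits: only valid codes may appear.
        for field, codes in VALID_CODES.items():
            if rec[field] not in codes:
                failures.append((rid, "range"))

        # Sequencing check: hours_worked is asked only when employed == "Y".
        if rec["employed"] == "Y" and rec["hours_worked"] is None:
            failures.append((rid, "sequencing"))
        if rec["employed"] == "N" and rec["hours_worked"] is not None:
            failures.append((rid, "sequencing"))

        # Logic edit: males cannot report that they are pregnant.
        if rec["sex"] == "M" and rec["pregnant"] == "Y":
            failures.append((rid, "logic"))

    # Omission check: every selected unit should have been captured.
    for rid in sorted(set(expected_ids) - seen_ids):
        failures.append((rid, "omission"))
    return failures


records = [
    {"id": 1, "sex": "M", "employed": "Y", "hours_worked": 40, "pregnant": "N"},
    {"id": 2, "sex": "F", "employed": "N", "hours_worked": 12, "pregnant": "N"},
    {"id": 3, "sex": "M", "employed": "Y", "hours_worked": 35, "pregnant": "Y"},
    {"id": 1, "sex": "M", "employed": "Y", "hours_worked": 40, "pregnant": "N"},
    {"id": 5, "sex": "X", "employed": "Y", "hours_worked": 20, "pregnant": "N"},
    {"id": 6, "sex": "F", "employed": "Y", "hours_worked": 30},
]
failures = edit_failures(records, expected_ids=[1, 2, 3, 4, 5, 6])
```

Each failure is only flagged here, not corrected; whether a flagged record is followed up, imputed or left alone is a separate decision.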
13 The key areas that an effective editing
strategy should address to reduce processing errors are:
· target the editing effort to large contributors and
large units within the survey population;
· do not over-edit the data;
· automate the editing process as far as possible; and
· feed back information from the data processing stage to
refine the conduct of the survey through changes such as improvements in
question wording, questionnaire design, training and instructions.
Misinterpretation of results
14 Misinterpretation of results can occur if the researcher is not
aware of certain factors that influence the characteristics under
investigation. A researcher or any other user not involved in the data
collection process may be unaware of trends built into the data due to the nature
of the collection (e.g. always conducting interviews at a particular time
on weekdays could result in only particular types of householders being
interviewed). Researchers should carefully investigate the methodology used in
any given survey.
Non-response
15 Non-response results when data are not
collected from respondents. The proportion of these non-respondents in the
sample is called the non-response rate. It is important to make all
reasonable efforts to maximise the response rate as non-respondents may have
differing characteristics to respondents. Significant non-response can bias the
survey results. When a respondent replies to the survey answering some but not
all questions then it is called partial non-response. Partial non-response can
arise due to memory problems, inadequate information or an inability to answer
a particular question. The respondent may also refuse to answer questions if
they find particular questions sensitive or have been asked too many questions
(the questionnaire is too long). Total non-response can arise if a respondent
cannot be contacted (the frame contains inaccurate or out-of-date contact information
or the respondent is not at home), is unable to respond (may be due to language
difficulties or illness) or refuses to answer any questions.
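As a minimal sketch of these definitions, the following Python fragment classifies each sampled unit as a full response, a partial non-response or a total non-response, and expresses each outcome as a proportion of the sample. The question names and the five sample records are invented for the example.

```python
QUESTIONS = ["employees", "monthly_sales", "monthly_costs"]  # hypothetical data items


def classify(answers, questions=QUESTIONS):
    """Classify one sampled unit by how many of the questions it answered."""
    answered = sum(1 for q in questions if answers.get(q) is not None)
    if answered == 0:
        return "total_non_response"
    if answered < len(questions):
        return "partial_non_response"
    return "full_response"


def response_rates(sample, questions=QUESTIONS):
    """Proportion of the sample falling into each response outcome."""
    counts = {"full_response": 0, "partial_non_response": 0, "total_non_response": 0}
    for answers in sample:
        counts[classify(answers, questions)] += 1
    n = len(sample)
    return {outcome: count / n for outcome, count in counts.items()}


# Made-up sample of five units: three full, one partial, one total non-response.
sample = [
    {"employees": 2, "monthly_sales": 900, "monthly_costs": 600},
    {"employees": 1, "monthly_sales": 400, "monthly_costs": 250},
    {"employees": 3, "monthly_sales": 1500, "monthly_costs": 1100},
    {"employees": 4, "monthly_sales": None, "monthly_costs": None},
    {},
]
rates = response_rates(sample)
```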
16 Response rates can be improved through
good survey design via short, simple questions, good forms design
techniques and by effectively explaining survey purposes and uses.
Assurances of confidentiality are very important as many respondents are
unwilling to respond due to privacy concerns. For informal sector surveys, it
is essential to ensure that the survey is directed to the person within the
establishment or household who can provide the data sought. Call backs for
those not available and follow-ups can increase response rates
for those who, initially, were unable to reply. Refusals can be minimised
through the use of positive language; contacting the right person who
can provide the information required; explaining how the interviewer plans
to help with completing the questionnaire; stressing the importance of
the survey and the authority under which the survey is being
conducted; explaining the importance of their response as being representative
of other units; emphasising the benefits from the survey results for the
individual and/or broader community; giving adequate assurances of the confidentiality
of the responses; and finding out the reasons for their reluctance to participate
and trying to talk through the areas of concern.
17 Other measures that can improve
respondent cooperation and maximise response include public awareness
activities, such as discussions with key organisations and interest
groups, news releases, media interviews and newspaper articles, aimed at
informing the community about the survey, identifying issues of concern and
addressing them; and, where possible, using a primary approach letter, which
gives respondents advance notice and explains the purposes of the
survey and how the survey will be conducted.
18 In the case of a mail survey, most of the
points above can be stated in an introductory letter or through a publicity
campaign. Other non-response minimisation techniques which could be used
in a mail survey include providing a postage-paid mail-back envelope with the
survey form; and reminder letters.
19 Where non-response is at an
unsatisfactory level after all reasonable attempts to follow-up are
undertaken, bias can be reduced by imputation for item non-response
(non-response to a particular question) or imputation for unit non-response
(complete non-response for a unit). The main aim of imputation is to produce
consistent data without going back to the respondent for the correct values,
thus reducing both respondent burden and costs associated with the survey.
Broadly speaking, imputation methods fall into three groups: the imputed
value is derived from other information supplied by the unit; values supplied by other
units are used to derive a value for the non-respondent (e.g. an
average); or the exact value of another unit (called the donor) is used as the
value for the non-respondent (called the recipient).
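The three groups of imputation methods can be sketched as follows. The field names and figures are hypothetical, and the rule used to choose a donor (the responding unit closest in number of employees) is an assumption made for the example; the paper does not prescribe how donors are selected.

```python
def impute_from_own_data(unit):
    """Group 1: derive the missing value from other information the unit supplied."""
    if (unit["revenue"] is None
            and unit["units_sold"] is not None
            and unit["unit_price"] is not None):
        return unit["units_sold"] * unit["unit_price"]
    return unit["revenue"]


def impute_mean(units, field):
    """Group 2: derive a value (here the average) from the values of other units."""
    reported = [u[field] for u in units if u[field] is not None]
    return sum(reported) / len(reported)


def impute_from_donor(recipient, units, field, match_on):
    """Group 3: copy the exact value of one donor unit.

    Assumed donor rule: the reporting unit nearest to the recipient on `match_on`.
    """
    donors = [u for u in units if u[field] is not None]
    donor = min(donors, key=lambda u: abs(u[match_on] - recipient[match_on]))
    return donor[field]


# Made-up units: A and B report revenue, C and D do not.
units = [
    {"name": "A", "employees": 2, "units_sold": 100, "unit_price": 3, "revenue": 300},
    {"name": "B", "employees": 5, "units_sold": 200, "unit_price": 3, "revenue": 600},
    {"name": "C", "employees": 3, "units_sold": 150, "unit_price": 2, "revenue": None},
    {"name": "D", "employees": 4, "units_sold": None, "unit_price": None, "revenue": None},
]

own_value = impute_from_own_data(units[2])                                  # 150 * 2 = 300
mean_value = impute_mean(units, "revenue")                                  # (300 + 600) / 2
donor_value = impute_from_donor(units[3], units, "revenue", "employees")    # B is nearest to D
```

Mean imputation pulls every imputed unit towards the centre of the distribution, whereas donor imputation preserves values that were actually reported, which is one reason the choice of method can affect the distribution of the final data.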
20 When deciding on the method for
non-response imputation it is desirable to know what effect imputation
will have on the final estimates. If a large amount of imputation is performed
the results can be misleading particularly if the imputation used distorts the
distribution of data. If at the planning stage it is believed that there is likely
to be a high non-response rate, then the sample size could be increased to
allow for this. However, the problem may not be overcome by just increasing the
sample size, particularly if the non-responding units have different
characteristics to the responding units. Imputation also fails to totally
eliminate non-response bias from the results.
21 If a low response rate is obtained,
estimates are likely to be biased and therefore misleading. Determining the exact
bias in estimates is difficult. However, an indication can be
obtained by: comparing the characteristics of respondents with those of non-respondents;
comparing results with alternative sources and/or previous estimates;
and performing a post-enumeration survey on a sub-sample of the original sample
with intensive follow-up of non-respondents.
Benchmarking
22 Adjusting the weights so they sum to
known population totals is referred to as benchmarking. Benchmarking is often used in
the ABS to ensure that population surveys are consistent with results from the
Population Census. In particular, the ABS benchmarks sex and age breakdowns.
Benchmarking will reduce the effect of non-response bias on
estimates, although it will not remove all of the effect. In some cases, the
achieved sample may not accurately represent the population. This could
occur due to the random selection of the sample or due to differing response
rates for separate population groups. We can use information from other sources
to create a more accurate description of the population.
For example, suppose a sample of persons in a community is selected for a
survey. The results show that 30% of the respondents were males and
70% were females. However, the community council records show that the actual
population is roughly 50% males and 50% females. It is therefore highly probable
that estimates produced from the sample would not accurately reflect the
entire community. To create more accurate estimates, there would
need to be an adjustment of the weights of the respondents used to derive the estimates,
so that they add up to the population total. In this example, the males' weights
would be increased while the females' weights would be reduced.
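The adjustment in this example can be worked through numerically. The sketch below assumes, purely for illustration, a community of 1,000 persons (500 males and 500 females) and 100 respondents (30 males and 70 females); the counts of respondents running an informal business are also made up.

```python
def benchmark_weights(respondent_counts, population_totals):
    """Weight per respondent in each group, so that weights sum to the group's population total."""
    return {g: population_totals[g] / respondent_counts[g] for g in respondent_counts}


def weighted_proportion(positives, weights, population_size):
    """Estimated population proportion using the benchmarked weights."""
    return sum(positives[g] * weights[g] for g in positives) / population_size


respondents = {"male": 30, "female": 70}     # 30% males, 70% females in the achieved sample
population = {"male": 500, "female": 500}    # assumed community of 1,000, split 50/50

# Weight every respondent would carry without benchmarking: 1,000 / 100 = 10.
unadjusted_weight = sum(population.values()) / sum(respondents.values())

# Benchmarked weights: about 16.7 per male and about 7.1 per female.
weights = benchmark_weights(respondents, population)

# Made-up survey result: respondents who run an informal business.
with_business = {"male": 15, "female": 14}
unweighted = sum(with_business.values()) / sum(respondents.values())                   # 29 / 100
benchmarked = weighted_proportion(with_business, weights, sum(population.values()))    # about 0.35
```

With these invented figures the unweighted estimate understates the proportion, because males, who are under-represented among respondents, report the characteristic more often than females.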
Conclusion
23 In conclusion, while measurement error
may be difficult to measure accurately it can be minimised by:
• careful selection of the time the survey is conducted;
• using an up-to-date, accurate sampling frame;
• revisiting or conducting 'call backs' to unavailable respondents;
• careful questionnaire design;
• providing thorough training for interviewers and processing staff; and
• being aware of all the factors affecting the topic under investigation.
August 2000