REDUCING MEASUREMENT ERROR
IN INFORMAL SECTOR SURVEYS
Mr. Zia ABBASI
Australian Bureau of Statistics
1 The purpose of this paper is to discuss some of the causes of measurement error and to suggest possible ways to minimise or eliminate these errors in informal sector surveys. Measurement error has many possible causes, ranging from the reputation and legislative backing of the national statistical agency through to errors associated with the survey vehicle and its associated processes and procedures. This paper focuses on measurement errors that arise from inadequate survey design and collection processes.
Causes of measurement error
2 In principle, every operation of a survey is a potential source of measurement error. Examples of causes of measurement error include non-response, badly designed questionnaires, respondent bias and processing errors. The sections that follow discuss these causes in turn.
3 Measurement errors can be grouped into two main types: systematic errors and random errors. Systematic error (called bias) makes survey results unrepresentative of the target population by distorting the survey estimates in one direction. For example, if the target population is the entire population of a country but the sampling frame covers only the urban population, the survey results will not be representative of the target population because of systematic bias in the sampling frame. Random error, on the other hand, can distort the results on any given occasion but tends to balance out on average. Some of the types of measurement error are outlined below:
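The difference between the two kinds of error can be illustrated with a small simulation. This is a sketch under made-up assumptions (a hypothetical population of 6,000 urban and 4,000 rural households with invented income levels, and an invented function name `simulate`): drawing the sample from an urban-only frame shifts the estimate in one direction, while zero-mean recording noise on a sample from the whole population leaves the estimate close to the true mean.

```python
import random
import statistics


def simulate(seed=1, n=500):
    """Compare a frame-biased estimate with one affected only by random error.

    All figures are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    urban = [rng.gauss(500, 50) for _ in range(6000)]  # hypothetical urban incomes
    rural = [rng.gauss(300, 50) for _ in range(4000)]  # hypothetical rural incomes
    population = urban + rural
    true_mean = statistics.mean(population)

    # Systematic error: the sampling frame covers only the urban population,
    # so every possible sample over-states the national mean.
    biased_estimate = statistics.mean(rng.sample(urban, n))

    # Random error: sample the whole population, but record each value with
    # zero-mean noise; the errors largely cancel out in the average.
    sample = rng.sample(population, n)
    noisy_estimate = statistics.mean(y + rng.gauss(0, 40) for y in sample)

    return true_mean, biased_estimate, noisy_estimate


true_mean, biased, noisy = simulate()
print(f"true {true_mean:.0f}  urban-frame {biased:.0f}  whole-population {noisy:.0f}")
```

Increasing `n` shrinks the random error in the second estimate but leaves the gap in the first untouched, which is why a larger sample cannot cure bias in the frame.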
Failure to identify the target population
4 Failure to identify the target population can arise from the use of an inadequate sampling frame, imprecise definition of concepts, and poor coverage rules. Problems can also arise if the target population and the survey population do not match closely. Failing to identify and adequately capture the target population can be a significant problem for informal sector surveys. Establishment and population censuses allow the target population to be identified, but it is important to select the sample as soon as possible after the census is taken, because informal businesses frequently start up, move or close, and a census-based frame loses coverage as it ages.
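One way to check how well a frame captures the target population is to compare it with an independent listing, for example a fresh field listing of a sample of areas. A minimal sketch, assuming each unit carries a common identifier on both lists (the function name and identifiers are hypothetical):

```python
def coverage_report(target_ids, frame_ids):
    """Compare a sampling frame with an independent listing of the target population."""
    target, frame = set(target_ids), set(frame_ids)
    missed = target - frame  # in the target population but absent from the frame
    extra = frame - target   # on the frame but no longer in scope (e.g. ceased)
    return {
        "undercoverage_rate": len(missed) / len(target),
        "overcoverage_rate": len(extra) / len(frame),
    }


# Hypothetical example: 100 businesses found by a fresh listing; the older
# frame holds 80 of them plus 10 that have since closed.
listing = range(1, 101)
frame = list(range(1, 81)) + list(range(101, 111))
report = coverage_report(listing, frame)
print(report)  # undercoverage_rate is 0.2
```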
Non-response
5 Non-respondents may differ from respondents in relation to the attributes or variables being measured. Non-response can be total (where none of the questions are answered) or partial (where some questions are left unanswered owing to memory problems, inability to answer, etc.). To improve response rates, care should be taken in training interviewers, assuring respondents of confidentiality, motivating them to cooperate, and revisiting or calling back if a respondent has previously been unavailable. 'Call backs' are successful in reducing non-response but can be expensive. It is also important to ensure that the person who has the required information can be contacted by the interviewer, that the data required are available, and that an adequate follow-up strategy is in place.
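The trade-off between extra call-backs and cost can be made concrete with a little arithmetic. This sketch rests on a strong, hypothetical assumption: each visit independently reaches the respondent with the same probability `p_contact`. The function name and cost figures are likewise illustrative.

```python
def call_back_profile(p_contact, cost_per_visit, max_visits):
    """Contact rate and expected cost per sampled unit for 1..max_visits attempts.

    Assumes (hypothetically) that each visit independently reaches the
    respondent with the same probability p_contact.
    """
    rows = []
    for k in range(1, max_visits + 1):
        contact_rate = 1 - (1 - p_contact) ** k
        # Visit j+1 is only made if the first j visits all failed.
        expected_visits = sum((1 - p_contact) ** j for j in range(k))
        rows.append((k, contact_rate, expected_visits * cost_per_visit))
    return rows


for visits, rate, cost in call_back_profile(p_contact=0.5, cost_per_visit=10.0, max_visits=4):
    print(f"{visits} visit(s): contact rate {rate:.3f}, expected cost {cost:.2f}")
```

Under these assumptions each extra call-back adds less contact than the one before while still adding cost, which is one reason to cap the number of visits in the follow-up strategy.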
Questionnaire design
6 The content and wording of the questionnaire may be misleading, and the layout of the questionnaire may make it difficult to record responses accurately. Questions should not be misleading or ambiguous, and should be directly relevant to the objectives of the survey. In order to reduce measurement error relating to questionnaire design, it is important to ensure that the questionnaire:
· can be completed in a reasonable amount of time;
· can be properly administered by the interviewer;
· uses language that is readily understood by both the interviewer and the respondent; and
· can be easily processed.
7 In designing questionnaires and training interviewers for informal sector surveys, where there is a strong potential for respondents to provide inaccurate information, consideration should be given to random question sequencing, derived or imputed results, and partial questionnaires. The random question sequencing approach involves the interviewer asking the respondent a number of questions about the relevant data items (e.g. input costs and quantities, output prices and output units sold) in a random order. The interviewer would use a deck of questionnaire cards. The cards would be shuffled, and the interviewer would then ask the questions out of sequence, record each answer, and reassemble the questions in the right sequence to obtain the final response (e.g. profit or value added) as a derived result. Another approach to consider, where the responding businesses form a reasonably homogeneous group operating with similar cost structures and market conditions, is to aggregate results from separate sample measures of inputs and outputs. This approach uses separate but representative random samples of businesses to collect information about different data items. The data are then brought together to produce imputed aggregate-level estimates.
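Both approaches can be sketched in a few lines. This is an illustration under assumptions, not the procedure used in any particular survey: the four question cards, their wording, and the simplified "profit" (sales minus input costs) are hypothetical, as are the function names. The second function assumes the two samples come from the same homogeneous group and that the number of businesses in that group is known.

```python
import random
import statistics

# Hypothetical question cards: (data item, question wording).
CARDS = [
    ("input_cost", "What did you pay for each unit of inputs?"),
    ("input_qty", "How many units of inputs did you buy?"),
    ("output_price", "What price do you charge for each unit sold?"),
    ("output_units", "How many units did you sell?"),
]


def sequenced_interview(ask, rng):
    """Ask the card questions in shuffled order, then derive a simplified profit.

    `ask` is any function that takes a question and returns a number.
    """
    deck = list(CARDS)
    rng.shuffle(deck)  # questions are asked out of sequence
    answers = {}
    for item, question in deck:
        answers[item] = ask(question)  # each answer is recorded against its card
    # Reassemble the answers in the right sequence and derive the result.
    sales = answers["output_price"] * answers["output_units"]
    input_costs = answers["input_cost"] * answers["input_qty"]
    return sales - input_costs


def aggregate_from_separate_samples(output_sample, input_sample, n_businesses):
    """Imputed aggregate from two independent samples of a homogeneous group.

    output_sample: (price, units sold) pairs from one random sample.
    input_sample: (unit cost, quantity) pairs from a second random sample.
    """
    mean_sales = statistics.mean(p * q for p, q in output_sample)
    mean_costs = statistics.mean(c * q for c, q in input_sample)
    return n_businesses * (mean_sales - mean_costs)


# Hypothetical respondent: inputs cost 2 each, 10 bought; sells 8 units at 5.
responses = dict(zip((q for _, q in CARDS), [2, 10, 5, 8]))
print(sequenced_interview(responses.get, random.Random(0)))  # 5 * 8 - 2 * 10 = 20
print(aggregate_from_separate_samples([(5, 8), (6, 10)], [(2, 10), (3, 10)], n_businesses=100))  # 2500
```

The shuffle changes only the order in which questions are asked; because answers are stored against their cards, the derived result is the same whatever the order.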
Interviewer bias
8 A respondent's answers can be influenced by the interviewer's behaviour, choice of clothes, sex and accent, and by the interviewer's prompting when the respondent does not understand a question. Bias may also be introduced if interviewers are poorly trained, as this can affect the way they prompt for, or record, answers. The best ways to minimise interviewer bias are effective training and manageable workloads.
9 Training can be provided in the form of manuals, formal training courses on questionnaire content and interviewing techniques, and on-the-job training in the field. Topics that should be covered in interviewer training include: the purpose of the survey; the scope and coverage of the survey; a general outline of the survey design and sampling approach being used; the questionnaire; interviewing techniques and recording answers; ways to avoid or reduce non-response; how best to maintain respondent co-operation; field practice; quality assurance and editing of data; planning workloads; and administrative arrangements.
Respondent bias
10 Refusals, inability to answer questions, memory biases and inaccurate information can all lead to bias in the estimates. An increasing level of respondent burden (due to the number of times a person is selected in surveys) can also make it difficult to persuade a potential respondent to participate. When designing a survey it should be remembered that uppermost in the respondent's mind will be protecting their own personal privacy, integrity and interests. The way the respondent interprets the questionnaire, and the wording of the answer the respondent gives, can also introduce inaccuracies into the survey data. Careful questionnaire design, effective training of interviewers and adequate survey testing can overcome these problems to some extent.
Processing errors
11 There are four stages in the processing of the data where errors may occur: data grooming, data capture, editing and estimation. Data grooming involves preliminary checking before the data are entered onto the processing system at the capture stage. Inadequate checking and quality management at this stage can lead to data loss (where data are not entered into the system) and data duplication (where the same data are entered more than once). Inappropriate edit checks and inaccurate weights in the estimation procedure can also introduce errors at the editing and estimation stages. To minimise these errors, processing staff should be given adequate training and realistic workloads. Training material for processing staff should cover topics similar to those for interviewing staff, but with greater emphasis on editing techniques and quality assurance practices.
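A simple quality control between grooming and capture is to reconcile the list of forms checked in against the records actually entered, which exposes both data loss and duplication. A minimal sketch, assuming each form carries a unique identifier (the function name and identifiers are hypothetical):

```python
from collections import Counter


def reconcile_capture(groomed_ids, captured_ids):
    """Compare forms checked in at grooming with records entered at capture."""
    counts = Counter(captured_ids)
    lost = sorted(set(groomed_ids) - set(counts))               # never entered
    duplicated = sorted(i for i, n in counts.items() if n > 1)  # entered more than once
    return lost, duplicated


lost, duplicated = reconcile_capture([101, 102, 103, 104], [101, 102, 102, 104])
print("lost:", lost, "duplicated:", duplicated)  # lost: [103] duplicated: [102]
```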
12 Five main types of editing check should be considered: structure checks, range edits, sequencing checks, checks for duplication and omission, and logic edits. Structure checks ensure that all the information sought has been provided; this involves checking that all documents for a record are together and correctly labelled. Range edits ensure that only valid codes for each question are used and that no codes outside the valid range have been entered. Sequencing checks ensure that everyone who should have answered a question (because of the answer they gave to an earlier question) has done so, and that respondents who should not have answered it did not. Duplication and omission checks ensure that data reported by a respondent have not been recorded more than once and that no reported data have been omitted. Logic edits check that answers to related questions are consistent with each other, and are specified in advance of data collection. An example of a logic edit is that males cannot report that they are pregnant.
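The record-level checks can be written as a single edit function. This sketch assumes a hypothetical coding scheme (sex: 1 = male, 2 = female; employed and pregnant: 1 = yes, 2 = no) and a hypothetical `hours_worked` question asked only of employed persons. Duplication checks need the whole file rather than one record, so they are left out here.

```python
REQUIRED = ("id", "sex", "employed")  # hypothetical mandatory items
VALID_CODES = {
    "sex": {1, 2},       # 1 = male, 2 = female
    "employed": {1, 2},  # 1 = yes, 2 = no
    "pregnant": {1, 2},  # 1 = yes, 2 = no
}


def edit_record(rec):
    """Return a list of edit failures for one record (a dict of answers)."""
    errors = []
    # Structure check: every mandatory item is present.
    for field in REQUIRED:
        if field not in rec:
            errors.append(f"structure: missing {field}")
    # Range edits: only valid codes are used.
    for field, codes in VALID_CODES.items():
        if field in rec and rec[field] not in codes:
            errors.append(f"range: {field}={rec[field]!r} is not a valid code")
    # Sequencing checks: hours_worked only for those who said they are employed.
    employed = rec.get("employed")
    if employed == 1 and "hours_worked" not in rec:
        errors.append("sequencing: hours_worked missing for an employed person")
    if employed == 2 and "hours_worked" in rec:
        errors.append("sequencing: hours_worked answered by a person not employed")
    # Logic edit, specified in advance: males cannot report being pregnant.
    if rec.get("sex") == 1 and rec.get("pregnant") == 1:
        errors.append("logic: male reported as pregnant")
    return errors


print(edit_record({"id": 1, "sex": 2, "employed": 1, "hours_worked": 40}))  # []
print(edit_record({"id": 2, "sex": 1, "employed": 2, "hours_worked": 10, "pregnant": 1}))
```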
13 The key areas that an effective editing strategy should address to reduce processing errors are:
· target the editing effort on large contributors and large units within the survey population;
· do not over-edit the data;
· automate the editing process as far as possible; and
· feed information from the data processing stage back into the conduct of the survey, through changes such as improvements in question wording, questionnaire design, training and instructions.
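Targeting editing effort on large contributors is often done by ranking units by their contribution to the total and editing from the top down. A sketch under assumptions: the cumulative-share cut-off, the 80% threshold and the function name are illustrative choices, not a rule from this paper, and reported values are assumed to be non-negative.

```python
def select_for_editing(contributions, share=0.8):
    """Return the largest units that together make up `share` of the total.

    contributions: mapping of unit id -> non-negative reported value.
    The 80% default is an illustrative assumption, not a standard.
    """
    total = sum(contributions.values())
    selected, running = [], 0.0
    for unit, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        selected.append(unit)
        running += value
    return selected


sales = {"A": 500, "B": 300, "C": 100, "D": 60, "E": 40}  # hypothetical
print(select_for_editing(sales))  # ['A', 'B']
```

Units outside the selection would receive only automated edits, which is one way of avoiding over-editing.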
Misinterpretation of results
14 This can occur if the researcher is not aware of certain factors that influence the characteristics under investigation. A researcher or any other user not involved in the data collection process may be unaware of patterns built into the data by the nature of the collection (e.g. always conducting interviews at a particular time on weekdays could mean that only particular types of householders are interviewed). Researchers should carefully investigate the methodology used in any given survey.
Dealing with non-response
15 Non-response occurs when data are not collected from sampled units. The proportion of non-respondents in the sample is called the non-response rate. It is important to make all reasonable efforts to maximise the response rate, as non-respondents may have different characteristics from respondents, and significant non-response can bias the survey results. When a respondent answers some but not all questions, this is called partial non-response. Partial non-response can arise from memory problems, inadequate information or an inability to answer a particular question. Respondents may also refuse to answer questions if they find them particularly sensitive or have been asked too many (that is, the questionnaire is too long). Total non-response can arise if a respondent cannot be contacted (the frame contains inaccurate or out-of-date contact information, or the respondent is not at home), is unable to respond (for example, because of language difficulties or illness), or refuses to answer any questions.
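The distinction between full response, partial non-response and total non-response, and the resulting non-response rate, can be sketched as follows. The record layout (a list of answers with `None` for an unanswered question) and the function name are assumptions for illustration.

```python
def response_summary(records):
    """Classify sampled units by response status.

    Each record is a list of answers, with None for an unanswered question.
    """
    full = partial = total_nr = 0
    for answers in records:
        answered = sum(a is not None for a in answers)
        if answered == 0:
            total_nr += 1  # total non-response
        elif answered < len(answers):
            partial += 1   # partial non-response
        else:
            full += 1
    return {
        "full": full,
        "partial": partial,
        "total_non_response": total_nr,
        "non_response_rate": total_nr / len(records),
    }


sample = [[1, 2, 3], [1, None, 3], [None, None, None], [4, 5, 6]]  # hypothetical
print(response_summary(sample))
```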
16 Response rates can be improved through good survey design, using short, simple questions and good forms design techniques, and by effectively explaining the purposes and uses of the survey. Assurances of confidentiality are very important, as many respondents are unwilling to respond because of privacy concerns. For informal sector surveys, it is essential to ensure that the survey is directed to the person within the establishment or household who can provide the data sought. Call backs to those who were unavailable, and other follow-ups, can increase response among those who were initially unable to reply. Refusals can be minimised by: using positive language; contacting the right person, who can provide the information required; explaining what the interviewer will do to help with completing the questionnaire; stressing the importance of the survey and the authority under which it is being conducted; explaining that each response also represents other units; emphasising the benefits of the survey results for the individual and/or the broader community; giving adequate assurances of the confidentiality of responses; and finding out the reasons for reluctance to participate and trying to talk through the areas of concern.
17 Other measures that can improve respondent cooperation and maximise response include: public awareness activities, such as discussions with key organisations and interest groups, news releases, media interviews and newspaper articles, aimed at informing the community about the survey and identifying and addressing issues of concern; and, where possible, a primary approach letter, which gives respondents advance notice and explains the purposes of the survey and how it will be conducted.
18 In the case of a mail survey, most of the points above can be made in an introductory letter or through a publicity campaign. Other techniques for minimising non-response in a mail survey include providing a postage-paid return envelope with the survey form and sending reminder letters.
19 Where non-response remains at an unsatisfactory level after all reasonable follow-up attempts have been made, bias can be reduced by imputation for item non-response (non-response to a particular question) or imputation for unit non-response (complete non-response by a unit). The main aim of imputation is to produce consistent data without going back to the respondent for the correct values, thus reducing both respondent burden and survey costs. Broadly speaking, imputation methods fall into three groups: the imputed value is derived from other information supplied by the same unit; values from other units are used to derive a value for the non-respondent (e.g. an average); or the exact value of another unit (called the donor) is used as the value for the non-respondent (called the recipient).
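One simple instance of each of the three groups is sketched below. The data items (`price`, `units`, `sales`), the use of the mean for the second group, and random donor selection within an `industry` class for the third are illustrative assumptions; real imputation systems choose classes and donors with more care.

```python
import random
import statistics


def impute_from_own_data(unit):
    """Group 1: derive a missing value from other items the same unit supplied."""
    if unit.get("sales") is None and unit.get("price") is not None and unit.get("units") is not None:
        return {**unit, "sales": unit["price"] * unit["units"]}
    return unit


def impute_mean(values):
    """Group 2: fill gaps with a value derived from other units (here, their mean)."""
    fill = statistics.mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_hot_deck(recipient, donors, match_on, item, rng):
    """Group 3: copy the exact value of a randomly chosen donor in the same class."""
    pool = [d for d in donors if d[match_on] == recipient[match_on] and d.get(item) is not None]
    donor = rng.choice(pool)  # raises IndexError if no suitable donor exists
    return {**recipient, item: donor[item]}


print(impute_from_own_data({"price": 5, "units": 8, "sales": None}))  # sales derived as 40
print(impute_mean([10, None, 30]))  # [10, 20, 30]
donors = [
    {"industry": "food", "sales": 100},
    {"industry": "food", "sales": 120},
    {"industry": "repair", "sales": 50},
]
print(impute_hot_deck({"industry": "food", "sales": None}, donors, "industry", "sales", random.Random(0)))
```

Mean imputation piles every imputed value onto one point, which is the kind of distortion of the distribution that a large amount of imputation can cause; donor methods preserve more of the spread.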
20 When deciding on a method for non-response imputation, it is desirable to know what effect imputation will have on the final estimates. If a large amount of imputation is performed, the results can be misleading, particularly if the imputation method distorts the distribution of the data. If it is believed at the planning stage that the non-response rate is likely to be high, the sample size could be increased to allow for this. However, increasing the sample size alone may not overcome the problem, particularly if the non-responding units have different characteristics from the responding units. Nor does imputation completely eliminate non-response bias from the results.
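Both points can be checked with simple arithmetic. The first function divides the required number of respondents by the expected response rate; the second shows, for a hypothetical population with two groups that respond at different rates, that the expected respondent mean is off target whatever the sample size. Function names and figures are illustrative assumptions.

```python
import math


def inflate_sample(n_required, expected_response_rate):
    """Initial sample size needed to end up with about n_required respondents."""
    return math.ceil(n_required / expected_response_rate)


def expected_respondent_mean(groups):
    """Mean of the respondents when response rates differ between groups.

    groups: (population share, group mean, response rate) tuples. The result
    does not depend on the sample size at all.
    """
    responding = sum(share * rate for share, _, rate in groups)
    return sum(share * mean * rate for share, mean, rate in groups) / responding


print(inflate_sample(600, 0.75))  # 800
# Hypothetical population: two equal groups with means 100 and 200 (true mean 150),
# responding at 90% and 50% respectively.
print(round(expected_respondent_mean([(0.5, 100, 0.9), (0.5, 200, 0.5)]), 1))  # 135.7
```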
21 If a low response rate is obtained, estimates are likely to be biased and therefore misleading. Determining the exact bias in the estimates is difficult. However, an indication can be obtained by: comparing the characteristics of respondents with those of non-respondents; comparing results with alternative sources and/or previous estimates; and conducting a post-enumeration survey on a sub-sample of the original sample, with intensive follow-up of non-respondents.
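Results from intensive follow-up of non-respondents can be turned into an indicative bias figure. Writing the population mean as RR times the respondent mean plus (1 - RR) times the non-respondent mean gives the identity used below. The sketch assumes the followed-up units represent all non-respondents, which is a strong assumption; the function name and figures are hypothetical.

```python
import statistics


def estimated_nonresponse_bias(respondent_values, followup_values, response_rate):
    """Indicative bias of the respondent mean.

    Uses the identity  mean_all = RR * mean_resp + (1 - RR) * mean_nonresp,
    so  mean_resp - mean_all = (1 - RR) * (mean_resp - mean_nonresp).
    followup_values come from intensively followed-up non-respondents and are
    assumed to represent all non-respondents.
    """
    mean_resp = statistics.mean(respondent_values)
    mean_nonresp = statistics.mean(followup_values)
    return (1 - response_rate) * (mean_resp - mean_nonresp)


# Hypothetical figures: respondents average 12, followed-up non-respondents 7,
# response rate 60%, so the respondent mean over-states the truth by about 2.
print(estimated_nonresponse_bias([10, 12, 14], [6, 8], 0.6))
```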
22 Adjusting the weights so that they sum to known population totals is referred to as benchmarking. Benchmarking is often used in the ABS to ensure that population surveys are consistent with results from the Population Census; in particular, the ABS benchmarks to population totals by sex and age. Benchmarking reduces the effect of non-response bias on estimates, although it does not remove all of it. In some cases the achieved sample may not accurately represent the population, either through chance in the random selection of the sample or because response rates differ between population groups. Information from other sources can then be used to obtain a more accurate description of the population. Suppose, for example, that a sample of persons in a community is selected for a survey, and 30% of the respondents are male and 70% female, while community council records show that the actual population is roughly 50% male and 50% female. Estimates produced from the unadjusted sample would then be unlikely to reflect the entire community accurately. To produce more accurate estimates, the weights of the respondents would need to be adjusted so that they add up to the population totals. In this example, the weights for males would be increased and the weights for females reduced.
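The weight adjustment in the community example can be written out as post-stratification, one simple form of benchmarking. This sketch assumes a sample of 100 respondents and a community of 1,000 people; both figures are hypothetical, since the example gives only percentages, and the function name is illustrative.

```python
def benchmark_weights(sample_counts, population_totals):
    """Post-stratification weights: each group's weights sum to its known total."""
    return {g: population_totals[g] / sample_counts[g] for g in sample_counts}


# Community example: 30% of respondents male, 70% female, population about 50/50.
# A sample of 100 and a community of 1,000 are hypothetical figures.
sample_counts = {"male": 30, "female": 70}
population_totals = {"male": 500, "female": 500}
initial_weight = 1000 / 100  # every respondent initially represents 10 people

weights = benchmark_weights(sample_counts, population_totals)
for group, w in weights.items():
    print(f"{group}: weight {initial_weight:.2f} -> {w:.2f}")
# male: weight 10.00 -> 16.67
# female: weight 10.00 -> 7.14
```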
23 In conclusion, while measurement error may be difficult to measure accurately, it can be minimised by:
• careful selection of the time the survey is conducted;
• using an up-to-date, accurate sampling frame;
• revisiting or conducting 'call backs' to unavailable respondents;
• careful questionnaire design;
• providing thorough training for interviewers and processing staff; and
• being aware of all the factors affecting the topic under investigation.
August 2000