Registration Help

Aisha Faquir / World Bank
 

Intervention Timing

An intervention is considered to start when the treatment begins (or alternatively, when participants are enrolled in the study, which may be earlier). For randomized evaluations, the intervention start date is clearly defined, as it is under the control of the research team or the agency with which the team is working. In some cases, the intervention being evaluated is an extension of an existing program (e.g., adding a new program component or targeting a different population or area). In this case, it is the extension that is considered the “intervention” for current purposes, so indicate the starting date of the extension as the intervention start date. Many non-experimental evaluations will be studying programs that have been in operation for some time or that have even ended. In such cases, indicate the date the program began, if known.

 

Outcomes (Endpoints)

Most evaluation studies include both primary and secondary outcomes, and many also distinguish between final and intermediate outcomes. Primary outcomes or endpoints are the prespecified outcomes considered to be the most important effects of the intervention; they are also the variables used in power calculations to determine sample size. Secondary outcomes are additional outcomes of interest, but the study is not necessarily powered to detect impacts on them. For example, in an evaluation of a microcredit intervention, primary outcomes may be household business activity and household income. Secondary outcomes could be child school enrollment, time allocation, and female empowerment.
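
As background on how primary outcomes drive sample size, a minimal sketch of the standard two-sample power calculation, assuming two equal-sized arms, individual-level randomization, and a continuous primary outcome, is

\[
n_{\text{per arm}} \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\sigma^{2}}{\delta^{2}},
\]

where δ is the minimum detectable effect on the primary outcome, σ² its variance, α the significance level, and 1−β the desired power. This formula is purely illustrative; RIDIE does not require any particular power calculation method.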

Final outcomes are those which represent the ultimate objective of the intervention, while intermediate outcomes are usually defined as outcomes which are links in the causal chain from the intervention to the final outcomes. Intermediate outcomes may not always be of direct interest, but they have to occur for the final outcomes to materialize. For example, to test the impacts on student learning of a program that trains teachers in new pedagogical techniques, an intermediate outcome would be a measure of whether teachers are using the new approach, and the final outcome would be student skills as measured by tests. For a study of the effects of fertilizer provision on farm productivity, intermediate outcomes could include the proper application of fertilizer by farmers, while final outcomes would be yields and farm incomes. Intermediate outcomes usually precede final outcomes in time. In some longer-term studies, the intermediate outcome is the same endpoint as the final outcome but is measured midway or at some other point between the intervention and the final measurement.

 

Hypotheses

Primary hypotheses usually refer to the predictions to be tested about the main impacts of the program being evaluated, involving the primary outcome measures. Secondary hypotheses are subsidiary hypotheses that are conditioned on one of the primary hypotheses. For example, for an evaluation of a business skills training program for owners of small businesses, the primary hypothesis might be that participating entrepreneurs experienced greater business growth and higher revenues than non-participants. Secondary hypotheses might be that better-educated participants experienced greater gains or that men experienced greater gains than women. “Secondary hypothesis” can also denote a hypothesis to be tested about a secondary outcome. A secondary outcome of the business training program could be use of business credit, so a secondary hypothesis would be that business loans increased as a result of the program.

 

Analysis Plan

A pre-analysis plan is a detailed description of the analysis to be conducted, written before the researchers have seen data on the impacts of the program being evaluated. It may specify hypotheses to be tested, variable construction, equations to be estimated, controls to be used, and other aspects of the analysis. A key function of the pre-analysis plan is to increase transparency in the research. By setting out in advance the details of what will be done, before the results are known, the plan guards against data mining and specification searching. Researchers are encouraged to develop and upload such a plan with their study registration, but it is not required for registration.

For downloadable examples of recent comprehensive pre-analysis plans, click here, here, and here. (We thank the authors for permission to make these documents available to RIDIE users.) What follows is a “checklist” and description of components that would be included in a comprehensive pre-analysis plan. It is a modified version of the checklist written by David McKenzie that appeared on the World Bank’s Development Impact Blog on 10/28/2012:

  1. Description of the sample to be used in the study. This should include discussion of how the sample was obtained and what the expected sample size is. For a randomized experiment, there should be a description of how the randomization was or will be done and what variables will be included in tests of randomization balance. For other designs, there should be a description of how treatment status or program participation was determined.
  2. Key data sources. Discussion of the key sources of data for the study, including which surveys are planned and what types of administrative data, if any, you expect to use.
  3. Hypotheses to be tested throughout the causal chain. This should specify the key outcomes of interest, the steps along the causal chain to be measured, and the subgroup or heterogeneity analysis to be done, as well as the hypotheses that accompany each of these tests. These should be as specific as possible and link each outcome explicitly to how it will be measured. For example, rather than simply saying the outcome will be “employment,” you should write that the “outcome will be employment, as measured by question D21 on the follow-up questionnaire, which asks whether the individual currently works for 20 hours or more per week.”
  4. Specify how variables will be constructed. This includes information such as whether logs or levels of particular variables will be used, how missing values will be handled, what procedures will be used to deal with outliers, etc. For example, “hours worked per week in last month employed will be measured by question D25 on the follow-up survey. This will be coded as zero for individuals who are not currently working and top-coded at 100 hours per week (the 99th percentile of baseline responses) to reduce the influence of outliers. No imputation for missing data from item non-response at follow-up will be performed. We will check whether item non-response is correlated with treatment status, following the same procedures as for survey attrition, and if it is, construct bounds for our treatment estimates that are robust to this.” (An illustrative coding sketch for such rules appears after this list.)
  5. Specify the treatment effect equation to be estimated. For example, is a difference-in-differences, ANCOVA, or post-treatment-only specification to be used? What controls will be included in the regression? How will standard errors be calculated? The exact equation to be estimated should be written out; an illustrative example is given after this list.
  6. What is the plan for dealing with multiple outcomes and multiple hypothesis testing? There are a number of methods for dealing with multiple hypothesis testing and the problem that one is more likely to find one or more significant impacts simply by testing a larger number of variables. These methods typically involve one of two approaches. The first aggregates different measures into a single index and tests for impacts of the treatment on the index; that is, it tests for a “global effect” of the intervention. In this case, the pre-analysis plan should specify precisely which variables will be included in this aggregate. The second approach retains the focus on individual outcomes but groups them into families of hypotheses (e.g., all outcomes related to health) and adjusts the critical values of the tests to account for multiple hypotheses; the Bonferroni method is the best-known approach (a worked example appears after this list). In this case, you should specify the family or families of hypotheses and which variables will be considered part of a given family.
  7. Tests for and procedures to be used for addressing survey attrition. What checks will be done for attrition, and what adjustments will be made if these checks show that there is selective attrition? You should also list the variables to be included in tests of survey attrition.
  8. How will the study deal with outcomes with limited variation? An issue that can arise is a lack of variation in one of the key intervention outcomes to be measured. For example, the data you collect may reveal that everyone in the control group is already doing an activity that the intervention was intended to increase. There is no power to be gained from looking at this outcome, and including it in a family of outcomes can reduce the power to detect an overall impact. So, one can write, for example, “In order to limit noise caused by variables with minimal variation, questions for which 95 percent of observations have the same value within the relevant sample will be omitted from the analysis and will not be included in any indicators or hypothesis tests. In the event that omission decisions result in the exclusion of all constituent variables for an indicator, the indicator will not be calculated.” Likewise, one might pre-determine that outcomes with item non-response rates above a certain threshold will be omitted. (A sketch of how such a screen might be implemented appears after this list.)
  9. If you are going to be testing a model, include the model. In many cases, papers include a model to explain their findings. Often, however, these models are written after the fact as a way of interpreting the results, yet are presented as if testing the model were the point of the paper. Setting out a model in advance makes clear what model the authors have in mind before seeing the data, and it ensures they collect information on all of the model's key parameters.
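
To make item 4 concrete, here is a minimal Python (pandas) sketch of how the hypothetical “hours worked” coding rule quoted above might be implemented; the column names (d25_hours, currently_working) and the 100-hour cap are illustrative placeholders rather than part of any actual questionnaire.

```python
import pandas as pd

def construct_hours_worked(df: pd.DataFrame) -> pd.Series:
    """Illustrative coding rule for 'hours worked per week in last month employed'.

    Column names are hypothetical stand-ins for survey question D25 and an
    employment-status indicator; adapt them to the actual questionnaire.
    """
    hours = df["d25_hours"].copy()
    # Code as zero for individuals who are not currently working.
    hours[df["currently_working"] == 0] = 0
    # Top-code at 100 hours per week (assumed 99th percentile of baseline responses).
    hours = hours.clip(upper=100)
    # No imputation for item non-response: missing values remain missing.
    return hours
```

Pre-specifying such rules in code (or in equally precise prose) leaves little room for post hoc discretion once the follow-up data arrive.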
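
As an illustration of writing out the exact equation called for in item 5, an ANCOVA specification for an individually randomized design could take the following generic form (the notation is illustrative and should be adapted to the actual study):

\[
Y_{i1} = \alpha + \beta\, T_i + \gamma\, Y_{i0} + X_i'\delta + \varepsilon_i
\]

where Y_{i1} is the follow-up outcome for unit i, Y_{i0} its baseline value, T_i an indicator for assignment to treatment, X_i a vector of pre-specified controls, and β the treatment effect of interest. The plan would also state how standard errors will be computed (e.g., clustered at the level of randomization).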
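
As a worked illustration of the second approach in item 6, the Bonferroni correction tests each of the m hypotheses in a family at the level α/m rather than α. For a family of five health outcomes tested at a nominal α = 0.05, for example,

\[
\alpha_{\text{adjusted}} = \frac{\alpha}{m} = \frac{0.05}{5} = 0.01,
\]

so an individual outcome in that family would be declared significant only if its p-value falls below 0.01. (The family composition here is hypothetical; the plan should list the actual families and their member variables.)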
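
Finally, the limited-variation screen quoted in item 8 could be operationalized along the following lines; this Python sketch is illustrative, and the 95 percent threshold and the treatment of all-missing variables are assumptions to be fixed in the plan itself.

```python
import pandas as pd

def screen_low_variation(df: pd.DataFrame, columns: list[str], threshold: float = 0.95) -> list[str]:
    """Return the subset of `columns` that passes the pre-specified variation screen.

    A variable is omitted if its single most common value accounts for at least
    `threshold` of non-missing observations in the relevant sample, or if all of
    its values are missing.
    """
    kept = []
    for col in columns:
        shares = df[col].value_counts(normalize=True, dropna=True)
        if shares.empty:
            continue  # all values missing; omit as excessive item non-response
        if shares.iloc[0] < threshold:
            kept.append(col)
    return kept
```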
 

Data Access

For purposes of the registration, we define an unrestricted data set as one that is generally open for use by researchers and that can be obtained with no or minimal requirements. Minimal requirements would include having to register on a website to download the data (as with Demographic and Health Surveys, for example). A restricted data set is one that is not normally made available to researchers or, if it is, requires a formal application that must be approved.

 

Treatment Assignment Data

Information on who is receiving the intervention is often contained in the same data set as information on outcomes. For example, for a randomized experiment, there will typically be a single primary data set collected for the purposes of the study, with information on outcomes, an indicator of treatment arm, and other variables. In other cases, outcome measures may be found in household or student data that are matched to separate administrative or other data on the presence of the program or on individual participation in the program. In some cases, program information is based on public knowledge of where an intervention is taking place, and there is no real treatment ‘data set’ to speak of. In such cases, for purposes of the registration, you should explain that the assignment is public knowledge when asked to describe the treatment data, and in the questions that follow indicate that it already exists, has been obtained, and is unrestricted.

 

Registration Category

In general terms, a registration is prospective if researchers prepare and submit a research design and hypotheses to be tested before the impacts of the program they are evaluating are measured or, if already measured, before the impacts are known to them. Pre-registering evaluation plans is a means for ensuring transparency in reporting and protecting against researcher bias, reporting bias, and publication bias. See here for more information: Benefits of Prospective Registration.

Based on the information provided, RIDIE classifies a registration into one of four categories. The first three distinguish among prospective registrations:

  • Category 1:  Data measuring the impacts have yet to be collected.
  • Category 2:  The data exist, but these data have not yet been obtained or analyzed by the study researchers.
  • Category 3:  The data have been obtained by the researchers, but analysis has not started.

The final category is for studies that are not prospective:

  • Category 4:  Analysis has already begun.

Note that, other than in cases where the data on impacts have not yet been collected (category 1, which generally includes RCTs), it is usually not possible to verify clearly that a registration is prospective. In particular, where the data set with impact information already exists, it is hard to establish that researchers had no access to it or had not examined it before they registered the study. In some cases, access to restricted data can be verified and dated, and hence shown to have occurred after the registration. However, the philosophy of RIDIE is to accept what researchers say about whether they have access to or have examined the data on impacts at the time of registration, while still distinguishing among these cases.

If you believe your study has been incorrectly classified, please explain why under Registration Category Comments. RIDIE staff will consider your comments once you have submitted the registration for review.