Corresponding author: Anna Hagemeier, M.Sc. Institute of Medical Statistics and Computational Biology (IMSB), University Hospital of Cologne, Kerpener Str. 62, 50937 Cologne, Germany.
Background
The randomized controlled trial (RCT) is the gold standard in evidence-based medicine. However, this design may not be appropriate in every setting, so other methods or designs such as the regression discontinuity design (RDD) are required.
Method
The aim of this article is to introduce the RDD, to summarize its methodology in the context of health services research and to present a worked example using the statistical software SPSS (examples for R and Stata are given in Appendix A). The mathematical notations of the sharp and fuzzy RDD as well as their distinction are presented. Furthermore, examples from the literature and recent studies are highlighted, and both advantages and disadvantages of the design are discussed.
Application
The RDD consists of four essential steps: 1. Determine feasibility, 2. Watch for possible manipulation of the treatment allocation, 3. Check for the treatment effect, and 4. Fit the regression models to measure the treatment effect.
Conclusion
The RDD offers an alternative for studies in health services research where an RCT cannot be conducted but a threshold-based comparison can be made.
Summary
Background
The randomized controlled trial (RCT) is the gold standard in evidence-based medicine. However, this design cannot be used in every setting, so other methods or designs such as the regression discontinuity design (RDD) are required.
Method
This article introduces the RD design, summarizes the methodology in the context of health services research and presents a worked example using the statistical software SPSS (examples for R and Stata in the Appendix). The mathematical notations of the sharp and fuzzy RDD as well as their distinction are presented. In addition, examples from the literature and recent studies are highlighted, and the advantages and disadvantages of the design are discussed.
Application
The RDD consists of four essential steps for its application: 1. Determine feasibility, 2. Watch for possible manipulation of the treatment allocation, 3. Check for the treatment effect, and 4. Fit the regression models to measure the treatment effect.
Conclusion
The RDD is an alternative for studies in health services research where an RCT cannot be conducted but a threshold-based comparison is possible.
After many years of exploring, developing and refining various study designs for medical research, the randomized controlled trial (RCT) is still the gold standard. However, even if an RCT is desirable, it cannot be conducted in every setting as the study design needs to address the specific research situation. Coly et al. [
] give recommendations on the choice of design for a study and explain why, due to limitations such as ethical concerns (e.g. a potentially detrimental treatment) or financial and feasibility constraints, quasi-experimental designs are sometimes a suitable alternative to experimental studies like RCTs.
Just as in experimental designs, quasi-experimental designs examine whether interventions have an objectively measurable effect by testing causal hypotheses, except that group allocation is not random. Instead, they offer different allocation mechanisms that can produce a quasi-randomization [
]. The quasi-experimental regression discontinuity design (RDD) is well suited to medical and health services research in situations where the allocation of an intervention is regulated by a threshold [
], and it has been described as underutilized and underrated in these fields.
The RDD originates from econometrics and is a threshold-based study design: group allocation (treatment or control) depends on whether the value of the assignment variable falls below or above a specific threshold. It was first described by Campbell and Thistlethwaite in 1960 [
] and has recently been rediscovered for the medical and health services research field, although there is still little relevant literature in statistics and healthcare research. The most popular example beyond medicine or health research is the SAT (Scholastic Assessment Test) as assignment variable: students above a predefined threshold (cut-off value c) receive a scholarship for college, and success (outcome) with or without the scholarship is measured by their income after finishing college [].
This article aims to describe and summarize the methodological basis of the RDD in the context of medical and health services research. Furthermore, strengths and weaknesses are discussed, possibilities for statistical implementation are presented, including a worked example, and a prospective view of open issues is given.
The Design
The appropriate design to answer a given research question is not necessarily obvious and definite. However, if an intervention is already established and cannot be assigned purely at random, e.g. for ethical reasons, but a threshold criterion can be included in the study design, the RDD is a possible evaluation strategy. This threshold rule defines whether a participant receives the intervention or the control treatment and therefore specifies the treatment allocation. Any continuous variable that can be split at a certain point can be chosen as the assignment variable. The treatment allocation rule then needs to be verified, usually based on a scatterplot with the assignment variable on the x-axis and the outcome variable on the y-axis. Adding the cut-off value to the plot shows the vertical separation line between the intervention and control group. If the allocation is defined in a strictly binary fashion with assigned probabilities of 0 or 1 for control or intervention, respectively, the clearly defined separation of the groups is known as the sharp design. Figure 1 shows a schematic draft of the sharp RD design with fitted regression lines. When a strictly binary assignment is impossible, the assignment is fuzzy and a fuzzy RDD results (Figure 2). In a fuzzy RDD, the above-mentioned allocation probabilities are not sharply disjunct around the cut-off c. This can lead to bias, e.g. due to unobserved covariates and confounders (see section 'Sharp vs. fuzzy designs'). Hence, the sharp design is always preferable and should be aimed for [
]. The main idea behind this assignment procedure is the comparability of participants just around the threshold. Obviously, there is an overall difference between participants above and below the threshold. Close to the threshold, however, this difference is very small, and one can measure the causal effect by comparing the mean difference between the outcomes within intervals to the left and right of the threshold. Hence, it is similar to randomization in an RCT, but limited to a pre-defined area (bandwidth b) near the threshold value. Obtaining a precise, unbiased estimate within this bandwidth is not trivial. If the bandwidth is large, estimation of the treatment effect might be more precise, but there is a high risk that observations below and above the cut-off value within the bandwidth no longer match in their basic characteristics and are therefore no longer comparable. To determine the optimal width of the comparison interval, an iterative approach, e.g. cross-validation, should be used [
]. However, in medicine or health science, the width is often a predefined value, which is added to or subtracted from the cut-off value. If it is not pre-defined, the distance should be extended until the samples below and above the cut-off value lose comparability. After finding the optimal bandwidth, regression lines are fitted on both sides of the cut-off value within the bandwidth. The regression lines can also be extended over the entire data range, but only to visualize a trend in the data. For the comparison itself, only the sample limited to the determined bandwidth should be used (local “as if” randomization idea). Hence, a common problem is the necessity of a large sample size to obtain sufficient subjects within the bandwidth [].
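As an illustration of such an iterative choice of b, the following R sketch implements one simple cross-validation variant (it is a sketch only, not the specific procedure recommended in the cited literature; the function name and all variable names are illustrative):

# Simple cross-validation for the RDD bandwidth: for each candidate b, predict every
# observation inside [cutoff-b, cutoff+b] from a straight line fitted to the other
# observations on the same side of the threshold, and keep the b with the smallest error.
cv_bandwidth <- function(x, y, cutoff, candidates) {
  cv_error <- sapply(candidates, function(b) {
    idx <- which(abs(x - cutoff) <= b)                # observations inside the bandwidth
    pred_err <- sapply(idx, function(i) {
      same_side <- (x >= cutoff) == (x[i] >= cutoff)  # stay on the observation's side
      train <- setdiff(which(same_side & abs(x - cutoff) <= b), i)
      if (length(train) < 3) return(NA)               # too few points to fit a line
      fit <- lm(y[train] ~ x[train])
      y[i] - (coef(fit)[1] + coef(fit)[2] * x[i])     # leave-one-out prediction error
    })
    mean(pred_err^2, na.rm = TRUE)                    # cross-validation criterion for this b
  })
  candidates[which.min(cv_error)]                     # bandwidth with the smallest criterion
}
# Example call (illustrative): cv_bandwidth(x, y, cutoff = 14.5, candidates = seq(0.5, 5, 0.5))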
Figure 1. Schematic draft of the sharp regression discontinuity design with a displayed treatment effect ‘d’ as the discontinuity. (Unfilled points represent the observations in the control group, filled ones the patients in the treatment group.)
Figure 2. Fuzzy RDD without sharp treatment allocation. Patients may or may not receive treatment even though their assignment value is above or below the cut-off value; the decision is therefore not strictly based on one assignment variable and can change due to other circumstances. (Unfilled points represent the observations in the control group, filled ones the patients in the treatment group.)
For the mathematical estimation of the causal treatment effect within the bandwidth, instrumental variables or indicator functions are needed to disjoin the control group from the treatment group. The most commonly used causal effect estimator in connection with the RDD is the local average treatment effect (LATE), also called the complier average causal effect (CACE). LATE/CACE estimates the effect of an intervention on subjects actually receiving the treatment. In settings with strictly sharp group assignment, there are only “compliers,” i.e., each subject was correctly assigned and receives the correct treatment (full compliance). This corresponds to the described assignment with binary probabilities 0 and 1. In a fuzzy RDD, the probabilities of receiving treatment or control differ on the two sides of the threshold but are not 0 and 1. Therefore, researchers must address the problem of incomplete compliance, as some subjects do or do not receive the treatment contrary to their assignment. In this regard, the use of LATE/CACE resembles the role of the per-protocol (PP) analysis in clinical trials. LATE/CACE may be estimated by performing an intention-to-treat (ITT) analysis and then dividing the obtained result by the difference in the proportion (probability) of actually treated participants between those assigned to the treatment group and those assigned to the control group. Usually this is done by a two-stage least squares method [
]. Because this paper aims to demonstrate the simplicity of the RDD, present it as a good alternative to RCTs and show its implementation in the medical field, the reader is referred to the recommended articles for more detailed information on LATE/CACE and their calculation [].
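To illustrate the logic of dividing an ITT estimate by the difference in treatment probabilities, the following minimal R sketch (with hypothetical 0/1 variables z for assignment by the threshold and d for the treatment actually received, and outcome y, all restricted to the bandwidth) shows the Wald-type estimate and its manual two-stage least squares counterpart:

# Wald-type LATE/CACE: ITT effect of crossing the threshold divided by the
# jump in the probability of actually being treated.
itt    <- coef(lm(y ~ z))["z"]        # intention-to-treat effect on the outcome
uptake <- coef(lm(d ~ z))["z"]        # difference in treatment probability at the threshold
late   <- unname(itt / uptake)

# The same point estimate via a manual two-stage least squares step; note that the
# second-stage standard errors are not valid without correction, and in practice the
# (centred) assignment variable is usually included in both stages.
d_hat     <- fitted(lm(d ~ z))        # first stage: predicted treatment status
late_2sls <- unname(coef(lm(y ~ d_hat))["d_hat"])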
According to the literature, there are many possible functional forms for performing causal inference with the RDD. The most common and simplest one is linear regression [
]. For the sharp design, the model can be written as
Yi = β0 + β1 xi + β2 Di + εi (3)
with
Yi : continuous outcome with i = 1, 2, …, n observations; β0 : intercept; β1 : slope; xi : covariate (continuous assignment variable) with i = 1, 2, …, n observations; β2 : measures the treatment effect (the “jump” at the cut-off value, equal to d in Figure 1, i.e. the discontinuity as the difference between the two regression functions); Di : indicator function or treatment indicator (sharp design), with Di = 0 if xi < c (control) and Di = 1 if xi ≥ c (treatment); c : cut-off value; εi : random error.
If equation (3) is extended to allow for a change in slope at the threshold, the regression model becomes:
Yi = β0 + β1 xi* + β2 (xi* · Di) + β3 Di + εi
with
xi* = xi − c : distance to the cut-off; β3 : measures the treatment effect (the “jump” at the cut-off value, equal to d in Figure 1, i.e. the discontinuity as the difference between the two regression functions); β1 : slope below the cut-off; β1 + β2 : slope above the cut-off.
The expansion to a polynomial is possible, but this will not be considered further here.
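To make this concrete, equation (3) and its slope-change extension correspond directly to ordinary lm() calls in R; the following is a sketch, where the data frame dat with columns x and y, the indicator D and the cut-off value cutoff are assumed names, not taken from the article:

# Equation (3): sharp RDD with treatment indicator D = 1 if x >= cutoff, 0 otherwise.
dat$D <- as.numeric(dat$x >= cutoff)
fit_sharp <- lm(y ~ x + D, data = dat)            # coefficient of D estimates the jump (beta2)

# Extension with a change in slope: centre the assignment variable at the cut-off.
dat$xs <- dat$x - cutoff                          # x* = x - c
fit_slope <- lm(y ~ xs + xs:D + D, data = dat)    # coefficient of D estimates the jump (beta3)
summary(fit_slope)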
The RDD in Figure 1 is an example of an effect based on the treatment alone (no interaction, the regression lines run parallel). The treatment effect can be identified around the threshold, e.g. by simply calculating the difference between the intercepts of the two regression lines on either side of the cut-off or by graphically comparing the regression lines and scatterplots. To clarify the simple idea behind the RDD, the researcher can split the study data at the cut-off value into two subsets, conduct linear regressions on both subsets and compare the intercepts of both regression functions. The difference between the intercepts indicates the discontinuity and thus the treatment effect, provided the lines run parallel. If the slopes of the regression lines differ (interaction effect), the researcher can still compare the intercepts but needs to shift the assignment variable by the cut-off value so that the cut-off lies at zero (x − c). Another possibility would be to conduct a simple t-test on the groups within the bandwidth b around the cut-off value [c − b; c + b], but the treatment effect might then be underestimated [].
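The split-sample view described above can be written as a short R sketch (again with the assumed names dat, cutoff and bandwidth b):

# One regression per side of the cut-off, restricted to the bandwidth.
win   <- subset(dat, abs(x - cutoff) <= b)
below <- lm(y ~ I(x - cutoff), data = subset(win, x <  cutoff))
above <- lm(y ~ I(x - cutoff), data = subset(win, x >= cutoff))

# With x shifted by the cut-off, each intercept is the predicted outcome at x = c,
# so their difference estimates the discontinuity d (the treatment effect).
unname(coef(above)["(Intercept)"] - coef(below)["(Intercept)"])

# Cruder alternative: t-test between the two groups inside the bandwidth
# (this tends to underestimate the effect, as noted in the text).
win$group <- factor(win$x >= cutoff, labels = c("control", "treatment"))
t.test(y ~ group, data = win)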
Since the assignment variable can be any continuous measure, many possible uses of the RDD in medicine or health service research are imaginable. For example, Maciejewski & Basu [
] mentioned the use of the RDD in conjunction with the neonatal intensive care unit (NICU). Newborns with a low birth weight (<1500 g) more often need intensive care than heavier infants. Thus, with birth weight as the assignment variable, one can compare the effect of intensive care at the cut-off value, since infants just around the cut-off should be almost identical. Another well-known assignment variable for implementing the RDD in the medical and health services research field, with application to psychology, is the Hospital Anxiety and Depression Scale (HADS). The HADS consists of 14 questions, 7 of which relate to depression and 7 to anxiety, and has a total score ranging from 0 to 42 [
]. A recent example using the RDD and the HADS as the instrument for group allocation (assignment variable) is the psycho-oncological study isPO (integrated cross-sectoral psycho-oncology), where the threshold is set between 14 and 15 score points. After a cancer diagnosis, every patient receives either standard psychological care (below the threshold) or individualized/special treatment (above the threshold). After 12 months, evidence of efficacy will be provided by re-interviewing patients with the HADS (outcome) [].
Sharp vs. fuzzy designs
As previously described, a sharp RD design is the desired approach in order to get close to a randomized experiment. However, individual and complex scenarios or incorrect treatment allocation can lead to fuzziness in the group assignment (Figure 2). As a result, some patients receive treatment even though they are below the cut-off according to the allocation variable, and some patients do not receive treatment although they exceed the threshold. Bor et al. [
] describe such a violation of treatment allocation using the example of HIV infections where there are two possibilities to receive treatment. Either the CD4 count drops below a predefined threshold or the symptoms indicate the severity of their disease. Another very simple example comes from Maciejewski & Basu [
]: hemoglobin values at a certain threshold are used to decide whether a patient needs a blood transfusion or not. Sometimes, however, a patient needs a transfusion because of other clinical factors although the hemoglobin value is in the normal range. Of course, the decision can also be made in the other direction. Thus, the allocation is based partly on a threshold rule and partly on clinical judgment, allowing switching from treatment to no-treatment and vice-versa [
]. As a result, an RDD with imprecise threshold allocation is, strictly speaking, not a genuine (sharp) RDD but a fuzzy RDD. Linear regression approaches or non-parametric methods can be used for both the sharp and the fuzzy version of the RDD. Overall, however, one should always strive for a solution that leads to a sharp RDD, e.g. by trying to control for observed covariates [].
Considering sharp RD designs with equal assumptions about the patients' characteristics on both sides of the cut-off, there are four theoretical outcomes for the treatment effect (Figure 3). The first option is panel A, where no treatment effect is apparent; this would also be the case when no treatment was performed, and the regression lines would be congruent. In panel B an interaction effect is apparent, shown by a change in the slope of the regression line. This means that the effect of treatment increases or decreases with increasing values of the assignment variable above the cut-off value in the treatment group, but at the cut-off it is still equal (no visible jump) to that of the untreated group. Panel C shows the essential RDD idea with only a main treatment effect, i.e. a discontinuity (jump) of the regression lines at the cut-off value. Panel D is a combination of panels B and C, where an interaction effect (treatment effect only for those with a higher assignment value, change in slope) and a main effect (treatment effect along the entire assignment scale, discontinuity at the cut-off, change in intercept) occur [].
Figure 3. Schematic outcome possibilities in RDDs with X as the assignment variable, C as the cut-off with bandwidth and Y as the outcome variable; A) no treatment effect, B) no main effect, but interaction effect, C) main effect for treatment and D) main and interaction effect.
Strengths and weaknesses of the RDD in theory and practice
When used appropriately, the RDD has several strengths compared to other study designs. Its main advantage is the (local) quasi-randomization around a specific threshold, which avoids ethical problems in treatment allocation, i.e. treatment is still allocated based on individual needs. However, this requires a specific allocation criterion (threshold). In practice, administrative or secondary health insurance data may be used to evaluate clinical interventions, making the main disadvantage of the RDD easier to address: the RDD usually requires a large amount of data to precisely estimate the treatment effect near the threshold. By using retrospective, already or routinely collected data, a proof of effectiveness in real-life settings may still be possible. In prospective studies the large amount of data required is a major problem, mainly because of the high financial costs entailed by data collection. Note that manipulation around the threshold, e.g. by intentionally giving poorer answers to a questionnaire to achieve a higher score and thus be more likely to get into the treatment group, can lead to a bias in the local randomization, both in retrospective and prospective designs [].
In summary, the application of the RDD consists of four essential steps:
1. Determine the feasibility of the RDD:
• Outcome (Y) observable for both groups (treatment/no treatment)
• Clear assignment rule to determine if the design is sharp or fuzzy
2. Watch for opportunities for allocation manipulation to control the treatment status. One way to detect manipulation is a histogram of the assignment variable (X): a markedly skewed or bunched distribution around the cut-off suggests that the allocation, and thus the treatment status, may have been manipulated. To check the comparability of the groups, covariate balance tests can be performed around the cut-off of the allocation variable X.
3. Check for a treatment effect by plotting the outcome (Y) against the assignment variable (X). A visible jump near the cut-off value confirms a discontinuity and indicates a treatment effect (see the R sketch after this list).
4. Depending on data availability and other factors, fit the regression models to measure the treatment effect
a. around the cut-off with a local linear regression within the bandwidth and/or
b. over all data.
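A minimal R sketch of the graphical checks in steps 2 and 3 (assuming a data frame dat with assignment variable x, outcome y and a cut-off value cutoff; these names are illustrative and not taken from the article):

# Step 2: histogram of the assignment variable; bunching just above or below the
# cut-off would hint at manipulation of the treatment status.
hist(dat$x, breaks = 40, main = "Assignment variable X", xlab = "X")
abline(v = cutoff, lty = 2)

# Step 3: outcome against assignment variable; a visible jump at the cut-off
# indicates a discontinuity and hence a treatment effect.
plot(dat$x, dat$y, xlab = "X (assignment variable)", ylab = "Y (outcome)")
abline(v = cutoff, lty = 2)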
Implementation with a worked example
Statistical programs like R, Stata or SPSS already offer sound implementations for analysis with the RDD [
]. In the following we show a more detailed worked example for SPSS. For R and Stata, examples with corresponding program code based on the same data can be found in Appendix A. Since the HADS has already been mentioned as an established instrument in the medical context and as an example assignment and outcome variable for the RDD, the worked examples use simulated HADS data in a fictitious study setting. In this setting, patients receive standard psychological care below a pre-defined threshold and intensive care above the threshold. The following assumptions are made for the example:
• HADS T1 is the assignment variable and HADS T2 the outcome variable
• Patients in the intensive care group improve their HADS (T2) (decrease of at least 2 and up to 7 points)
• Patients in the standard care group improve their HADS (T2) (decrease of 0 up to 1 point)
• For simplification, the HADS can also take non-integer values between 0 and 42 and follows a uniform distribution.
• The pre-defined cut-off value is c = 14.5 and the bandwidth is b = 1.5, i.e. the interval [13; 16].
The program code for generating the data can be found in the Appendix A. The scatter plot of the data from n = 500 subjects and the threshold line at c = 14.5 is shown in Figure 4.
Figure 4. SPSS scatterplot of the fictitious/simulated HADS data.
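The authors' generating code is provided in Appendix A. Purely for illustration, data consistent with the assumptions listed above could be simulated in R as follows (a hypothetical sketch, not the article's program code):

set.seed(1)                                   # arbitrary seed, only for reproducibility
n       <- 500
cutoff  <- 14.5
hads_t1 <- runif(n, min = 0, max = 42)        # baseline HADS T1: uniform, non-integer values
improvement <- ifelse(hads_t1 > cutoff,
                      runif(n, 2, 7),         # intensive care: HADS decreases by 2 to 7 points
                      runif(n, 0, 1))         # standard care: HADS decreases by 0 to 1 point
hads_t2 <- pmax(hads_t1 - improvement, 0)     # outcome HADS T2 (kept non-negative, an added assumption)
dat     <- data.frame(hads_t1, hads_t2)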
SPSS offers two options to implement the RDD analysis.
Strictly speaking, the first option is not an option with SPSS alone, but rather a combination with R. SPSS Version 18 or higher in combination with the 'Essentials for R' and the extension bundle 'STATS_RDD' offers a simple and effective possibility to analyse a regression discontinuity (Use 'STATS RDD /HELP.' for more information in SPSS). The R package used for the extension is 'rdd' implemented by Drew Dimmery [
]. For the analysis of the given sample data set (Program code 1), follow the path 'Analyze > Regression > Regression Discontinuity' in SPSS or use the syntax (Program code 2) to reproduce the following results. Importantly, for the comparison with the simple linear regression, 'Rectangular' must be selected for the Kernel option in the STATS_RDD package.
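For readers working directly in R rather than through SPSS, a call to the underlying 'rdd' package with the same settings might look as follows (a sketch based on the package's documented interface, applied to the simulated data frame dat from the sketch above; the results will not match Table 1 exactly because the simulated data differ):

library(rdd)   # R package by Drew Dimmery that underlies the SPSS extension STATS_RDD

# Local average treatment effect with a rectangular kernel and the pre-defined bandwidth.
res <- RDestimate(hads_t2 ~ hads_t1, data = dat,
                  cutpoint = 14.5, bw = 1.5, kernel = "rectangular")
summary(res)   # reports the LATE estimate and the number of observations used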
Table 1 shows the main result table of the STATS RDD run in SPSS. One can read off the significant local average treatment effect (LATE = −4.704, p = 0.001) within the pre-defined bandwidth (b = 14.5 ± 1.5) as well as the number of cases (n = 35). In the fictitious study setting this means that patients with a T1-HADS (assignment variable) value > 14.5 improve their anxiety and depression scores significantly.
Table 1. Results of the regression discontinuity design in SPSS using ‘R Essentials STATS_RDD’.
The second option is the previously mentioned approach via simple linear regression (see section 'The Design') in SPSS with specified interaction terms. Use Program code 3 to reproduce the results. The results from the simple regression will differ slightly from the results of the STATS_RDD function, because the bandwidth has to be pre-defined for the simple regression analysis by filtering the data, and there may be rounding differences between the approaches. Because the linear regression output does not show the number of cases, one should simply calculate the frequencies of the grouping variable within the filtered data (HADS ≤ 14.5: n = 23 (65.7%); HADS > 14.5: n = 12 (34.3%); total: n = 35).
Table 2 contains the results from the linear regression, where one can see the same significant average treatment effect as in the STATS_RDD results above. The coefficient of the model parameter HADS_group (coefficient = −4.704, p = 0.004) is simply the difference between the intercept coefficients of the regression lines at the cut-off and thus the local average treatment effect.
Table 2. Results of the linear regression with interaction terms to perform an RDD (for explanation of the model parameters see the comments within Program code 1).
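As a rough R analogue of this second option (not the article's Program code 3), the same type of model can be fitted with lm() on the data restricted to the bandwidth; with the hypothetical simulated data from above, the numbers will differ from Table 2:

# Simple linear regression with interaction within the bandwidth 14.5 +/- 1.5.
win <- subset(dat, hads_t1 >= 13 & hads_t1 <= 16)
win$HADS_group <- as.numeric(win$hads_t1 > 14.5)            # 0 = standard care, 1 = intensive care
fit <- lm(hads_t2 ~ I(hads_t1 - 14.5) * HADS_group, data = win)
coef(summary(fit))["HADS_group", ]                           # local average treatment effect at the cut-off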
The RDD offers a possibility to compare different interventions and measure treatment effects in a locally randomized sense. Thus, the risk of confounding may be low in contrast to other quasi-experimental study designs. Nevertheless, the design still appears underexplored and rarely used in health sciences. This is probably due to the widespread focus on randomized evidence and the absence of specific thresholds to base the treatment decision on [
], which also confirms and highlights the need for further research on the RDD. For example, the RDD methodology might be extended to the analysis of outcomes measured repeatedly over time on the same subjects.
Acknowledgement
The preparation of this manuscript was partially funded by the Innovation Fund of the Joint Federal Committee in the form of the project “isPO—Integrated, cross-sectoral psycho-oncology” (grant number: 01NVF17022-IsPo). A big thank you also to Anne Adams, Stefanie Hamacher and Jeremy Franklin for their comments on the manuscript, which improved it considerably.
Conflict of interest
The authors declare no conflicts of interest.
CRediT author statement
Anna Hagemeier: Conceptualization, Writing- Original draft preparation, Visualization, Writing – review & editing, Methodology. Christina Samel: Conceptualization, Writing – review & editing, Methodology. Martin Hellmich: Supervision, Conceptualization, Writing – review & editing, Methodology.
Appendix A. Supplementary data