Research Article| Volume 112, SUPPLEMENT 1, S23-S26, 2016



How real-world data compensate for scarce evidence in HTA

  • Elisabeth George
    Corresponding author: Dr Elisabeth George, National Institute for Health and Care Excellence, 10 Spring Gardens, London SW1A 2BU, United Kingdom.
    National Institute for Health and Care Excellence, London, United Kingdom


      Most guidance developed by NICE is based on a value assessment using clearly articulated and published clinical and cost effectiveness criteria. To enable consistency and fairness across all decisions, NICE uses the quality-adjusted life year (QALY) as its unit of health benefit. Both QALYs and costs for a technology are estimated by long-term disease modelling. This requires a variety of clinical input parameters, and often extrapolation beyond the trial period and from intermediate or surrogate outcomes to final outcomes. RCT data will remain the main data source for the majority of appraisals, but because the data necessary for disease modelling are often not available from RCTs, particularly for the UK context, the use of non-RCT data is the norm in NICE technology appraisals. This applies not only to data on resource use, service provision and HRQL, but also to efficacy data. In some situations non-RCT data are more relevant to a decision context than the RCT data, and in some situations, as illustrated by 3 examples, it would be unreasonable not to take account of existing non-RCT data.
      The use of non-RCT clinical evidence is most common for devices, interventions where RCTs are difficult, and conditions with poor prognosis where single arm studies are often carried out. Therefore, a pragmatic approach to the available evidence is needed for many of the decisions made by the NICE Appraisal Committees, so that they can come to a reasonable and defensible decision.





      This article sets out to explain how the National Institute for Health and Care Excellence (NICE) considers evidence from sources other than randomised controlled trials (RCTs). NICE is an English Non-Departmental Public Body which provides national guidance to the NHS in England on the promotion of good health and the prevention and treatment of ill-health in line with the best available evidence of clinical effectiveness and cost effectiveness. Since 2000, NICE has published guidance on health technologies (drugs, medical devices and diagnostics, interventional procedures), clinical practice (clinical guidelines) and public health interventions (since 2005), and since 2013 it has also been developing social care guidance. In addition to publishing guidance, NICE develops quality standards and performance metrics for those providing and commissioning health, public health and social care services, and provides a range of informational services for commissioners, practitioners and managers across the spectrum of health and social care. The guidance produced by one of the NICE programmes, Technology Appraisals, carries a funding direction, which means that any technology recommended in a published Technology Appraisal has to be funded by the NHS within 3 months of guidance publication. As such, NICE recommendations are the primary source of guidance for new medicines and new licence indications for existing medicines within the NHS. NICE also provides implementation and adoption support for its other guidance programmes.
      Guidance published by NICE, with the exception of interventional procedure guidance, is based on a value assessment using clearly articulated and published clinical and cost effectiveness criteria for decision making. For NICE, this value assessment applies the perspective of the whole NHS system, and is defined through an assumed opportunity cost, which is the health benefit displaced elsewhere in the NHS if a new technology is adopted, or in other words, what the NHS pays on average to generate a unit of health benefit. To enable consistency and fairness across all decisions for all therapeutic areas, NICE uses the quality-adjusted life year (QALY) as its unit of health benefit. QALYs are estimated by multiplying length of life by an index measuring quality of life. Cost effectiveness of a new technology is therefore expressed as cost per QALY gained compared with standard care. The opportunity cost is clearly articulated in the Guide to the Methods of Technology Appraisal [] as the maximum acceptable cost per QALY gained (£20,000-30,000 per QALY gained).
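      The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not NICE's modelling method: all costs, life years and quality-of-life values below are hypothetical, and the function names are invented for this example. Only the definitions (length of life multiplied by a quality-of-life index, cost per QALY gained versus standard care) and the £20,000-30,000 range come from the text.

```python
"""Sketch: QALYs and cost per QALY gained, using hypothetical inputs."""

# Range for the maximum acceptable cost per QALY gained, as quoted in the text (GBP).
THRESHOLD_LOW = 20_000
THRESHOLD_HIGH = 30_000


def qalys(life_years: float, quality_index: float) -> float:
    """Length of life multiplied by an index measuring quality of life."""
    return life_years * quality_index


def cost_per_qaly_gained(cost_new: float, qalys_new: float,
                         cost_std: float, qalys_std: float) -> float:
    """Incremental cost divided by incremental QALYs, versus standard care."""
    qaly_gain = qalys_new - qalys_std
    if qaly_gain <= 0:
        raise ValueError("no QALY gain, so a cost per QALY gained is not meaningful")
    return (cost_new - cost_std) / qaly_gain


def position_against_range(ratio: float) -> str:
    """Locate a cost per QALY gained relative to the quoted range."""
    if ratio < THRESHOLD_LOW:
        return "below range"
    if ratio <= THRESHOLD_HIGH:
        return "within range"
    return "above range"


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
qalys_std = qalys(life_years=10.0, quality_index=0.50)   # 5.0 QALYs
qalys_new = qalys(life_years=12.0, quality_index=0.75)   # 9.0 QALYs
ratio = cost_per_qaly_gained(cost_new=120_000, qalys_new=qalys_new,
                             cost_std=20_000, qalys_std=qalys_std)

print(f"£{ratio:,.0f} per QALY gained ({position_against_range(ratio)})")
```

      With these made-up inputs the new technology costs £100,000 more and yields 4 additional QALYs, giving £25,000 per QALY gained. Real appraisals derive the inputs from long-term disease models rather than single point values.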
      The health gain provided by a technology is estimated by disease modelling, mainly over the long term. Resource use and costs are then added to build the cost effectiveness model. The aim is to answer the simple question of how well the new technology works in relation to how much it costs, compared with standard practice in the NHS.
      Cost effectiveness modelling therefore requires a variety of clinical input parameters, such as clinical effect sizes, adverse events and complications, baseline clinical data (epidemiology/natural history of disease), health related quality of life (HRQL) data, and compliance/adherence data. Because long-term modelling is necessary in most cases, there is also a need for extrapolation beyond the trial period, from intermediate/surrogate outcomes to final outcomes, and from trial results to relevant settings, for example by incorporating country-specific data to provide meaningful modelling of the NHS context.
      As far as HRQL data are concerned, NICE prefers the use of the EQ-5D as the measure of HRQL in adults, and specifies in its Methods guide [] that changes in HRQL should be described directly by patients, but that the value of changes in patients’ HRQL should be based on preferences expressed by the public. Importantly, this means that the utility data resulting from the EQ-5D reflect what the public would trade off to avoid a certain health state. Data on HRQL are often collected in clinical trials, but not often used in modelling approaches submitted to NICE. Instead, many models use HRQL data from sources other than RCTs.
      Resource use and cost parameters required for modelling should reflect the health system's service delivery patterns and routine setting. It is therefore not always appropriate to include resource use from RCTs, because this is protocol-driven (e.g. regular CT scans or clinical appointments) and does not reflect routine clinical practice. Also, because it is important for NICE to use UK-specific costs, resource data often come from NHS-based observational studies, administrative data, chart reviews, listings published by the Department of Health, national data based on healthcare resource groups (such as the Payment by Results tariff), the British National Formulary, and sometimes even expert opinion.
      Because the data necessary for disease and cost effectiveness modelling are often not available from RCTs, the use of non-RCT data is the norm in NICE technology appraisals. This applies not only to data on resource use, service provision and HRQL, but also to efficacy data. Non-RCT clinical parameters often come from extensions of RCTs, pragmatic clinical trials, observational data from cohort studies, early clinical trials or phase IV trials, registers, patient or population surveys, adverse effect reporting, or expert judgements.
      The use of non-RCT efficacy data or other clinical evidence is most common for devices (e.g. insulin pumps, cochlear implants, endovascular stents), interventions where RCTs are difficult (Anti-D treatments or venom prophylaxis), and conditions with poor prognosis where single arm studies are often carried out (sarcomas, GIST, resistant leukaemias). Therefore, a pragmatic approach to the available evidence is needed for many of the decisions made by the NICE Appraisal Committees, so that they can come to a reasonable and defensible decision. Some pragmatic approaches are shown in the following examples.
      The first example illustrates the need for use of a national register. In the appraisal of total hip replacement and resurfacing arthroplasty for end-stage arthritis of the hip (TA304) [], the analyses were based on the generic outcomes of mortality, HRQL and adverse effects of treatment; the condition-specific outcomes included in the analysis were functional result, pain, bone conservation, prosthesis movement, dislocation rates and revision rates, the latter describing the rate at which a second or subsequent hip replacement is required.
      As hip replacements are expensive, it is not surprising that the revision rate was the key driver of costs and QALYs: the lower the revision rate, the more cost effective a particular type of prosthesis becomes. However, revision rates are not available from RCTs, as these normally have too short a follow-up to capture revisions. Therefore, data from the UK National Joint Registry were used as the source for this key model parameter.
      The UK National Joint Registry was set up by the Department of Health and Welsh Assembly Government for the mandatory collection of information on all hip, knee, ankle, elbow and shoulder replacement operations from NHS organisations and private practice, and to monitor the performance of joint replacement prostheses. Since 2009, all NHS patients who are having hip replacement surgery have also been invited to fill in Patient Reported Outcome Measures questionnaires about their health and quality of life before and after their surgery.
      The second example illustrates the use of health care data from another country's health care provider. It concerns the appraisal of percutaneous vertebroplasty and percutaneous balloon kyphoplasty for treating osteoporotic vertebral compression fractures (TA 279) []. The analyses were based on generic outcomes (mortality, HRQL, adverse effects) and the condition-specific outcomes of functional status, pain, and vertebral height. The 9 RCTs available were of limited value because of small sample sizes, relatively short follow up (up to 36 months), diverse comparators in the control arms, and cross-over being permitted in the majority of the trials; in addition, only 3 were powered for the primary outcome. The modelling showed that, regardless of many other parameter uncertainties, the technologies could only be considered cost effective when it was assumed that they reduced mortality. The meta-analysed trial data did not show a statistically significant benefit on mortality, likely because of the short follow up and small sample sizes. However, data from a US Medicare Register covering over 800,000 patients were available, and clearly showed a statistically significant reduction in mortality with the technologies. This was supported by data from 3600 patients from a German Health Insurance Fund study. The Appraisal Committee considered it appropriate to take these observational data into account, firstly because there was corroboration in 2 independent data sets, and secondly because it was clinically plausible that improving spine curvature affects mortality through improved lung function, digestion and mobility, and reduced use of opioid analgesics. Therefore the Appraisal Committee thought it was reasonable to assume a mortality benefit and recommended the 2 technologies for use in the NHS.
      The third example illustrates how even in the absence of necessary data reasonable decisions can be taken. In the appraisal of Pharmalgen for the treatment of bee and wasp venom allergy for people with a history of type 1 IgE-mediated systemic allergic reactions to bee or wasp venom (TA 246) [], the new technology was compared with high-dose antihistamines, adrenaline auto-injector plus training, and advice on avoiding bee or wasp stings. The analyses were based on generic outcomes (mortality, HRQL, adverse effects) and the condition-specific outcomes included in the analysis were number and severity of type 1 IgE-mediated, systemic allergic reactions and anxiety related to future reactions to stings.
      No evidence on HRQL using a validated utility measure was available, so the base case analysis assumed no change in HRQL associated with anxiety. In these analyses Pharmalgen was cost saving in people at high risk of stings, but had a very high cost per QALY gained (£18m per QALY) in the rest of the population included in the appraisal. However, patient organisations and patient representatives explained very clearly and convincingly the impact that anxiety about being stung has on their daily lives. Therefore, the Assessment team modelled the patient testimony using the EQ-5D framework. The EQ-5D questionnaire consists of 5 domains: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each domain can be scored at 3 levels: no problems, some problems, extreme problems. The Assessment team assigned the lowest level (no problems) to the domains of mobility, self-care and pain/discomfort, and the mid-level (some problems) to the domains of usual activities and anxiety/depression. They then used the standard UK tariff to value the resulting EQ-5D health state, and this led to a utility decrease of 0.16 for anxiety about future stings; this means that people with such anxiety would give up 16% of their lifetime to avoid the effects of being stung. The Assessment team then used conservative assumptions to allow for the fact that not all people affected will experience anxiety: they used only 25% of the utility decrease of 0.16 (that is, they modelled a reduction in utility of 0.04 associated with venom allergy), and assumed that treatment with Pharmalgen increases utility by 25% of that value (that is, 0.01 per person per year). Using these conservative assumptions, Pharmalgen resulted in a cost per QALY gained within the acceptable range, and the Appraisal Committee considered it reasonable to recommend the technology not only for people at high risk of stings but also for people with anxiety about future stings.
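      The scaling steps in this example can be retraced with a short Python sketch. It is illustrative only: the variable and function names are invented here, and the utility decrease of 0.16 is taken as reported in the text rather than recomputed from the UK tariff, which is not reproduced in this article.

```python
"""Sketch: retracing the utility assumptions in the Pharmalgen example."""

# EQ-5D domains and levels as listed in the text.
DOMAINS = ("mobility", "self-care", "usual activities",
           "pain/discomfort", "anxiety/depression")
LEVELS = {1: "no problems", 2: "some problems", 3: "extreme problems"}

# Health state assigned by the Assessment team to represent the patient testimony.
profile = {
    "mobility": 1,
    "self-care": 1,
    "usual activities": 2,
    "pain/discomfort": 1,
    "anxiety/depression": 2,
}

# Compact one-digit-per-domain code for the health state, in domain order.
state_code = "".join(str(profile[domain]) for domain in DOMAINS)

# Utility decrease for this state, as reported in the text (not recomputed here).
FULL_DECREMENT = 0.16
# Conservative scaling factor applied at each of the two steps.
SHARE = 0.25


def scale(value: float, share: float = SHARE) -> float:
    """Apply the conservative share to a utility value."""
    return value * share


# Step 1: only 25% of the decrease is attributed to venom allergy.
modelled_decrement = scale(FULL_DECREMENT)
# Step 2: treatment is assumed to recover 25% of that modelled decrease per year.
treatment_gain_per_year = scale(modelled_decrement)

for domain in DOMAINS:
    print(f"{domain}: {LEVELS[profile[domain]]}")
print(f"health state code: {state_code}")
print(f"modelled utility reduction: {modelled_decrement:.2f}")
print(f"utility gain with treatment: {treatment_gain_per_year:.2f} per person per year")
```

      The two scaling steps reproduce the figures of 0.04 and 0.01 quoted above. The gain of 0.01 per person per year is the quantity that would feed into the QALY side of the cost per QALY calculation.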
      The use of non-RCT data in decision making needs to be considered on a case-by-case basis. RCT data will remain the main data source for the majority of appraisals. However, in some situations the non-RCT data are more relevant to a decision context than the RCT data, and in some situations, as outlined above, it would be unreasonable not to take account of existing non-RCT data. One must, of course, always be aware of the many problems associated with non-RCT data, such as confounding and bias, uncertain accuracy, missing data, aggregation at the level of provider unit or disease rather than patient-level data, and definitions that vary from those used in trials. However, there is scope for improvement in non-RCT data, for example by improving the ability to identify cases and link data sources, routinely capturing HRQL and patient preferences, routinely including important patient variables (socio-demographic, disease severity and comorbidities), introducing explicit definitions of variables including the coding system, and limiting missing data.
      To facilitate the development of such improvements, NICE is taking part in 2 cross-national projects. The first is the IMI Get Real project, an EU public-private consortium consisting of pharmaceutical companies, SMEs, academia, HTA agencies and regulators, and patient organisations []. A key deliverable is a framework to provide guidance on the available options for using real world evidence to support effectiveness estimates. The project is also working on understanding what drives any differences between efficacy and effectiveness, to show more clearly when real world data are needed; on improving the design of real world studies by overcoming operational, legal and ethical challenges; and on scientific methods for evidence synthesis and predictive modelling, including the use of real world evidence in network meta-analyses. In addition, the project will review effectiveness challenges experienced in past regulatory assessments and HTA, propose potential solutions using real world evidence, and gather stakeholder reactions on the acceptability and usefulness of these solutions for decision making.
      The second project is the ADAPT SMART project [, accessed 20 January 2016], which is coordinating the Medicines Adaptive Pathways to Patients (MAPPs) activities. MAPPs seeks to foster access to beneficial treatments for the right patient groups at the earliest appropriate time in the product life-span in a sustainable fashion. ADAPT SMART will support IMI2 projects investigating MAPPs tools and methodologies, and engage in a dialogue with all relevant stakeholders to prove and develop workable concepts, to facilitate the development of one strategy for product development and access for patients.
      As a consequence of these regulatory developments, the future market access landscape will see much earlier licensing of drugs with less evidence. It would not be fair or reasonable to refuse patients access to these new drugs in clinical practice when they have been deemed safe and efficacious by the regulator. New ways of handling uncertainty through managed access agreements need to be developed, ones that work well and lead to meaningful real world data collection on the back of routine care.
      There has been widespread concern about ‘lowering evidence standards’ by using observational studies. The reality, however, is that RCTs have never been, and will never be, able to answer all the needs of HTA and decision makers, and that new approaches need to be developed for the future that allow the use of the right study tool for different questions. This will create the opportunity to strengthen evidence standards and evidence generation through the life cycle of technologies, in which RCTs and real world evidence complement each other.


      1., accessed 20 January 2016

      2., accessed 20 January 2016

      3., accessed 20 January 2016

      4., accessed 20 January 2016

      5., accessed 20 January 2016

      6., accessed 20 January 2016