Five questions to guide reliable use of real-world data

Assessing whether RWD is truly “fit for purpose”

We have all heard the buzz around real-world evidence (RWE) and how it is often described as a cornerstone of the FDA’s effort to modernize under the 21st Century Cures Act. In the same breath, we hear that RWE is not as neat and clean as data from randomized clinical trials (RCTs), and that nothing less than the gold standard should be used to evaluate the safety and effectiveness of new medical treatments. While both RCTs and RWE have important roles in evaluating medical products, it is important to recognize that RWE serves a distinct function: understanding how products perform in everyday medical care. RWE is sometimes used as an external comparator arm to provide context for single-arm trials. Moreover, RWE has been a mainstay approach for studying product safety after launch, as well as for providing information that can be used for evidence-based clinical or pharmacy decision support. It rarely uses randomization or blinded treatment, and it does not use placebos.

Assessments of real-world data (RWD) should start with whether the data of interest are likely to be recorded and accessible, and if so, how much systematic error is likely to be present and whether that can be addressed by design, analysis or another approach.

Two important points to remember are that (1) there is little commonality among RWD sources and (2) they are dynamic. Some RWD are created as part of the “exhaust” of healthcare operations, whereas other RWD are created for a particular purpose, such as patient registries built to enhance clinical understanding of the natural history of disease and how well various treatments perform in typical care settings. A source of RWD is likely to change over time to accommodate business or science needs; therefore, RWD should be evaluated on a case-by-case basis, with the understanding that assessments of the data may become outdated quickly. For example, health insurance claims data may be useful for detecting certain events, provided that the insurance covered care for those events.

It is helpful to start by developing a computable phenotype that describes the clinical phenomenon and the exposure/treatment of interest. Then the following five questions can be used to evaluate whether any given RWD are sufficiently fit for a research objective, often described in the shorthand of whether the RWD are “fit for purpose.”
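To make the idea concrete, a computable phenotype can be thought of as an explicit, executable rule over coded records rather than a narrative description. The sketch below is a hypothetical Python illustration; the code lists, field names, and the `matches_phenotype` function are illustrative assumptions, not a standard definition.

```python
# A minimal sketch of a computable phenotype: an explicit, executable
# rule over coded patient records. All code lists here are illustrative.

# Hypothetical ICD-10-CM diagnosis codes for the clinical phenomenon
# (a type 2 diabetes example) and normalized names for the exposure.
DIAGNOSIS_CODES = {"E11.9", "E11.65"}
EXPOSURE_CODES = {"metformin", "glipizide"}

def matches_phenotype(record):
    """Return True when a patient record satisfies both the diagnosis
    and the exposure criteria of the phenotype definition."""
    has_diagnosis = any(dx in DIAGNOSIS_CODES for dx in record.get("diagnoses", []))
    has_exposure = any(rx in EXPOSURE_CODES for rx in record.get("medications", []))
    return has_diagnosis and has_exposure

patient = {"diagnoses": ["E11.9"], "medications": ["metformin"]}
print(matches_phenotype(patient))  # True: meets both criteria
```

Writing the rule down in executable form forces the researcher to state exactly which codes, fields, and combinations count, which is what makes the five questions below answerable.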

1. PRESENTATION: Would a person experiencing this clinical event or phenomenon be present for assessment?

When pulling data from health system records, researchers must consider the likelihood that the person experiencing the phenomenon would be present for care and in the setting where the records data are available. For example, people who have heart attacks often die outside of the hospital, so there would be no hospital record for many events of interest.

If use of insurance claims data is under consideration, it is important to determine whether an insurance claim would be generated for the event, or whether the medical product or medical consultation would be paid out-of-pocket or by an alternative insurance provider.

2. RECOGNITION/ASSESSMENT: Would the clinical event or phenomenon of interest be accurately recognized or assessed?

If the events or conditions of interest generally come to medical attention, how likely are they to be recognized and assessed? Recognition and assessment may differ according to how the data were collected. For example, if clinical events or phenomena are reported by patients, how likely would patients be to notice the event and report it with enough accuracy for classification? Recognition and assessment may also differ according to the type of healthcare facility and the training or experience of the medical care provider.

3. RECORDING: How might the recording environment or tools affect accurate and consistent recording of the phenomenon of interest?

For data captured by health system records, is the environment likely to affect the accuracy of how a provider records a diagnosis or assessment? Would the way something is recorded likely be different according to the electronic health record characteristics, financial incentives and social influences in a healthcare system?

When data are captured by mobile or consumer devices, researchers should consider how the characteristics of the recording system (a sensor, mobile app, or device vendor) might affect the accuracy or consistency of recording the phenomenon or health state of interest. Also, what is the likelihood that the people under study would be using the wearable at the time the event occurs? The goal here is to understand how accurate and precise the information is likely to be and whether that is good enough for a given study objective.

4. HARMONIZATION: Can data from different sources be harmonized – both technically (same format) and semantically (same meaning)?

This question relates to interoperability, including technical barriers to using data collected from multiple health systems, and to potential differences in meaning of seemingly identical data that were recorded in different healthcare environments. Technical barriers may preclude data harmonization or require broader categorization of variables than would be desirable for a given research objective (e.g., physically active? yes/no). Semantic interoperability may be influenced by differences in financial incentives or social influences that affect how variables are recorded. For example, a sexually transmitted disease might be recorded as an unknown infection due to a perceived social stigma, or a condition may be assessed differently according to medical specialty practice. Data captured by mobile or consumer devices are likely to have greater measurement variability than similar tests administered in a clinic, raising the question of whether data elements from different sensing devices or patient interfaces can be reliably combined without loss or distortion of meaning.
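As a simple illustration of the technical side of harmonization, the sketch below maps a smoking-status field recorded differently at two sites onto one shared category set, and surfaces unmapped values rather than silently dropping them. The site names, local codes, and mappings are hypothetical.

```python
# Hypothetical sketch: harmonizing a smoking-status field recorded
# with different local codes at two sites into one shared vocabulary.
SITE_A_MAP = {"current": "current", "former": "former", "never": "never"}
SITE_B_MAP = {"Y": "current", "QUIT": "former", "N": "never"}

def harmonize(site, value):
    """Map a site-specific value onto the shared vocabulary.
    Return None when no mapping exists, flagging it for review
    instead of guessing at its meaning."""
    mapping = {"A": SITE_A_MAP, "B": SITE_B_MAP}[site]
    return mapping.get(value)

print(harmonize("B", "QUIT"))  # former
print(harmonize("B", "X"))     # None: unmapped, needs review
```

Even this trivial mapping shows where meaning can be lost: once site B's local codes are collapsed into three categories, any finer distinctions the site recorded elsewhere cannot be recovered downstream.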

5. REDUCTION: Can primary data elements be consistently reduced to a useful clinical phenotype?

Here, the question focuses on whether processes for reducing primary data elements to summarize clinical phenotypes would be similarly interpreted across different patient populations, health systems and/or healthcare settings. Importantly, summarization or reduction can obscure underlying problems with data sources or with procedures for harmonization.

Data combined across a variety of RWD sources may look impressive, but that does not mean the distinctions between clinical phenotypes remain meaningful. Consider the challenges of combining electrocardiogram (ECG) data measured on a mobile device with data measured in a clinic by more traditional means. How should those data be summarized and aggregated? For a simpler example, consider a study of weight loss. Was pregnancy accounted for, and if so, how? Were women with full-term deliveries differentiated from those who experienced a first- or second-trimester pregnancy loss?
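To show how reduction choices embed assumptions, the sketch below reduces a series of weight measurements to a single weight-change value, with pregnancy handled by exclusion. The function name, threshold of two measurements, and exclusion rule are illustrative choices, not a recommended method; the point is that each such choice shapes the resulting phenotype.

```python
# Illustrative sketch: reducing raw weight measurements to a single
# weight-change phenotype. Excluding pregnant patients is one of many
# possible (and consequential) reduction choices a researcher must make.
def weight_change_kg(measurements, pregnant=False):
    """Return last-minus-first weight in kg, or None when the record
    cannot be reduced consistently (too few points, or pregnancy)."""
    if pregnant or len(measurements) < 2:
        return None  # change cannot be attributed to the exposure
    return measurements[-1] - measurements[0]

print(weight_change_kg([92.0, 89.5, 88.0]))           # -4.0
print(weight_change_kg([70.0, 75.0], pregnant=True))  # None: excluded
```

Whether such a rule is interpreted the same way across populations and settings, and whether its exclusions are even detectable in every source, is exactly what this fifth question asks.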

Putting it all together

As these questions illustrate, assessing RWD for its reliability and utility starts with local understanding of data generation, recording and processing. This often requires some understanding of health system practice patterns, record-keeping processes, and health data systems, as well as device characteristics, user experience and technical aspects of data storage in consumer devices. Assessments should identify probable sources of error and their importance. For any variables that are both important and likely to be error-prone, consider what steps could be taken to address and mitigate those sources.

Last, and arguably most important, evaluating whether an RWD source is complete or 100% valid is unlikely to be helpful in assessing whether RWD are fit for any purpose. Instead, ask whether specific elements of an RWD source can reasonably and accurately assess a specific phenomenon or health event of interest, and whether the exposure (e.g., medical treatments, vaccines, or diagnostics) has been recorded in an accessible format and location. If these look promising, researchers can then look for ways to reduce the likelihood of systematic error and strengthen data validity to the extent feasible. This becomes the guiding framework for assessing the usefulness of nonrandomized RWE for a wide range of clinical questions.

Assessing whether RWD are good enough for a given purpose should be considered in the context of whether they will advance knowledge, recognizing that science advances through a series of solid steps and that important contributions can be made even with imperfect data.

About the author

Nancy Dreyer is Chief Scientific Officer at IQVIA.