Pharma & Big Data: New Heights Beckon

Pharmaceutical Commerce - November 2021, Volume 16, Issue 4

Ongoing strides in data mining, data analytics, and modeling and simulation are opening up new and more definitive opportunities to apply data-driven insights to improve everything from upstream drug discovery and clinical trials to downstream go-to-market strategy and post-patent lifecycle management.

The ongoing evolution of computing and cloud capabilities, modeling and simulation methodologies and data analytics techniques continues to shape how the pharma/life sciences industry carries out all of its essential activities—from upstream drug development and regulatory approval to downstream commercialization and lifecycle-management efforts beyond patent expiry.

“By augmenting proprietary data with external sources of both real-world data (from payer claims databases, electronic health record [EHR] systems and more) and aggregate data available in published clinical trial results, companies can dramatically increase their confidence in the most critical drug-development decisions and drive greater efficiency,” says Matt Zierhut, PhD, vice president, integrated drug development for Certara, a biosimulation software and technology company.

“Whether it be a subpopulation where there is believed to be a particular advantage for the treatment, or a related setting outside of the original target population where benefits have been observed, strategically tapping into available external sources may help drug sponsors to validate clinical beliefs that were initially based on anecdotal evidence before making big investments in a new direction,” adds Roman Casciano, senior vice president and head of Certara’s evidence and access consulting business.

The goal of data-analytics efforts is to not only improve the operational efficiency of all phases of the clinical trial, “but to increase the probability of high-validity clinical findings that can support product-development decision-making and define all of the most promising therapeutic indications and the right patient populations to target, and then optimize the protocols to improve the odds of success for that particular trial,” says Javier Jimenez, MD, PhD, executive VP, real-world evidence and late phase, Syneos Health, a contract research organization (CRO).

Another important use for advanced data analytics is to help ensure that clinical trials produce higher-validity, regulatory quality data, by sorting and classifying data-entry errors, outliers, inconsistencies and misreported adverse events that could undermine clinical development findings, notes Niranjan Kulkarni, PhD, senior director, consulting services for CRB, an integrated solutions provider to advanced technology industries, specializing in the life sciences markets.
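The article does not describe CRB's tooling, but a minimal sketch of this kind of automated check might look like the Python below. It assumes a hypothetical `Visit` record, flags implausible values with a median/MAD-based robust z-score (a common outlier screen; the 3.5 cutoff is a conventional but arbitrary choice) and catches one simple logical inconsistency, a visit dated before enrollment.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Visit:
    # Hypothetical record layout for one clinical visit
    patient_id: str
    visit_date: date
    enrollment_date: date
    systolic_bp: float  # mmHg


def robust_z(values):
    """Modified z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 makes the MAD comparable to a standard deviation for normal data
    return [0.6745 * (v - med) / mad for v in values]


def flag_issues(visits, z_cutoff=3.5):
    """Return (patient_id, reason) tuples for records that need review."""
    issues = []
    scores = robust_z([v.systolic_bp for v in visits])
    for visit, z in zip(visits, scores):
        if abs(z) > z_cutoff:
            issues.append((visit.patient_id, f"outlier systolic_bp={visit.systolic_bp}"))
        if visit.visit_date < visit.enrollment_date:
            issues.append((visit.patient_id, "visit dated before enrollment"))
    return issues
```

A median-based score is used rather than a mean-based one because a single gross data-entry error (say, 1200 instead of 120) would otherwise inflate the standard deviation and partly hide itself.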

A key challenge for high-cost prescription therapies has always been how to translate what happened in the trial to the actual outcomes that occur once the therapy becomes available, in order to justify its price. “Today, stakeholders no longer need to just model various scenarios hypothetically—they can actually track the product’s clinical results in real-world use and conduct studies to validate more widespread clinical outcomes over time,” says Jimenez.

Such studies can demonstrate how the product performs on its own, how it performs in comparison with competing products in the same therapeutic space and how it performs in patients who are decidedly different from those enrolled in the underlying trial.

Similarly, by identifying patterns over time in actual prescribing and related healthcare data (particularly with regard to off-label use among real-world patients), biopharma manufacturers can sometimes identify promising—previously unknown—uses for the medication. When such additional indications can gain regulatory approval, the effort helps to provide additional clinical benefit for patients and ongoing revenue to the company. In September, FDA released draft guidance1 that describes how real-world evidence (RWE) can support label-expansion discussions.

Today, there is growing collaboration among the traditional epidemiology community, data scientists, drugmakers and the regulatory community, who are working to develop methodologies that reduce bias in findings derived from data-analytics efforts, says Jimenez, who adds: “The goal is to be able to design and carry out targeted studies involving real-world data (RWD) that are able to replicate the findings in a clinical trial, and to be assured that the data-driven clinical findings are real, not biased perhaps because of how the population was selected or due to other confounding factors.”

Improving clinical trial design and recruitment

The randomized clinical trial (RCT) process, as valuable as it is, remains inherently limited, mainly due to the restricted scope of any trial in terms of the number and type of patients involved, trial duration and the clinical endpoints studied. “RWD studies can be very helpful in contextualizing results from clinical trials, especially where the trial population does not adequately reflect the diversity of real-world patients taking the medication,” says Aniketh Talwai, delivery lead for Medidata Acorn AI. In recent work with clients in the cardiovascular space to determine real-world, age-stratified risk levels for some common agents, the company “found that they can vary quite a bit from the results published for the trial population,” he adds.
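To make the idea of age-stratified risk concrete, here is a hedged sketch, not Medidata's method: it groups hypothetical real-world patients into assumed age bands, computes an observed event rate per band and expresses each as a multiple of a single trial-reported rate.

```python
from collections import defaultdict

# Hypothetical age bands (inclusive bounds)
AGE_BANDS = [(18, 64, "18-64"), (65, 74, "65-74"), (75, 120, "75+")]


def age_band(age):
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    return None  # outside all bands (e.g., pediatric records)


def stratified_event_rates(patients):
    """patients: iterable of (age, had_event) pairs -> {band: observed event rate}."""
    counts = defaultdict(lambda: [0, 0])  # band -> [events, patients]
    for age, had_event in patients:
        band = age_band(age)
        if band is None:
            continue
        counts[band][0] += int(had_event)
        counts[band][1] += 1
    return {band: events / total for band, (events, total) in counts.items()}


def relative_to_trial(rates, trial_rate):
    """Express each band's real-world rate as a multiple of the trial-reported rate."""
    return {band: rate / trial_rate for band, rate in rates.items()}
```

A ratio well above 1.0 in an older band would be the kind of divergence from the published trial population that Talwai describes; a real analysis would also adjust for comorbidities and report confidence intervals.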

“Today, RCTs, coupled with advances in computing power, accessibility and recording of additional data modalities via medical imaging and EHRs that are available for trial participants, and innovations in statistics, machine learning (ML), and big data analytics can help generate a completely new environment for drug exploration and validation,” adds Yvonne Duckworth, PE, associate, senior lead automation engineer, Pharma 4.0 SME for CRB. Such efforts can also be used to quantify and prioritize unanswered clinical questions in the absence of published evidence, she says.

However, not all types of RWE have equal evidentiary value. So-called research-grade RWE—which refers to high-accuracy evidence that is developed when complete medical records are linked with outcomes using advanced technologies—is required “to help clarify endpoints that would be of interest,” says Anand Shroff, co-founder and president of Verantos, an advanced RWE company based in Menlo Park, Calif.

Such efforts “help drug sponsors to improve trial design and pressure test common flaw areas,” adds Lucas Glass, VP and global head for IQVIA’s Analytics Center of Excellence. He notes that such efforts support the ability to:

  • Estimate the target study population and evaluate which inclusion and exclusion criteria are best.
  • Compare potential study endpoints to recent similar trials to highlight similarities and differences that may help the drug sponsor to identify a competitive advantage to pursue or to better align efforts with emerging trends.
  • Identify extraneous and missing procedures (extraneous procedures can be costly and impose unnecessary burden on sites and patients, while missing procedures can result in failure to achieve statistical significance on a primary or secondary endpoint).
  • Examine patients’ familiarity with significant or invasive procedures and identify sources of patient burden that may create barriers to recruitment and retention.
  • Conduct sub-population analysis to better understand and predict how different patient cohorts will respond to new therapies.
  • Analyze previously published RCT data to predict the probability of technical and regulatory success.
  • Model disease progression.

Using an appropriate data-analytics strategy, drug sponsors can also “merge databases from different sources to filter out patients that do not meet basic requirements for the clinical trial,” says Duckworth. “Patterns in different studies can be analyzed using artificial intelligence (AI) techniques to evaluate risk factors and effectiveness of the intervention, help to create a general prediction model for the patient and assess the cost effectiveness of the treatment.”
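A minimal sketch of that merge-and-filter step, assuming the sources have already been linked on a shared patient key: the Python below combines hypothetical claims and EHR extracts, then applies illustrative inclusion criteria one at a time and records how many patients survive each step. That attrition "funnel" is one simple way to estimate the target population and to see which criterion is most restrictive, as in the first bullet above. The criteria and field names are assumptions for illustration only.

```python
def merge_sources(*sources):
    """Combine per-patient dicts from several sources keyed by a shared patient token."""
    merged = {}
    for source in sources:
        for key, record in source.items():
            merged.setdefault(key, {}).update(record)
    return merged


# Hypothetical criteria, applied in order; missing data fails the criterion
CRITERIA = [
    ("type 2 diabetes diagnosis", lambda r: "T2D" in r.get("dx", set())),
    ("age 18-75", lambda r: 18 <= r.get("age", -1) <= 75),
    ("HbA1c 7.0-10.0%", lambda r: 7.0 <= r.get("hba1c", 0.0) <= 10.0),
    ("eGFR >= 45", lambda r: r.get("egfr", 0.0) >= 45),
]


def eligibility_funnel(records, criteria):
    """Apply criteria one by one; return surviving records and (step, count) pairs."""
    remaining = dict(records)
    funnel = [("all linked patients", len(remaining))]
    for label, rule in criteria:
        remaining = {key: r for key, r in remaining.items() if rule(r)}
        funnel.append((label, len(remaining)))
    return remaining, funnel
```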

According to Talwai, Medidata Acorn AI is interrogating its compiled immuno-oncological trial records to identify risk factors for key treatment-related adverse events (TRAEs). “The findings can then be translated into a trial protocol that is stratified by baseline patient risk,” he adds.

One of the perennial challenges in harnessing RWD is fragmentation across payer systems, EHR systems, diagnostic lab-testing records and other desirable sources of data. “One big advance in practice that looks poised to help finally overcome this barrier is the advent and real-world implementation of privacy-preserving tokenization approaches, which can help to bridge patient data across silos in a compliant way,” says Talwai. “We’ve seen this successfully implemented at scale recently through one pro-bono, public-private Covid-19 consortium involving nearly a dozen companies2 (Medidata is a founding member).”

This effort brought together Covid-related records for millions of patients across claims, EHR and consumer data sources over a multi-year period, “which would have been immensely challenging to execute even a short while ago,” contends Talwai.
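Vendors' tokenization schemes are proprietary, so the following is only an illustration of the general idea under stated assumptions, not the consortium's method: each data holder normalizes the same identifying fields and applies a keyed hash (HMAC-SHA256 from Python's standard library) with a secret shared only among authorized parties, so records can be joined on the token without exchanging names or birth dates. The choice of fields and the normalization rules are hypothetical.

```python
import hashlib
import hmac
import unicodedata


def _clean(text):
    """Uppercase, strip accents and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_accents.upper() if ch.isalnum())


def make_token(secret_key: bytes, first: str, last: str, dob: str, sex: str) -> str:
    """Keyed hash (HMAC-SHA256) of normalized identifiers; dob as YYYY-MM-DD."""
    message = "|".join([_clean(first), _clean(last), dob, sex.strip().upper()])
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild tokens by hashing guessed names and birth dates; normalization matters because "José O'Neil" in claims and "JOSE ONEIL" in an EHR must produce the same token.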

Similarly, the design of the clinical trials carried out in pursuit of the Covid-19 vaccines “was driven directly by patient-level data and public health data that was studied to target areas with high early infection rates,” explains Jimenez of Syneos Health. “Efforts to recruit patients more effectively really showed how making the best use of available data guided the decision-making and helped sponsors to evaluate the benefit of the vaccine in the trial. This helped to accelerate those trials much more than anyone anticipated.”

Using RWD signals to target adherence

Poor adherence to therapy within a clinical trial can undermine the caliber of the study findings and worsen actual clinical outcomes in real-world use. “In a typical trial that spans many sites, sponsors are sort of rolling the dice when it comes to ensuring that participants are actually taking their medications,” says Rich Christie, MD, PhD, chief medical officer at AiCure. The company is using AI and ML tools, along with digital biomarkers (discussed below), to confirm that patients enrolled in a trial are staying on therapy. The resulting insights allow trial operators to intervene sooner to keep the study on track. Christie notes that now that AiCure has tracked more than a million doses for its clients, “we have the scale of data that allows us to go from just knowing who’s taking their medications to predicting, with accuracy, which types of patients are likely to take them or not. This enables better real-time interventions.”

When matching potential patients with specific trials, the use of digital biomarkers that are captured “in the wide part of the funnel” during screening efforts can help to narrow down the field to enroll the best participants. Examples include using audio and video tools to assess hand tremors in patients with Parkinson’s disease, or to assess cognitive changes based on voice and language in Alzheimer’s disease patients. “I think there will be a lot of real impact in this area in the next few years,” says Christie.
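As a hedged illustration of a digital biomarker, assume video analysis has already produced a one-dimensional hand-displacement trace sampled at a fixed rate. The sketch below estimates the dominant oscillation frequency with a plain discrete Fourier transform and checks it against the 4 to 6 Hz band commonly cited for Parkinsonian rest tremor. It is not AiCure's method, and a real pipeline would use an FFT library and clinically validated thresholds.

```python
import math


def dominant_frequency(signal, sample_rate_hz, min_hz=1.0, max_hz=15.0):
    """Frequency (Hz) with the largest DFT power within [min_hz, max_hz]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (DC) offset
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        if not min_hz <= freq <= max_hz:
            continue
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


def in_rest_tremor_band(freq_hz, low=4.0, high=6.0):
    """True if the frequency falls in the band assumed here for Parkinsonian rest tremor."""
    return freq_hz is not None and low <= freq_hz <= high
```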

Similarly, advanced data analytics are being used to improve trial site identification by studying the prevalence/incidence of the target patient population in different countries, notes Glass of IQVIA. He says that in one recent client engagement, for example, the trial had very challenging inclusion/exclusion criteria whereby patients had to have been on a new therapy for 12 months prior to being eligible for the study. “We were able to see a comprehensive landscape of how much of that medicine was flowing through all hospitals globally over the past 12 months and avoid sites that were not yet prescribing the medication at scale,” says Glass.
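A minimal sketch of the screening logic Glass describes, assuming a hypothetical feed of (site, patient, dispensing date) records for the required therapy: count, per site, patients who started the therapy at least 12 months ago and are still receiving it (here, a dispensing within the last 90 days, an assumed threshold), then keep only sites with enough such patients. This is an illustration, not IQVIA's implementation.

```python
from datetime import date


def eligible_patients_by_site(records, as_of: date, min_days_on_therapy=365, max_gap_days=90):
    """records: iterable of (site_id, patient_id, dispense_date) for the required therapy.

    Counts, per site, patients whose first dispensing is at least `min_days_on_therapy`
    before `as_of` and whose latest dispensing is within `max_gap_days` of it.
    """
    first_last = {}
    for site, patient, dispensed in records:
        first, last = first_last.get((site, patient), (dispensed, dispensed))
        first_last[(site, patient)] = (min(first, dispensed), max(last, dispensed))
    counts = {}
    for (site, _patient), (first, last) in first_last.items():
        long_enough = (as_of - first).days >= min_days_on_therapy
        still_on_therapy = (as_of - last).days <= max_gap_days
        if long_enough and still_on_therapy:
            counts[site] = counts.get(site, 0) + 1
    return counts


def shortlist_sites(counts, min_patients):
    """Sites with at least `min_patients` potentially eligible patients, largest first."""
    return sorted((s for s, n in counts.items() if n >= min_patients), key=lambda s: -counts[s])
```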

Addressing knowledge gaps in trial results

“Many clinical trials, on their own, don’t have a sufficient sample size to use advanced data-analytics techniques to conclusively identify patterns or signals, but could be used for hypothesis generation,” says Terri Madison, PhD, MPH, senior vice president and general manager, evidence and access, Certara. “However, we have several examples where we are either combining indication-specific data across trials, or across trials plus RWD sources, and then using AI, simulation and other advanced analytics techniques to strengthen understanding of a disease and the potential impact of therapy.”

By way of example, Madison notes that in one rare, serious genetic disorder, AI has helped to identify factors associated with earlier disease onset and rapid disease progression, critical information that can then be used to fine-tune future trial inclusion/exclusion criteria and help target trial participants who have the factors associated with rapid disease progression. “Such efforts would theoretically lower the sample size requirements (due to a larger anticipated effect size) or could reduce study duration (by enriching the trial with patients anticipated to have worse/accelerated disease-related outcomes),” says Madison. “This could also enable accelerated approval in certain patient subgroups, thus strengthening the investment via earlier commercialization of the therapy.”
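The sample-size argument follows from the standard normal-approximation formula for comparing two means with equal allocation: n per arm = 2 (z_(1-α/2) + z_(1-β))² σ² / δ², where δ is the expected difference and σ the outcome's standard deviation. Because n scales with 1/δ², enriching for patients expected to show a larger effect shrinks the trial quickly. The Python sketch below assumes a continuous endpoint and uses illustrative values, not figures from Certara's work.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-sample comparison of means (normal approximation)."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)
```

With σ = 10, raising the expected difference from 5 to 7.5 units cuts the requirement from 63 to 28 patients per arm.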

Model-based meta-analysis

Today, model-based meta-analysis (MBMA) is a popular meta-analytic technique that incorporates pharmacological principles (such as dose-response) and models how outcomes change over time (longitudinal data). It is being used to synthesize rich aggregate datasets from prior clinical trial publications into more easily understood predictive models. MBMA can be used to better define the efficacy and safety targets needed to achieve differentiation, says Zierhut of Certara.

Specifically, according to Certara’s Zierhut, MBMA is helping drug developers to:

  • Reduce trial size and trial duration (lowering costs).
  • Increase treatment effect precision at same costs (increasing confidence).
  • Enable more precise assessment of relative effects for a new drug versus key competitors without needing a risky and costly head-to-head trial.
  • Decide whether to advance or kill their own investigational drugs—before needing to gather definitive data from expensive Phase III trials.
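Production MBMA typically relies on nonlinear mixed-effects models with between-trial variability, covariates and time courses. As a much-simplified sketch of the dose-response idea only, the Python below fits a single Emax curve, E(d) = E0 + Emax·d/(ED50 + d), to hypothetical arm-level results pooled from published trials, weighting each arm by its sample size. Because the model is linear in E0 and Emax once ED50 is fixed, those two are solved in closed form over a grid of ED50 values.

```python
def fit_emax(doses, responses, weights, ed50_grid):
    """Weighted least-squares fit of E(d) = E0 + Emax * d / (ED50 + d).

    For a fixed ED50 the model is linear in (E0, Emax), so those two are solved
    in closed form; the ED50 on the grid with the smallest weighted SSE is kept.
    """
    best = None
    for ed50 in ed50_grid:
        x = [d / (ed50 + d) for d in doses]
        sw = sum(weights)
        sx = sum(w * xi for w, xi in zip(weights, x))
        sy = sum(w * yi for w, yi in zip(weights, responses))
        sxx = sum(w * xi * xi for w, xi in zip(weights, x))
        sxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, responses))
        det = sw * sxx - sx * sx
        if det == 0:
            continue  # all x identical: E0 and Emax not separately identifiable
        emax = (sw * sxy - sx * sy) / det
        e0 = (sy - emax * sx) / sw
        sse = sum(w * (yi - e0 - emax * xi) ** 2 for w, xi, yi in zip(weights, x, responses))
        if best is None or sse < best[0]:
            best = (sse, e0, emax, ed50)
    if best is None:
        raise ValueError("no identifiable fit on the supplied ED50 grid")
    return best[1], best[2], best[3]


def predict(dose, e0, emax, ed50):
    """Model-predicted response at a dose, e.g. to compare against a competitor's arm."""
    return e0 + emax * dose / (ed50 + dose)
```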

“By analyzing consolidated insights from past clinical trials, stakeholders can also convert inherent uncertainty into statistical risk that can better support decisions about resource allocation and portfolio optimization,” adds Talwai of Medidata Acorn AI.
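One common way to turn that uncertainty into a probability is "assurance" (a Bayesian probability of success): conventional power averaged over a prior distribution for the true effect, which could be informed by pooled past trials. The sketch below assumes a normal prior, a continuous endpoint and a two-sided test at level α; it is an illustration, not Medidata Acorn AI's approach.

```python
import math
from statistics import NormalDist


def assurance(prior_mean, prior_sd, sigma, n_per_arm, alpha=0.05):
    """Probability that a two-arm trial yields a significant positive result,
    averaging over a normal prior N(prior_mean, prior_sd**2) for the true effect.

    The estimated difference has standard error se = sigma * sqrt(2 / n_per_arm);
    marginally it is N(prior_mean, prior_sd**2 + se**2), and "success" means it
    exceeds the two-sided critical value z_(1-alpha/2) * se (the negligible chance
    of significance in the wrong direction is ignored).
    """
    se = sigma * math.sqrt(2 / n_per_arm)
    critical = NormalDist().inv_cdf(1 - alpha / 2) * se
    predictive = NormalDist(prior_mean, math.sqrt(prior_sd ** 2 + se ** 2))
    return 1 - predictive.cdf(critical)


def power(true_effect, sigma, n_per_arm, alpha=0.05):
    """Conventional power: the special case of a prior with no uncertainty."""
    return assurance(true_effect, 0.0, sigma, n_per_arm, alpha)
```

With a prior standard deviation of zero the function reduces to ordinary power; adding prior uncertainty about the effect typically pulls the probability of success below the nominal 80% a trial was sized for.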

AiCure’s Christie believes the biggest factor to truly moving the needle in using predictive analytics to advance clinical and commercial objectives is to be able to do it at scale.

Making use of unstructured data in EHR systems

While mountains of data are collected throughout routine healthcare activities today, extracting the most relevant information is not straightforward. By some industry estimates, unstructured data constitutes up to 80% of the information available in EHR systems.

“Natural language processing (NLP) can help users to understand the unstructured data in medical records and thus help investigators to truly understand a patient’s health status,” says Shroff of Verantos. For example, it is difficult to look at the structured portion of the medical record and understand if a patient suffering from asthma has symptoms consistent with severe asthma over time. “However, NLP can inspect the unstructured data and help make that determination,” he says, adding: “An ML algorithm can also detect comorbidities based on a number of data points, and a deep learning algorithm can predict patient outcomes based on extensive training and validation.”
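Verantos's NLP is not described in detail here, and production systems use trained models. As a deliberately simple, rule-based sketch of the idea, the Python below scans free-text notes for a few hypothetical severity indicators, skips mentions preceded by a nearby negation cue (a crude version of NegEx-style negation handling) and calls a patient "likely severe" when indicators appear in at least two notes. The terms and thresholds are illustrative assumptions, not clinical criteria.

```python
import re

# Hypothetical indicator phrases; not a validated clinical definition
SEVERITY_PATTERNS = [
    r"\bsevere (persistent )?asthma\b",
    r"\boral (cortico)?steroid burst\b",
    r"\bstatus asthmaticus\b",
    r"\bintubat(ed|ion)\b",
]
# A negation cue followed by up to 40 characters (no sentence break) before the mention
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]{0,40}$", re.IGNORECASE)


def severity_mentions(note):
    """Return non-negated severity indicators found in one free-text note."""
    hits = []
    for pattern in SEVERITY_PATTERNS:
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            preceding = note[max(0, match.start() - 60):match.start()]
            if NEGATION.search(preceding):
                continue
            hits.append(match.group(0).lower())
    return hits


def likely_severe(notes, min_notes=2):
    """Flag a patient when severity indicators appear in at least `min_notes` notes."""
    return sum(1 for note in notes if severity_mentions(note)) >= min_notes
```

Requiring evidence across several notes reflects Shroff's point about judging severity "over time" rather than from a single encounter.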

“When similar variables are collected for multiple patients across geographies, the data quickly becomes complex and abundant,” adds Kulkarni of CRB. “Handling such a large number of variables and deciphering patterns is not a trivial task.” He says the data-reduction technique known as principal component analysis (PCA) is helping current pursuits. “The literature is rich with applications of PCA to identify common or underlying root causes in clinical settings, as well as for defining pricing strategies,” says Kulkarni.
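A minimal PCA sketch in plain Python follows, assuming a small matrix of patient-level variables (rows are patients, columns are variables). It standardizes the columns, forms their correlation matrix and extracts the leading components by power iteration with deflation; in practice one would use numpy or scikit-learn. The share of total variance a component explains is its eigenvalue divided by the number of variables.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit (sample) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)] for i in range(p)]


def top_components(matrix, k, iters=500, seed=0):
    """Leading (eigenvalue, eigenvector) pairs of a symmetric matrix.

    Uses power iteration, then deflation (subtracting eigval * v v^T) to expose
    the next component.
    """
    rng = random.Random(seed)
    p = len(matrix)
    m = [row[:] for row in matrix]
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(p)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        components.append((eigval, v))
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return components
```

When two measured variables move together, as in the toy test data where the second column is roughly twice the first, a single component captures most of their joint variation, which is how PCA points to a shared underlying cause.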

Creating new opportunities for existing drugs

One particularly lucrative opportunity for data analytics in pharma is the ability to mine existing RWD related to both on- and off-label prescribing, and earlier trial data—and identify clinical findings that may allow existing drugs to be reevaluated as treatment options for entirely new clinical indications. This can be done sometimes in different patient subpopulations within the same disease category, and other times, in entirely new therapeutic spaces.

In October, a study funded by the National Institutes of Health (NIH)3 found that a commonly available oral diuretic pill (bumetanide) may be a potential candidate for an Alzheimer’s treatment for those who have a particular genetic profile. Specifically, NIH researchers looked at EHR data sets from more than five million people and split them into two groups: adults over 65 who display a certain genetic signature of interest (a form of the apolipoprotein E gene called APOE4) and took bumetanide, and a matching group who did not take the oral diuretic. The analysis showed that those who had the genetic risk and took bumetanide had a 35%-to-75% lower prevalence of Alzheimer’s disease compared to those not taking the drug, according to NIH. Further studies are underway.

Meanwhile, in 2019, Pfizer made headlines when it received FDA approval for Ibrance (palbociclib; in combination with an aromatase inhibitor or fulvestrant) as a first-line treatment option for men with hormone receptor-positive/human epidermal growth factor receptor 2-negative (HR+/HER2-) metastatic breast cancer. The novelty of this additional indication was that regulatory approval was based solely on studies of EHR and other post-marketing data (the IQVIA Insurance database, the Flatiron Health Breast Cancer database and the Pfizer global safety database) related to male patients who had the same biomarker-defined form of breast cancer and were being treated with Ibrance off-label due to an unmet medical need and a lack of alternative treatment options. No additional clinical trials were carried out.

UK company Healx focuses on using AI methodologies (via its Healnet AI drug-discovery platform) to help identify novel therapy options and combinations of existing drugs that can help patients with rare diseases. In October, Healx received investigational new drug (IND) clearance from the FDA,4 along with an orphan-drug designation, for the Phase IIa clinical study of HLX-0201 (initially approved as a non-steroidal anti-inflammatory drug) for the treatment of Fragile X syndrome, the world’s leading inherited cause of autism and learning difficulties, for which there are currently no treatment options available.

HLX-0201 was identified as a potential treatment for Fragile X syndrome by Healx’s omic-based drug-matching methods, “which compare the gene expression profile for a disease with the gene-expression profiles from Healx’s curated drug database to find entirely novel connections and disease pathways between the two,” explains the company. Several other compounds identified by Healx’s AI methods are also progressing toward the clinic, with the ultimate aim of finding a combination with synergistic mechanisms of action. Recruitment for participants in a clinical trial is set to begin in the coming months.
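Healx's matching methods are proprietary. As a hedged sketch of the general "signature reversal" idea used in connectivity-map-style repurposing, the Python below scores each drug by the Pearson correlation between its gene-expression changes and a disease signature over shared genes, ranking the most negative (most "reversing") drugs first. Gene names, values and the minimum-overlap rule are hypothetical.

```python
import math


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def rank_by_reversal(disease_profile, drug_profiles, min_shared_genes=3):
    """Rank drugs by how strongly their expression changes oppose the disease signature.

    Profiles map gene -> log fold change; only genes present in both are compared.
    A more negative correlation is treated as a stronger reversal (ranked first).
    """
    scores = []
    for drug, profile in drug_profiles.items():
        genes = sorted(set(disease_profile) & set(profile))
        if len(genes) < min_shared_genes:
            continue  # too little overlap to score
        score = pearson([disease_profile[g] for g in genes], [profile[g] for g in genes])
        scores.append((score, drug))
    return [(drug, score) for score, drug in sorted(scores)]
```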

The power and promise of predictive modeling

Thanks to the availability of vast data sources and novel data analytics tools, predictive modeling and simulations are also gaining traction among drug developers, to investigate a variety of queries. “Predictive modeling based on RWD can be used to make decisions at different levels—prescriptive analytics, predictive analytics and business-intelligence analytics,” says Kulkarni. “The type of data that are required varies according to the objective—for instance, in some cases, the analysis requires wide data (ample data related to a sufficiently large population); in other cases, deep data (a significant amount of data about one patient) is required.”

So-called artificial neural network (NN) learning models also “do a particularly good job of memorizing, generalizing and generating recommendations, as long as the available data is sufficiently wide and deep,” adds Kulkarni. “Such models allow the user to enter certain inputs and contexts (e.g., EHR notes in natural language), and the models produce outputs in a hierarchical manner [as actionable recommendations],” he explains. “The wide and deep model, when ‘memorizing,’ will rely on the frequency of certain keywords, features and the underlying ‘intent,’ and use a linear model (i.e., logistic regression) to determine an output.”
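Kulkarni's description resembles the "wide and deep" architecture, in which a linear model over sparse, often crossed, features (the memorizing "wide" part) is combined with a feed-forward network over dense inputs (the generalizing "deep" part), and their summed logits pass through a sigmoid. The sketch below shows only that forward pass with made-up weights and hypothetical keyword features; in practice both parts are trained jointly on labeled data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def wide_logit(active_features, wide_weights, bias):
    """'Wide' linear part: adds a learned weight for each active sparse feature."""
    return bias + sum(wide_weights.get(f, 0.0) for f in active_features)


def deep_logit(dense_inputs, layers):
    """'Deep' part: feed-forward layers [(weights, biases), ...], ReLU on hidden layers."""
    h = list(dense_inputs)
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if idx == len(layers) - 1 else [max(0.0, v) for v in z]
    return h[0]  # single output unit


def wide_and_deep_score(active_features, dense_inputs, params):
    """Sum the two logits and squash to a probability, as in a logistic model."""
    logit = wide_logit(active_features, params["wide"], params["bias"])
    logit += deep_logit(dense_inputs, params["deep"])
    return sigmoid(logit)


# Made-up parameters: keyword and crossed-keyword features plus a 3 -> 2 -> 1 network
params = {
    "bias": -1.0,
    "wide": {"kw:wheeze": 0.8, "kw:wheeze&kw:nocturnal": 1.2},
    "deep": [
        ([[0.4, -0.2, 0.1], [0.3, 0.5, -0.4]], [0.0, 0.1]),
        ([[0.7, -0.5]], [0.05]),
    ],
}
dense = [0.5, 1.0, 0.2]  # hypothetical scaled inputs, e.g. age, a lab value, visit count
```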

One promising application for using simulations is to test the validity of tapping data coming from various wearables/digital health monitors—as surrogates of expensive and labor-intensive, gold-standard measures for the same endpoints, notes Madison of Certara, adding: “This could potentially save millions of dollars on future trials, if the right wearable proves to be as good as the current gold-standard measure for gathering key data during the trial.”
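Madison refers to simulations; a complementary, standard way to test whether a wearable can stand in for a gold-standard measure is a Bland-Altman agreement analysis, swapped in here as an illustration. The sketch computes the mean bias and 95% limits of agreement for paired readings and checks them against a clinically acceptable difference, which is an assumption that would have to be set per endpoint. The readings in the test are hypothetical.

```python
import math


def limits_of_agreement(wearable, reference):
    """Bland-Altman mean bias and 95% limits of agreement for paired readings."""
    diffs = [w - r for w, r in zip(wearable, reference)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def agrees_within(loa, max_abs_difference):
    """True if both limits fall inside +/- the clinically acceptable difference."""
    _, lower, upper = loa
    return -max_abs_difference <= lower and upper <= max_abs_difference
```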

Simulations based on RWD are also being used to evaluate optimal treatment sequencing in crowded therapeutic spaces such as multiple sclerosis, to predict which early/first-line therapies, or which sequences of therapies, are likely to produce the longest delay in disease progression. “This could have a huge impact for patients, and for payer decision-making,” says Madison.

Creating insights that matter to payers

There is a growing focus today on tying drug pricing and formulary decision-making to value, measured in terms of actual clinical outcomes in real-world settings, not just the early findings produced in the trial. When data-analytics efforts are able to demonstrate better outcomes for different patient populations, “the two data sets become the basis for conducting drug-pricing negotiations,” says Shroff of Verantos. “In the future, we expect to see real-world evidence being more rigorously produced in line with recent FDA guidance and becoming the basis for value-based contracting arrangements between pharma and payers.”

Syneos Health’s Jimenez believes the ability to study genetics and genomics information to better differentiate likely responders from non-responders has the potential not only to inform study design and trial enrollment but also to shorten the duration of the study, and thus its costs; reduce patient burden; and speed time to market to address unmet clinical needs. “And payers are keenly interested in this approach, because it is in their best interest to pay for products that will have a higher probability of clinical success in specific patients,” he says.

“The question isn’t whether a solid data-analytics strategy can support commercialization efforts—it is an absolute necessity, because without a doubt, the real-world situation is not as neat and tidy as one might expect if we were to read the treatment guidelines or listen in on the latest advisory board meeting,” adds Casciano of Certara. “Our development programs and pivotal trials only give us a very narrow view of the heterogeneous treatment context.”

Thus, Casciano continues: “It is essential to be able to take what we have learned from our clinical research and apply it to real-world situations, with all of their nuance. This is an impossibility without the proper insights from a well-executed RWD program, and, frankly, no responsible payer would give favorable access to a new treatment without an assessment of the likely consequences in their population.”

It is important to understand whether the real-world patients who will receive the treatment will differ in any appreciable way from those studied in the clinical trial programs. “If so, does this mean we should expect better or lesser results when compared to the trials?” says Casciano. “Without the answers to these kinds of questions, entering into a complex pricing or reimbursement arrangement would be nothing more than a leap of faith.”

Pfizer is so confident in the real-world performance of its lung cancer therapy Xalkori (crizotinib) for patients with metastatic non-small cell lung cancer whose tumors are anaplastic lymphoma kinase (ALK)- or ROS1-positive that, in October, the pharma giant announced a new outcomes-based contracting arrangement called the Pfizer Pledge Warranty Program.5 Through the program, the company will refund eligible patients for out-of-pocket expenses and will reimburse payers (including private insurance, employer-sponsored health plans or Medicare Part D) if patients discontinue their prescription for clinical reasons before the fourth 30-day supply is dispensed. Refunds to the payer plan will be up to the cost of the first three bottles (each a 30-day supply) of Xalkori, a maximum of $19,144 for each bottle, or an aggregate maximum of $57,432, according to the company.
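As a quick arithmetic check of those figures, the hypothetical function below applies the refund rules as summarized in this article: a per-bottle cap of $19,144, and no refund once a fourth 30-day supply has been dispensed. It does not model patient eligibility or the clinical-reason requirement, which are set by Pfizer's actual program terms.

```python
MAX_REFUND_PER_BOTTLE = 19_144  # USD per 30-day bottle, as reported for the program
MAX_REFUNDED_BOTTLES = 3        # refunds apply only if therapy stops before the 4th supply


def payer_refund(bottle_costs_paid):
    """Hypothetical payer refund given the costs paid for each bottle dispensed before stopping."""
    if len(bottle_costs_paid) > MAX_REFUNDED_BOTTLES:
        return 0  # the fourth 30-day supply was dispensed: no refund applies
    return sum(min(cost, MAX_REFUND_PER_BOTTLE) for cost in bottle_costs_paid)


# The aggregate cap is simply three bottles at the per-bottle maximum.
assert MAX_REFUND_PER_BOTTLE * MAX_REFUNDED_BOTTLES == 57_432
```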

“Similar to drug-pricing negotiations, payers are looking for evidence that a new therapy delivers additional benefits to their members before including the therapy in their formulary,” says Shroff. “Pharma can support these discussions by measuring outcomes of a cohort over time and comparing them against outcomes of another cohort which is phenotypically similar but on different treatments.”
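Shroff does not specify a matching method. One common approach, sketched below under stated assumptions, is 1:1 greedy nearest-neighbor matching on standardized covariates with a caliper, followed by a comparison of outcome rates between the matched cohorts. The covariates, caliper and data are hypothetical, and a real analysis would also check covariate balance and address residual confounding (propensity-score methods are a frequent alternative).

```python
import math


def _sd(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def match_cohorts(treated, controls, features, caliper=0.5):
    """Greedy 1:1 nearest-neighbour matching without replacement on standardized features."""
    pooled = treated + controls
    scale = {f: (_sd([p[f] for p in pooled]) or 1.0) for f in features}

    def distance(a, b):
        return math.sqrt(sum(((a[f] - b[f]) / scale[f]) ** 2 for f in features))

    available = list(controls)
    pairs = []
    for patient in treated:
        if not available:
            break
        nearest = min(available, key=lambda c: distance(patient, c))
        if distance(patient, nearest) <= caliper:  # discard poor matches
            pairs.append((patient, nearest))
            available.remove(nearest)
    return pairs


def outcome_rate_difference(pairs, outcome):
    """Treated minus matched-control rate of a binary outcome."""
    n = len(pairs)
    treated_rate = sum(t[outcome] for t, _ in pairs) / n
    control_rate = sum(c[outcome] for _, c in pairs) / n
    return treated_rate - control_rate
```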

Similarly, while formulary structures can be an effective way to help payers control spending, they don’t serve all patients equally well. Consider specific patients (or categories of patients) who don’t benefit from the top-tier formulary therapies, but are forced to cycle through them before getting to a more ideal treatment option.

“This adds costs and stress, and delays can impact quality of life, worsen disease progression and engender wasted costs,” says Talwai of Medidata Acorn AI. “Insights from RWD and pooled clinical trial data can play an invaluable role here in identifying those patient subpopulations for whom preferred agents, drug classes and/or mechanisms of action may have inferior efficacy or safety profiles.” This combined intelligence, he says, can thus lighten the burden on patients and on the healthcare providers serving these underserved treatment segments.

“At the end of the day, data-analytics efforts should be used to bridge the endpoints from clinical trials to real-world value and to help address remaining knowledge gaps,” adds Christie of AiCure.

When it comes to big data in the pharma and life sciences industry, the old adage applies: Go big or go home.