The power of strategic data standardization

Pharmaceutical Commerce, September/October 2017

Global efforts to collect clinical data in a more standardized form will lead to more research innovation

The pharmaceutical industry is deluged with data. However, because data is seldom standardized, companies cannot fully tap its potential to substantially reduce failure rates in drug development. [1] By consciously streamlining and unifying data, drugmakers can improve efficiency throughout the four stages of clinical development: design, start-up, execution and submission.

As a case in point, during the clinical development design phase, companies can integrate data from multiple sources to build a clear patient profile. Social media, patient focus groups and disease advocacy groups are all excellent sources of potentially valuable data. For endpoint selection, pertinent data sources may include similar trials, basic research, regulatory filings and payer data. Dose selection, too, can be better informed by data from preclinical studies, population data and other sources that support simulation and prediction of dose-response curves.

In our experience, it is important for companies to leverage both internal and external data. We do this routinely when we optimize protocols, from both a design and an execution perspective. To enable this, we have created a metadata repository (MDR) that applies standardization and traceability from protocol generation (e.g., endpoints, visit structure) through data collection, mapping, statistical analysis and reporting, and on to preparation of the clinical study report (CSR) (see graphic). Conventionally, the individuals and teams responsible for these activities each start from scratch to write their own sections of the protocol, develop the electronic database and program the statistical analysis. This process can be cumbersome and can create inconsistencies further down the line. Our MDR approach allows us to standardize and expedite these tasks through automation, ensuring consistency in language and data collection.
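To make the MDR idea concrete, here is a minimal Python sketch, not Parexel's actual implementation: a single concept is defined once, with its standard protocol wording, case report form label, tabulation target and permitted values, and every downstream step reads from that one definition. The `MetadataItem` and `MetadataRepository` names, and all field values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataItem:
    """One standardized concept, defined once and reused downstream (hypothetical structure)."""
    concept: str                    # human-readable concept name
    protocol_text: str              # standard wording reused in the protocol
    crf_label: str                  # question text on the electronic case report form
    sdtm_domain: str                # tabulation domain the item maps to
    sdtm_variable: str              # tabulation variable name
    codelist: tuple[str, ...] = ()  # permitted coded values, if any


class MetadataRepository:
    """Minimal in-memory registry; a real MDR would add versioning and governance."""

    def __init__(self) -> None:
        self._items: dict[str, MetadataItem] = {}

    def register(self, item: MetadataItem) -> None:
        # Each concept may be defined only once, so every team reuses the same definition.
        if item.concept in self._items:
            raise ValueError(f"concept already defined: {item.concept}")
        self._items[item.concept] = item

    def lineage(self, concept: str) -> dict[str, str]:
        """Trace one concept from protocol wording through collection to tabulation."""
        item = self._items[concept]
        return {
            "protocol": item.protocol_text,
            "crf": item.crf_label,
            "tabulation": f"{item.sdtm_domain}.{item.sdtm_variable}",
            "codelist": ", ".join(item.codelist),
        }


mdr = MetadataRepository()
mdr.register(MetadataItem(
    concept="sex",
    protocol_text="Sex will be recorded at screening.",
    crf_label="Sex",
    sdtm_domain="DM",
    sdtm_variable="SEX",
    codelist=("M", "F", "U", "UNDIFFERENTIATED"),
))
print(mdr.lineage("sex"))
```

Rejecting duplicate definitions is the design choice that matters here: it forces the protocol writer, database builder and statistical programmer to share one definition instead of each creating their own.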

The collection of sex (often labeled gender) information shows how metadata standardization works, and illustrates the complexity of clinical data overall. Every clinical study collects this information in the demography domain, and the underlying clinical values are “male,” “female,” “unknown” and “undifferentiated.” For a given subject, this information is collected in several places by different people and systems (e.g., the clinical database, central laboratory, randomization system, specialty lab and clinical management system). The metadata for this item should follow the established industry standards of the Clinical Data Interchange Standards Consortium (CDISC), with “SEX” as the item name and the values “M” and “F” for male and female. Yet even in this very simple example, different data sources often use different metadata, such as “sex” or “gender,” and different clinical values, such as “male, female,” “M, F,” “man, woman,” “0, 1,” “1, 2” and other variations. When these sources are combined, an enormous amount of mapping must be programmed before the data can be merged. The complexity grows for other data, such as laboratory results reported under different test codes.
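The mapping burden described above can be sketched in a few lines of Python. Because numeric codes such as “0, 1” and “1, 2” mean different things in different systems, the sketch keeps a separate lookup table per source rather than one global table, and it flags subjects whose sources disagree. The source names and their codings are hypothetical; the target values use the CDISC-style “M” and “F,” with “U” and “UNDIFFERENTIATED” assumed here for unknown and undifferentiated.

```python
# Hypothetical per-source mappings from raw values to standardized SEX values.
# A single global lookup would be wrong: "1" means female in one source and male in another.
SOURCE_MAPS: dict[str, dict[str, str]] = {
    "clinical_db": {"male": "M", "female": "F", "unknown": "U",
                    "undifferentiated": "UNDIFFERENTIATED"},
    "central_lab": {"m": "M", "f": "F"},
    "randomization": {"0": "M", "1": "F"},  # assumed coding for this source
    "specialty_lab": {"1": "M", "2": "F"},  # assumed coding for this source
    "site_system": {"man": "M", "woman": "F"},
}


def standardize_sex(source: str, raw_value: str) -> str:
    """Return the standardized SEX value, or raise if the raw value is unmapped."""
    mapping = SOURCE_MAPS[source]
    key = raw_value.strip().lower()
    if key not in mapping:
        raise ValueError(f"unmapped value {raw_value!r} from source {source!r}")
    return mapping[key]


def reconcile(records: list[tuple[str, str]]) -> str:
    """Standardize one subject's sex across sources and raise if they disagree."""
    values = {standardize_sex(source, raw) for source, raw in records}
    if len(values) != 1:
        raise ValueError(f"sources disagree: {sorted(values)}")
    return values.pop()


subject_001 = [("clinical_db", "Female"), ("central_lab", "F"),
               ("randomization", "1"), ("specialty_lab", "2")]
print(reconcile(subject_001))  # F
```

Every new source adds another table to maintain, which is exactly the cost that agreeing on the standard values at the point of collection avoids.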

Common protocols

Research organizations around the globe are increasingly advocating and implementing data standardization. In 2012, the collaborative non-profit organization TransCelerate BioPharma introduced a Common Protocol Template (CPT) to provide a foundation for electronic protocols. [2] In addition, in May 2017 the US Food and Drug Administration (FDA) and the National Institutes of Health (NIH) announced their own joint template, including sample language and instructions, to help researchers prepare Phase II and III clinical trial protocols. [3] Many new efforts to harness disparate data sets are likely to influence how clinical research is designed and conducted. The 21st Century Cures Act, enacted in December 2016, is a case in point. It may enable drugmakers requesting regulatory review of experimental treatments to bypass full clinical trials and instead submit “data summaries” and “real-world evidence,” such as observational studies and insurance claims data. [4] For its part, the Patient-Centered Outcomes Research Institute (PCORI) seeks to enhance engagement of clinical research stakeholders, including patients, caregivers and prescribing physicians. As PCORI and other groups fund investigations of highly complex issues, such as the best means of managing symptoms of end-stage illness across disease categories, standardization of data and of the methods for analyzing it will be key. [5]

Around the globe, data-oriented research initiatives are poised to give drugmakers access to data that is truly meaningful for clinical development. Some of the most visible include the United Kingdom's Clinical Practice Research Datalink (CPRD), the European Commission's eHealth Interoperability Framework and the Structured Data Capture (SDC) Initiative. Publications from FDA [6] and the European Medicines Agency (EMA) [7] also make plain that electronic health records (EHRs) and other large health data sources will have a growing impact on healthcare and research.

Companies can also harness standardized data during the start-up phase of development, drawing on sources such as site enrollment and performance in previous trials and epidemiological data. Unlocking insights from these standardized sources requires expertise in data storage architecture, the ability to search across different data structures, data visualization capabilities, predictive analytics and more.

In the clinical development execution phase, standardized operational data can help companies save time and money by making it easier to determine whether the data being gathered from patients are in line with trial objectives. As a best practice, organizations should use a statistical monitoring approach to help their data withstand regulatory scrutiny. Visual presentations of scientific data should be tailored to people with specific responsibilities and skills, such as data management, clinical, medical and statistical team members.
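Statistical monitoring can take many forms. As one simple illustration, and not a validated method, the Python sketch below compares each site's mean for a measurement (hypothetically, systolic blood pressure) with the pooled data from all other sites, and flags sites whose mean lies implausibly far away. The function name, the leave-one-site-out z-score and the threshold of 3 are assumptions chosen for clarity.

```python
import math
import statistics


def flag_outlying_sites(values_by_site: dict[str, list[float]],
                        z_threshold: float = 3.0) -> dict[str, float]:
    """Flag sites whose mean differs markedly from the pooled mean of all other sites.

    For each site: z = (site_mean - rest_mean) / (rest_sd / sqrt(n_site)),
    where "rest" is every value reported by the other sites (leave-one-site-out).
    """
    flagged: dict[str, float] = {}
    for site, values in values_by_site.items():
        rest = [v for other, vals in values_by_site.items() if other != site for v in vals]
        if len(values) < 2 or len(rest) < 2:
            continue  # too little data for a meaningful comparison
        standard_error = statistics.stdev(rest) / math.sqrt(len(values))
        if standard_error == 0:
            continue  # other sites report identical values; z is undefined
        z = (statistics.mean(values) - statistics.mean(rest)) / standard_error
        if abs(z) >= z_threshold:
            flagged[site] = z
    return flagged


systolic_bp = {
    "A": [120, 122, 118, 121, 119],
    "B": [121, 119, 120, 122, 118],
    "C": [140, 141, 139, 140, 140],
}
flagged = flag_outlying_sites(systolic_bp)
print({site: round(z, 1) for site, z in flagged.items()})  # {'C': 30.0}
```

Leaving the site out of its own comparison keeps an unusual site from pulling the reference mean toward itself. A flag like this is a prompt for review by the data management and clinical teams, not proof of a problem on its own.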

Wearable technology

Beyond data processing, technological advances in data capture are also changing what's possible in the execution of clinical trials. Wearable devices equipped with sensors are swiftly becoming a viable means of collecting clinical data from patients in a real-world setting. Eventually, data captured via wearable sensor technology might be transferred directly from a person's electronic health record into databases established for hybrid and/or site-less clinical trials.

Data standardization is already having a big impact on the final steps of clinical drug development (analysis and submission), and its importance will only increase. Specific data standards are now widely used, including Clinical Data Acquisition Standards Harmonization (CDASH) for data collection and the Study Data Tabulation Model (SDTM) for submission of clinical and non-clinical data to FDA. The Analysis Data Model (ADaM), an increasingly popular set of dataset and metadata standards, is guiding the creation of powerful, flexible structures for generating, analyzing and replicating clinical results. CDASH, SDTM and ADaM are championed by CDISC, an international consortium well regarded for its work to “let data speak the same language.” In addition, genomics and proteomics technologies that depend on massive data sets are not only helping drugmakers discover new targets, but also increasing their ability to predict and identify specific subgroups of patients likely to respond to treatment.
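The relationship among the three standards can be pictured as a chain in which each step carries forward the identifiers of the one before it. In the Python sketch below, a collected demography record is tabulated into an SDTM DM-style row and then into an ADaM ADSL-style analysis row. Only a handful of variables are shown. The `USUBJID` construction (study, site and subject identifiers joined by hyphens) is a common sponsor convention assumed here rather than a CDISC requirement, and the `AGEGR1` cut-off of 65 is a hypothetical analysis choice.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CollectedDemographics:
    """Fields as captured on a demography form (illustrative, CDASH-inspired)."""
    studyid: str
    siteid: str
    subjid: str
    age: int
    sex: str  # already standardized to "M"/"F" at collection


def to_sdtm_dm(c: CollectedDemographics) -> dict[str, Any]:
    """Tabulate a collected record as an SDTM DM-style row (subset of variables)."""
    return {
        "STUDYID": c.studyid,
        "DOMAIN": "DM",
        "USUBJID": f"{c.studyid}-{c.siteid}-{c.subjid}",  # assumed sponsor convention
        "SUBJID": c.subjid,
        "SITEID": c.siteid,
        "AGE": c.age,
        "AGEU": "YEARS",
        "SEX": c.sex,
    }


def to_adam_adsl(dm: dict[str, Any]) -> dict[str, Any]:
    """Derive an ADSL-style analysis row, keeping SDTM keys for traceability."""
    age = dm["AGE"]
    return {
        "STUDYID": dm["STUDYID"],
        "USUBJID": dm["USUBJID"],  # same key links the analysis row back to DM
        "AGE": age,
        "SEX": dm["SEX"],
        "AGEGR1": "<65" if age < 65 else ">=65",  # hypothetical age grouping
    }


record = CollectedDemographics("ABC-101", "0042", "0007", 71, "F")
dm = to_sdtm_dm(record)
adsl = to_adam_adsl(dm)
print(adsl["USUBJID"], adsl["AGEGR1"])  # ABC-101-0042-0007 >=65
```

Because the analysis row keeps `USUBJID` unchanged, any derived value can be traced back to the tabulated and collected records it came from.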

In May 2017, FDA approved Keytruda (pembrolizumab) for patients whose tumors test positive for a particular biomarker. [8] The drug was approved for “microsatellite instability-high” or “mismatch repair-deficient” solid tumors, based on data from just 149 patients: 90 had colorectal cancer, while the remaining 59 had one of 14 other tumor types. This was the first time FDA approved a cancer treatment based on a biomarker rather than on the tumor's original location in a particular tissue type.

Keytruda's approval, tied to the drug's mode of action (influencing the immune system) rather than to a specific tumor type, is a prime example of the power of strategic data standardization. The drug's developer was able to find individuals across a host of cohorts who carried a specific molecular identifier and would respond to the compound.

Gathering and organizing data for submission to FDA or other regulators is often considered a daunting task. Companies can, however, simplify the process by defining the data needed for submission during trial design. The protocol would then automatically drive the database setup, findings would be collected and analyzed in consistent ways, and results would flow more smoothly into the clinical study report.
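The idea that the protocol should drive database setup can be sketched with Python's built-in `sqlite3` module: protocol-level metadata lists the items each form must collect and their permitted values, and the database tables, including constraints that reject non-standard codes, are generated from it. The form names, columns and visit labels are hypothetical, and a production clinical data management system would be far more elaborate.

```python
import sqlite3

# Hypothetical protocol metadata: for each form, the required items,
# their storage type and (optionally) the permitted coded values.
PROTOCOL_FORMS: dict[str, list[tuple[str, str, tuple[str, ...]]]] = {
    "demography": [
        ("birth_date", "TEXT", ()),
        ("sex", "TEXT", ("M", "F", "U", "UNDIFFERENTIATED")),
    ],
    "vital_signs": [
        ("visit", "TEXT", ("SCREENING", "WEEK 4", "WEEK 12")),
        ("systolic_bp", "REAL", ()),
        ("diastolic_bp", "REAL", ()),
    ],
}


def column_sql(name: str, sql_type: str, allowed: tuple[str, ...]) -> str:
    """Render one column; coded items get a CHECK constraint built from their codelist."""
    if not allowed:
        return f"{name} {sql_type}"
    values = ", ".join(f"'{v}'" for v in allowed)
    return f"{name} {sql_type} CHECK ({name} IN ({values}))"


def build_database(conn: sqlite3.Connection) -> None:
    """Create one table per protocol form, keyed by subject identifier.

    Table and column names come from controlled protocol metadata, not user input,
    which is why building the DDL with string formatting is acceptable here.
    """
    for form, items in PROTOCOL_FORMS.items():
        cols = ", ".join(column_sql(*item) for item in items)
        conn.execute(f"CREATE TABLE {form} (usubjid TEXT NOT NULL, {cols})")


conn = sqlite3.connect(":memory:")
build_database(conn)
conn.execute("INSERT INTO demography VALUES ('S-001', '1970-01-01', 'F')")
try:
    conn.execute("INSERT INTO demography VALUES ('S-002', '1980-05-05', 'female')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Because the permitted values come from the same metadata as the protocol, a value such as “female” is rejected at entry instead of requiring mapping later.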

The tremendous heterogeneity of healthcare information systems remains a key difficulty. It's a challenge with deep roots that pre-date the digital era: as just one example, there may be as many as 50 descriptions for myocardial infarction. Yet recent years have also seen genuine progress towards data standardization.

Companies should consider leveraging an outsourced partner early on to assist with gathering, storing and analyzing data. Many vital decisions made at the onset of trial design determine how each successive step of the development journey proceeds. Getting it right from the start can help organizations save time, mitigate risk and improve the overall clinical development process.


  • [1] Biotechnology Industry Organization study, February 2011.
  • [3] Marks, P. (2017). FDA and NIH Release Final Template for Clinical Trial Protocols. FDA Voice, May 2, 2017.
  • [4] 114th Congress, Public Law 114-255, Act: To accelerate the discovery, development, and delivery of 21st century cures, and for other purposes. US Government Publishing Office. December 13, 2016.
  • [5] Patient-Centered Outcomes Research Institute.
  • [6] Use of Electronic Health Record Data in Clinical Investigations: Guidance for Industry, Draft Guidance. U.S. Department of Health and Human Services Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Center for Devices and Radiological Health (CDRH), May 2016.
  • [7] European Medicines Agency, News and press releases (2017). Use of big data to improve human and animal health: Task force to establish roadmap and recommendations for use of big data in assessment of medicines. March 23, 2017.
  • [8] Martins, I. (2017). FDA Approves Merck’s Keytruda for Colorectal, Other Solid Cancers with Specific Genetic Feature. Colon Cancer News Today. May 25, 2017.


Sy Pretorius, MD, serves as senior vice president and chief scientific officer at Parexel, where Martin Roessner, MS, is vice president, global biostatistics; Michelle Hoiseth serves as vice president, Parexel Access; and Michael Goedde is vice president, clinical data management and database programming.