Increasingly, experimental data from biopharma research and development are siloed and stored in varying formats, which makes searching, retrieving, and sharing them difficult. Furthermore, the legacy systems that hold these data are often not interoperable, so the data stored in them cannot readily be compared. Unless data can be shared and compared, their full value is not realized. Data systems need to be designed and managed to allow more collaboration in order to accelerate research and continue making breakthroughs. Data standardization aims to make data machine-readable and actionable so that systems can find, access, and reuse them with minimal human intervention.
What Is Standardization?
Data standardization is the process of bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Standardized data are essential to accurate analysis: conclusions drawn from one data set are stronger when there are comparable data to check them against. While standardization methods vary from one initiative to another, data that have not been standardized lose much of their original meaning.
Why Standardization?
Experimental data are often distributed among multiple software applications, databases, and even documents. As a result, experiments and measurements are in many cases rerun simply because the old data can’t be found or accessed. A lack of uniformity in data notation and storage can also threaten the data’s integrity. In fact, some 42% of Good Manufacturing Practices Drug Warning Letters issued by the U.S. Food and Drug Administration in 2018 were for data integrity deficiencies.
By making data collection automated and self-documenting, and by making data collection systems interoperable, standardization yields a more reproducible process and reduces manual effort. It also makes it possible to look up similar substances that have already been analyzed, which increases retention of institutional knowledge, reduces the time needed for method development and refinement, and improves method execution.
Approaches to Standardization
There are many approaches to addressing the issue of data standardization. Basic methods include:
- Determine a common format before data collection — record the same type of data in the same format every time. For example, note liquid volumes in milliliters, or absorbance as A280
- Collect data based on preset standards — if pre-existing standards for how to measure and count a particular type of data are available, use them. In the U.S., for example, temperature is measured on the Fahrenheit scale
- Transform already collected data to a common format — standardize different data formats to just one format during data cleaning. For example, set all dates in an MM/DD/YYYY format
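The basic methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the unit table, the list of date layouts, and the function names are all hypothetical and do not come from any particular laboratory system.

```python
from datetime import datetime

# Hypothetical conversion factors to milliliters (illustrative only).
ML_PER_UNIT = {"ml": 1.0, "l": 1000.0, "ul": 0.001}

# Hypothetical date layouts assumed to occur in the legacy records.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def volume_to_ml(value: float, unit: str) -> float:
    """Convert a volume to milliliters; reject units we do not know."""
    key = unit.strip().lower()
    if key not in ML_PER_UNIT:
        raise ValueError(f"Unknown volume unit: {unit!r}")
    return value * ML_PER_UNIT[key]


def standardize_date(text: str) -> str:
    """Return the date as MM/DD/YYYY, trying each known layout in turn."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # Not this layout; try the next one.
    raise ValueError(f"Unrecognized date format: {text!r}")


# Two made-up records that describe the same day in different notations.
records = [
    {"sample": "S1", "volume": 0.5, "unit": "L", "date": "2018-03-07"},
    {"sample": "S2", "volume": 250, "unit": "uL", "date": "07.03.2018"},
]

standardized = [
    {
        "sample": r["sample"],
        "volume_ml": volume_to_ml(r["volume"], r["unit"]),
        "date": standardize_date(r["date"]),
    }
    for r in records
]

for row in standardized:
    print(row)
```

Unknown units and unrecognized dates raise an error rather than being guessed, since a silent wrong conversion would undermine the data integrity that standardization is meant to protect. Note also that a layout list like this cannot tell 03/07/2018 in MM/DD/YYYY from the same string in DD/MM/YYYY, which is one reason to agree on a format before data collection rather than afterward.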
However, scientific data are significantly more complex than these examples suggest. Furthermore, they are often collected and stored in proprietary file formats, which limit how widely the data can be shared, integrated, and reused and so impede the realization of their full value. An interoperability standard would remove the need for manual transcription as a workaround. But a new data interoperability standard requires a significant investment, whether it’s developed in-house, by an instrument or software vendor, or through contracting with another company.
What’s Next
Biopharma organizations and regulatory agencies have been working toward standardization for decades. Regulatory bodies are the primary drivers of this change, with the objective of standardizing data exchange between themselves and biopharma companies. Both parties (regulators and pharmaceutical companies) want to be able to access, analyze, and effectively compare safety and efficacy data across clinical trials, drugs, and other treatments. Additionally, with healthcare payers (insurance companies) becoming increasingly cost sensitive and the biotech/pharma industry becoming more competitive, there is a major drive toward more comparative studies of medications and treatments to justify prices and reimbursement rates. Furthermore, the general public expects drug companies to use data more efficiently and effectively in order to develop and manufacture more robust and cost-efficient medicines and treatments. As life science data continue to be digitalized, efficient standardization and integration of data for analytical purposes and regulatory submission will lead the next data wave in drug development.