Michael H. Elliott

The rapid pace of biologics growth has left companies in informatics catch-up

The rapid increase in biotherapeutics investment is changing the profile of the biopharmaceutical industry and, with it, data management in the laboratory. With attention on longer patent life, high barriers to generic equivalents and personalized medicine, an increasing portion of R&D spending is being allocated to large molecule therapies, such as monoclonal antibodies (mAb). This comes at the expense of investment in chemically synthesized small molecules. From 2005 to 2008, the FDA approved 73 new molecular entities (NMEs, i.e., small molecules) versus only 10 biologics. From 2009 to 2012, however, it approved 24 biologics versus 71 NMEs.1 This growth will continue for the foreseeable future, as will investment in combination biopharmaceuticals, such as antibody drug conjugates that link biologics to small molecules.
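Expressed as a share of combined FDA approvals, the counts above show the shift more directly. The worked calculation below uses only the figures cited in the preceding paragraph, with percentages rounded to one decimal place.

```latex
\text{biologics share}_{2005\text{--}2008} = \frac{10}{10 + 73} = \frac{10}{83} \approx 12.0\%
\qquad
\text{biologics share}_{2009\text{--}2012} = \frac{24}{24 + 71} = \frac{24}{95} \approx 25.3\%
```

By this measure, the biologics share of approvals roughly doubled between the two periods.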

Despite some residual use of paper notebooks, informatics for small molecule pharmaceutical R&D is relatively mature compared with that for biologics. For many years, systems for modeling, chemical registration and inventory management, as well as electronic laboratory notebooks (ELNs), have been well established. In medicinal chemistry alone, over 60 percent of pharmaceutical chemists use an ELN on a daily basis.2

The rapid pace at which biologics have grown has left most companies in informatics catch-up. Project teams commonly start in small, capital-constrained biotechs or as isolated groups in a large pharmaceutical company, using Excel as the standard data management tool. With the focus on getting a product out the door rather than on what comes next, it is not unusual to find data practices and formats that vary widely from scientist to scientist. Naming conventions are rarely standardized: cell lines and protein batches may have names that mean something only to the person who created them. With the influx of new projects, and without unlimited resources, poor data management practices can hurt efficiency, cycle time and quality.

Biologics pose a far greater data management challenge than small molecules. A chemically synthesized compound’s process is fairly linear: from synthesis, it flows quickly into tests for in vitro pharmacology, pharmacokinetics, etcetera, producing a series of assay and study results on a single molecular entity. In contrast, just the process of creating an mAb generates a multi-dimensional set of data requiring analysis. This is because each protein is unique to the cell from which it was expressed, and the conditions in which it is grown and purified can affect efficacy and product quality, so much more data must be collected and analyzed than for a new chemical entity.

Figure 1: Example mAb workflow with select assays

In the simple example in Figure 1, a vector is transfected into cells in multiple wells, each of which requires a series of tests such as cell viability and western blot. Data are analyzed to select positive colonies for transfer to expression shake flask experiments, each with its own set of media and conditions. Many additional tests, such as enzyme-linked immunosorbent assay (ELISA), are performed to determine which cells may progress into small-scale bioreactor experiments. These reactors generate time-based feed concentration data (e.g., glucose, lactate) and on-line analytical measurements (e.g., dissolved oxygen, cell viability, temperature, pH) that need to be examined in near-real time. Samples are taken regularly during the course of the experiment for off-line analysis, e.g., titer by HPLC. From there, multi-step purification experiments, which include even more sophisticated analytical and viral testing, are executed to optimize yield and purity. In all, a substantial amount of data accumulates before one even gets to pharmacological analyses.

This is creating opportunities and challenges alike for ELN suppliers. Despite the overall economic climate, the market appears to be strong and growing. At PerkinElmer (PE), a longtime leader in the chemistry space, Chris Strassel, Informatics Product Manager, says the company has “seen a huge growth in demand” in biotherapeutics. Paul Denny-Gouldson, Vice President, Strategic Innovations at IDBS, a leader in biology ELN, says the company now has “a number of significant deals across various market verticals that deal with the general area of biologics.”

The capabilities a biologics ELN must have are unlike those needed for chemistry or in vivo biology. Denny-Gouldson says the major difference between chemical and biological entities is that “how you made it” is as important as “what it is.” “Treating them as simple entities that have a set of properties, as in the small molecule world, is not appropriate. The genealogy of how you get to an end entity is critical, and this forms part of the major challenge in this area: how to capture all the context alongside the other elements without affecting the users’ workflow.” Strassel adds, “The workflows in the chemistry space are mature and clearly defined; with biologics there seems to be a great deal of variability. The challenges become not just being able to capture the data and the workflows and build that into the software, but to be able to accommodate the evolving workflows at the same time.”

The Problem of Dark Data
One of the complications with paper notebooks and Excel is the inability to use data broadly across departments and project teams, or to examine the genealogy of a protein all the way back to the originating vector. For example, an antibody may be found in development not to have suitable “drug-like” properties. It is difficult, and in some cases impossible, to fully trace data back to the original cell line to learn how approaches need to be modified to improve its characteristics, or the “how you made it.” Experiment “dark data” (i.e., data that are accumulated and archived but not broadly utilized), stored in notebooks, reports, bioreactor log files and spreadsheets scattered across poorly organized servers, could provide valuable insight into material optimization if it were structured, integrated and accessible. As an increasing number of automated approaches are employed (e.g., cell automation, microscale reactors, pre-formulation screening, robotics for purification column optimization), the volume of dark data is increasing exponentially.
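To make the tracing problem concrete, the sketch below models each material (vector, clone, cell line, culture, purified batch) as a record that points to its parent records, then walks those links back from a purified batch to the originating vector. It is a minimal illustration under stated assumptions: the `Material` record, the `trace_lineage` function and all IDs are hypothetical and do not reflect any vendor's data model. A missing parent ID raises `KeyError`, which is roughly what happens in practice when a link lives only in a paper notebook or a spreadsheet.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    """Hypothetical genealogy record: one vector, clone, culture or batch."""
    material_id: str
    kind: str
    parents: tuple[str, ...] = ()  # IDs of the materials this one was made from


def trace_lineage(registry: dict[str, Material], start_id: str) -> list[Material]:
    """Return every ancestor of start_id, nearest first (breadth-first).

    Raises KeyError if start_id or any parent ID is not in the registry,
    i.e. the chain of "how you made it" is broken.
    """
    ancestors: list[Material] = []
    seen = {start_id}
    queue = deque(registry[start_id].parents)
    while queue:
        parent_id = queue.popleft()
        if parent_id in seen:
            continue  # shared ancestors are reported only once
        seen.add(parent_id)
        parent = registry[parent_id]  # KeyError here means a missing link
        ancestors.append(parent)
        queue.extend(parent.parents)
    return ancestors


# Illustrative (hypothetical) chain from an originating vector to a purified batch.
example_registry = {m.material_id: m for m in [
    Material("VEC-001", "vector"),
    Material("CLN-014", "clone", ("VEC-001",)),
    Material("CL-014-A", "cell line", ("CLN-014",)),
    Material("SF-203", "shake flask culture", ("CL-014-A",)),
    Material("BR-07", "bioreactor run", ("SF-203",)),
    Material("PB-07-1", "purified batch", ("BR-07",)),
]}

if __name__ == "__main__":
    for m in trace_lineage(example_registry, "PB-07-1"):
        print(m.material_id, m.kind)
```

Breadth-first traversal with a `seen` set is used so that a material derived from several parents (for example, a culture and a media lot) is still handled, and a shared ancestor appears only once.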

Figure 2: IDBS BioBook showing cell line genealogy

The generic ELN “sticker book” approach does create a retrievable archive for some dark data, but it does little to exploit that data to improve the science. At the recent IQPC “ELN, Data Analytics and Knowledge Management” summit in Boston, a speaker from a top-10 biopharmaceutical company (who, by company policy, declined to be named) described its conversion from a “paper-on-glass” philosophy to a structured data management solution in the purification department. Under the old approach, he said, scientists spent 50 percent of their time manipulating spreadsheets despite having the ELN. After the transition to a structured methodology, with the system supporting their workflow, the group exceeded its targeted 15 percent improvement in efficiency. More importantly, it gained new understanding of its processes by bringing data out of the dark. The group now incorporates data captured by the ELN directly into its design of experiments (DoE) models, providing new scientific insights that have enriched its purification strategies. “You have to bring people closer to the data,” said the speaker. “You need to allow data to flow outward and upward.”

Strassel of PE says, “In this space in particular, it is very much process-driven; for the most part, the market seems to be looking for more workflow and data structure built into the notebook. It can be challenging because of this need for the process to be built in, in a structured ‘data management’ way, but also for the flexibility to change and evolve as needed.” According to IDBS’ Denny-Gouldson, “As you move further along the development chain, the importance of good, robust data management increases dramatically, to the point where it becomes critical. We find most people see the benefits of the data management immediately, and when they connect up the steps of the chain from molecular biology and cell line development onto fermentation and scale-up to purification and stability, they get the all-important ‘holistic view’ of the process and can get process insight in real time (not wait until the end to do post hoc analysis).”

Merck and Company was one of the companies that evolved its use of the ELN beyond the original “paperless lab” objective. Working with Merck, PE incorporated its BioAssay screening and RDLIMS sample tracking modules into its E-Notebook ELN and adapted them for structured biologics data capture, management and analysis. Merck’s objective was to “deploy a standard solution with integrated biologics-specific workflow capabilities” that “would enable results to be searched and shared.”3 Dermot Barry Walsh, Merck’s executive director for vaccines/biologics R&D IT, noted, “Because it’s a structured database, we can more easily access and analyze the data than if it’s just attached as an Excel spreadsheet.”

Figure 3: PE E-Notebook with Cell Profiling Template

Bringing Data into the Light
One of the problems with many laboratory information management systems (LIMS) over the years has been a high level of customization for each customer’s deployment. Over time, this makes systems unwieldy, difficult to support and painful to upgrade. As vendors incorporate new features to accommodate every difference between clients, products have become bloated but infinitely configurable, resulting in complex implementations. There is a risk that ELNs will follow the same path as users try to force the system to act just like the tools with which they are familiar. This does not exploit product improvements driven by other companies and may simply increase dark data by mimicking the current state.

In our analysis of biologics discovery and development processes across many companies, it is remarkable how similar the workflows are, which should, in theory, make developing and deploying a standard platform straightforward. Because of the radical level of change involved, however, it is not that clear-cut, but vendors are attempting to provide standard offerings.

Denny-Gouldson says, “Some will take our off-the-shelf bioprocess solution and, with only a little bit of tweaking, they are happy, but this requires them to make some changes to their process. Other customers do not want to do this and request a fully individualized environment. This is not surprising, given that part of their IP is how they make product and optimize processes.” Strassel adds, “The process is similar across organizations, if not the same; mAb workflows are certainly consistent. The details that the customers want to capture tend to be different, so our approach is really to provide an out-of-the-box workflow that can be easily configured to adapt to the customer’s specific needs. For example, during a culture process, customers may have different time points to pull samples and different analyses they want to do on those samples. The cell line tracking is very similar but needs flexibility in the data that is captured and what is done with that data.”
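Strassel’s description of an out-of-the-box workflow that each customer configures can be sketched as a shared template plus a small set of overridable settings. In the hypothetical Python example below, a standard culture template fixes the workflow structure, while a customer may change the sampling time points and add analyses. The template name, days and analyses are illustrative assumptions, not taken from any PE or IDBS product.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class CultureTemplate:
    """Hypothetical standard culture workflow shared across customers."""
    name: str
    sample_days: tuple[int, ...]  # days on which samples are pulled
    analyses: tuple[str, ...]     # analyses run on every sample


# Illustrative vendor default; the values are assumptions, not from any product.
STANDARD_MAB_CULTURE = CultureTemplate(
    name="mAb culture",
    sample_days=(0, 3, 7, 10, 14),
    analyses=("viable cell density", "glucose", "lactate"),
)


def configure(
    base: CultureTemplate,
    sample_days: Optional[Iterable[int]] = None,
    extra_analyses: Iterable[str] = (),
) -> CultureTemplate:
    """Return a customer-specific copy; the base template is never modified.

    Only the sampling schedule and the analysis list are configurable, so the
    workflow structure stays the same for every customer.
    """
    days = base.sample_days if sample_days is None else tuple(sorted(set(sample_days)))
    # dict.fromkeys keeps the first occurrence of each analysis, in order
    analyses = tuple(dict.fromkeys((*base.analyses, *extra_analyses)))
    return replace(base, sample_days=days, analyses=analyses)


def sampling_plan(template: CultureTemplate) -> list[tuple[int, str]]:
    """Expand a template into (day, analysis) sample requests."""
    return [(day, analysis) for day in template.sample_days for analysis in template.analyses]


if __name__ == "__main__":
    customer = configure(
        STANDARD_MAB_CULTURE,
        sample_days=range(0, 15, 2),
        extra_analyses=["titer by HPLC"],
    )
    print(len(sampling_plan(customer)), "sample requests")
```

Keeping the base template immutable and deriving customer copies from it is one way to avoid the per-customer drift described for LIMS: every deployment shares the same structure, and only the declared settings differ.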

Moving beyond dark data storage to a structured, workflow-oriented approach is by no means easy. It requires a level of agreement and standardization greater than most organizations are willing to tackle or can absorb in a short period of time. In many cases, it is culturally easier to start with the generic, sticker book approach. “In the research area, change is often the biggest barrier and it comes down to people, process and data flow,” said Denny-Gouldson. “How often do we hear: ‘My process is special,’ ‘my system is the only way to do something,’ ‘my work cannot be shared with anyone else,’ ‘we can’t write down our process,’ etcetera, all come up as barriers to change.” Strassel seems to agree when he says the chief obstacle to implementation is cultural transformation. “This is a barrier faced at any company switching from paper — or even siloed software programs — to one unified program. It is a mindset and culture change within a company that can be difficult for some scientists.”

The management of structured data was an important consideration in the ELN selection at Pfizer’s Chesterfield, MO, location. To support the needs of 500 BioTherapeutics Pharmaceutical Sciences (BTxPS) users, the company decided to work with its existing LIMS supplier, LabWare, to develop that company’s ELN for use in biologics development. LabWare LIMS already had a robust architecture for managing structured data, as well as workflow capabilities; these did not need to be created, though they did need to be configured for biologics. According to Julie Spirk, the BTxPS ELN system owner, “The functionality and the advantages that we achieve from implementing the ELN within our LIMS is having a single database for housing all BTxPS non-GMP and GMP data to facilitate reporting of data across the development life cycle.”

Realizing that the vision of a single database across all operations would not be easy to achieve, the implementation team started by configuring workflow-supporting templates in analytical R&D. The Bioprocess and Pharmaceutical R&D groups use the work request functionality to submit samples to Analytical so that the results can be returned via the system. Outside of the request functionality, those groups operate the ELN with a paper-on-glass philosophy. Over 100 analytical templates are currently being rolled out, and legacy systems are being retired along the way.

Just the experience of using the system as a generic platform is opening eyes to new possibilities in the bioprocess department. Spirk says its scientists are “beating down our doors” for IT to customize templates for better data organization and consistency across the group. But first, the project team tasked the department with harmonizing terminology and processes. She says this is often difficult, though, and may require consultative support from her team.

Spirk advises potential users never to underestimate the ELN’s impact on culture and to make sure that management is fully behind the project. “Even people who see value may have issues with it due to the impact on their work,” she says. Spirk suggests deploying in a step-wise fashion: “Trying to do too much too fast is like trying to get a right-handed person writing with their left in one day.”

Marc Smith, Knowledge Management System Project Leader at Lonza’s Mammalian Development Services unit in Slough, UK, echoes Spirk when he says, “Jumping into the ELN needs to be managed. Trying to force users into a specific workflow too quickly can be for them like starting at a new company.”

Lonza is deploying IDBS’ E-Workbook/BioBook ELN to over 300 users from bioprocess to analytical, supporting both non-GMP and GMP workflows on the same instance. The project objective is to improve operational efficiency by between 10 and 30 percent, depending on the area. These gains are to be achieved through the introduction of templates, structured data management and a catalog of master data. Using the BioBook catalog, Lonza creates unique cell line identifiers that are used throughout the system to support data genealogy, simplifying reporting.
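A catalog that mints identifiers can enforce genealogy at the moment of registration rather than after the fact. The sketch below is a hypothetical Python illustration of that idea, not BioBook’s actual catalog: `CellLineCatalog`, the `CL-000001` ID format and the rule that a parent must already be registered are all assumptions chosen for clarity.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellLineRecord:
    """Hypothetical master-data entry for one cell line."""
    cell_line_id: str
    description: str
    parent_id: Optional[str]  # None only for a root (e.g. a host cell line)


class CellLineCatalog:
    """Hypothetical catalog that mints unique, sequential cell line IDs.

    A record may only name a parent that is already registered, so genealogy
    links can never point at IDs that exist only in someone's spreadsheet.
    """

    def __init__(self, prefix: str = "CL") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._records: dict[str, CellLineRecord] = {}

    def register(self, description: str, parent_id: Optional[str] = None) -> str:
        # Validate before minting so a rejected request does not consume an ID.
        if parent_id is not None and parent_id not in self._records:
            raise ValueError(f"unknown parent cell line: {parent_id}")
        new_id = f"{self._prefix}-{next(self._counter):06d}"
        self._records[new_id] = CellLineRecord(new_id, description, parent_id)
        return new_id

    def get(self, cell_line_id: str) -> CellLineRecord:
        return self._records[cell_line_id]


if __name__ == "__main__":
    catalog = CellLineCatalog()
    host = catalog.register("host cell line")
    clone = catalog.register("transfected clone", parent_id=host)
    print(clone, "derived from", catalog.get(clone).parent_id)
```

Because every downstream record carries a catalog-issued ID rather than a personal naming convention, reports can join data across groups on that ID.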

“We do not see any savings with the sticker-book approach,” says Smith. “The templates offer us consistency, flagging, analysis, reporting, workflow and other capabilities that you would not get by just putting Excel sheets into the ELN.” But, Smith goes on to say, “Building templates to support GMP is pretty easy, as the processes are well defined. It is a lot more difficult to try and lock down a design in the areas where the work is more fluid.”

Prospective users, Smith adds, should “start early with the paper-on-glass concept and get people used to the system.” He says this is an easy transition for users and that, during this phase, the project team should conduct workshops to discuss template designs. “The efficiency gains will not be there, but people will understand the system better and know what bits they want and what bits they do not.” If the vendor or a consultant is helping to build templates, it is important to review their work routinely. “Requirements are difficult to write. Some consultants are good at challenging the scientists, while others just take what is said verbatim.” When working with a supplier, he adds, “make sure they see the goals the same way you do.”

Another IDBS customer, a top-three biotech (which, also by company policy, cannot have its name published), has a different viewpoint. The company, which has rolled out the system to over 600 users in non-GMP bioprocess development, has, for the most part, deployed the system only as a generic paper replacement. “Tension will always increase as you try to change the work of 1000 people for the sake of 20 who data mine,” says the director who led the ELN deployment. “We looked at using templates in several areas, but did not find the expense would be justified to maintain them.” He goes on to say, “Sometimes you can do things better with Excel than trying to make the ELN do complex tasks. Instead of asking ‘if’ the product can do a certain thing, maybe it is better to ask ‘should it?’”

This company justified the purchase of the ELN on the grounds that it would make it easier for scientists to record what they are doing and would streamline the notebook signature process. According to the project lead, the changeover from paper notebooks was rather easy, as the technology was readily accepted. “If users see real benefits for them, they will sign up for it,” he says. The project team employed the “train the trainer” approach and holds sessions once or twice a month to answer user questions.

That’s not to say the company finds no value in the structured data management approach. In areas where processes and data structures are well defined, such as process validation, he says they are looking at building out a set of templates in the ELN. For larger-scale fermentation, the company wrote its own system for scheduling, inventory and a repository of reactor conditions, online tests and results from offline assays. MATLAB and StatView are used for data analysis, and reports are imported into the ELN. “It doesn’t make sense to try and force a generic solution to do something when you can customize a specific solution to a problem.”

Biologics are quickly gaining a larger share of the biopharmaceutical market. With increased investment, new technology and pressure to reduce cycle time and expedite time to market, those in biologics R&D face growing stress to manage both structured and unstructured data effectively. Data management and workflow support through a modern ELN can help to alleviate that stress. However, because of the level of disruption to operations, a step-wise implementation is recommended, starting with a generic approach. How the system evolves over time depends on a project’s strategic intent and an organization’s capacity to change.

2 Atrium Research & Consulting,
3 PerkinElmer, Inc., “Collaboration Adds Biologics Capabilities to Ensemble for Biology,” July 2013

Michael Elliott is the founder, CEO and chief analyst at Atrium Research & Consulting. He may be reached at