Thinking Beyond ELN
A look at the informatics architecture model 

Figure 1: Formulation analysis data flow
Over the last three years, electronic laboratory notebook (ELN) products have expanded their core capabilities to include functionality not envisaged in the early days of the “paper replacement” ELN. With the introduction of LIMS-like features, such as structured data management, request management, reporting, task workflow automation and new application modules, ELN is offering benefits across a wide range of laboratory types.1 Commonly, the ELN is configured to the needs of individual departments to gain efficiencies at the laboratory bench. This domain specificity offers scientists a streamlined workflow to capture, annotate and report data and information tailored to their individual requirements.

However, this domain specificity is leading to additional problems, often not foreseen during project conception. The most challenging of these is a lack of cross-departmental information integration, which would otherwise provide insights that are difficult (or impossible) to observe manually across individual documents. Biopharmaceutical R&D departments routinely tune their ELN to individual workflow needs with a report (or PowerPoint) as the information deliverable. Most often, this is the same report that was created before ELN; now, it is just created faster and more easily. The vocabulary of the report and its format are departmental, with little consideration of the impact on data aggregation at a higher level. This phenomenon is occurring in organizations that employ ELN products from multiple vendors, as well as in those that use only a single vendor's offering. Having a single product does not assure transportability if the configuration and terminology are department-centric.

Another problem with a departmental view is long-term records retention and management. In the paper world, organizations have (or should have) defined processes for cataloging notebooks, supplemental binders and project information. Naturally, ELN amplifies the number of electronic records that must be archived for retention. But, there is also an increasing use of project team PowerPoints, e-mails and other documents containing intellectual property and institutional knowledge. These files are e-mailed throughout the enterprise; they are rarely controlled or archived. Despite some vendor claims, ELN is not architected to be a long-term record archive, as its primary purpose is a transaction system for experiment documentation and laboratory process execution.

Figure 2: The levels of the informatics architecture model
In the early days of ELN, when systems were mainly deployed in early discovery research and medicinal chemistry, the problem of cross-departmental sharing was not as great. But as ELN penetrates laboratories further downstream (e.g., drug metabolism, analytical, formulations), data are increasingly structured. Departments must cooperate in a bidirectional data flow of requests and delivery. Information from various groups must be brought together at the project or compound level for decisions on development and continued investment. The lack of integration is exposed; manual aggregation and manipulation are required, thereby slowing compound development. This requires new ways of optimizing macro-level R&D information processes, not just those of individual groups and generators.

To illustrate this, let us examine the use of ELN in early pharmaceutical discovery versus downstream development. Before ELN, medicinal chemists documented their experiments using paper lab notebooks. What was missing was a way to design and reuse chemical reactions, to share experiment knowledge easily with others, and to eliminate the drudgery of manual tasks like stoichiometry calculations. The synthetic chemistry ELN offers those capabilities (and more) to enhance the chemist’s workflow. Instead of being documented in the paper notebook, experiments are recorded in the ELN, where other chemists can access and reuse them. After electronic signature, the chemical product detail can be posted to a chemical registration system and the experiment is locked into a PDF format for preservation.

Other than what is maintained in the chemical registration system, the downstream biologists, formulators and analytical chemists have little need to access details of the reaction. They may access the structure, molecular weight, formula and so forth for documentation purposes, but not the details of the actual experiment itself or the spectroscopic results. The output of the department is static and is used little in downstream departments’ workflows or decisions.

Figure 3: An analytical laboratory described using the informatics architecture model
Now, consider the data management challenge of a pharmaceutical formulator who is charged with the development of dosage forms for clinical trials. Studies involve the creation of many different formulation batches and forms using multiple lots of the active pharmaceutical ingredient (API), an assortment of different excipients, compositions, processes, dosage forms and packaging types. As illustrated in Figure 1, during the course of development, samples from a formulation batch are sent for analytical testing (for short-term experimental stability, dissolution, content uniformity, etc.), for pharmacokinetic (PK) in vivo studies (for bioavailability), and for hardness, density, particle size and other in-process tests. Lot-specific chemical characterization data on the API (e.g., particle size, thermal analysis, x-ray diffraction) will be provided to the formulator by the process group. With a poorly soluble API or demanding composition, there can be multiple formulation batches, each requiring its own loop through the process, creating a rather sizable data collection to be consolidated and analyzed.

The formulations, analytical and PK departments are using ELN in different ways to address their unique requirements. The analytical ELN is used to document the development and validation of new methods and the execution of procedures. The ELN is integrated with a LIMS (used for tracking, workflow enactment and samples/results management), passing values such as sample weights and standard concentrations. The LIMS produces reports for the formulator, which may be e-mailed or posted to a collaboration site. The biologists in pharmacokinetics use the ELN for the design and execution of their animal studies, including creation of the study protocol and dose schedules. PK analysis and visualization tools may be provided by the ELN or integrated into the framework. Bioavailability results are provided to the formulator in a spreadsheet, PowerPoint or report format. The formulator’s ELN is oriented quite differently: it records the design of formulation batches, processes and dosage forms, automates calculations, records in-process testing results and creates batch records.

Figure 4: Example informatics architecture
Despite the efficiency gains afforded by ELN in the individual departments, a process bottleneck exists without consideration of how data are tied together at the formulation batch, API or API lot level. To optimize formulation design, large datasets must be analyzed, such as dissolution (from analytical) versus bioavailability (from PK), API particle size (from process) versus bioavailability, and particle size of the formulation (from formulations) versus chemical stability (from analytical). The manual consolidation of data from reports, presentations and spreadsheets is arduous and time-consuming; it is further exacerbated by departments that insist on providing data using their own terminology, definitions and formats. Without an integrated perspective, finding new patterns through mining is not possible, and unnecessary levels of manual intervention and analysis are required, lengthening the overall time to the clinic.
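
To make the consolidation problem concrete, the following minimal Python sketch joins two departmental result sets on a shared formulation batch identifier. The batch IDs, values and function name are hypothetical, invented purely for illustration; the point is that the join is trivial only when every department reports against the same key.

```python
# Hypothetical per-department exports, each keyed by a shared formulation batch ID.
dissolution = {"FB-001": 82.5, "FB-002": 64.0, "FB-003": 91.2}      # % dissolved (analytical)
bioavailability = {"FB-001": 41.0, "FB-003": 55.3, "FB-004": 38.7}  # % bioavailable (PK)


def join_by_batch(left, right):
    """Return {batch_id: (left_value, right_value)} for batches present in both sets."""
    shared = sorted(left.keys() & right.keys())
    return {batch: (left[batch], right[batch]) for batch in shared}


# Batches that can be compared directly (dissolution versus bioavailability).
paired = join_by_batch(dissolution, bioavailability)

# Batches reported by only one department; these need manual follow-up.
unmatched = sorted(dissolution.keys() ^ bioavailability.keys())
```

In this sketch, batches reported by only one department surface in `unmatched`. When departments use different identifiers or formats for the same batch, every record lands there, which is the manual reconciliation burden described above.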

There must be a balance between the desires to increase efficiency within departments and the macro-level requirements of the organization to expedite R&D. A level of standardization is required which does not restrict flexibility in the deployment of informatics at the lab level, but enables information sharing across operating groups.

Informatics architecture model
Working with several large biopharmaceutical companies, we developed a reference architecture, separating essential functional capabilities into four discrete levels. This informatics architecture model was specifically designed to separate the needs of the individual domains and application areas from those of R&D as a whole. As shown in Figure 2, the scope of the reference architecture moves from the functional laboratory perspective to higher levels supporting cross-department segments, eventually leading to the R&D enterprise. A high level of detail (e.g., individual experiments) is maintained at the lower level, while summary information and knowledge are distributed to the higher levels.

At the highest point in the stack is Knowledge Utilization. At this level, tools exploit the knowledge and information maintained in the structured and unstructured repositories below it. The exact technologies depend on the needs of the organization. They can include enterprise search, data mining, portals, XML servers and business intelligence tools that query, aggregate and describe content from the various repositories in the knowledge retention level.

Knowledge Retention crosses multiple domain technologies to integrate information by projects, molecular entities, formulations or other entities. Retained knowledge can be in the form of structured assay results and compound data in a warehouse and/or as unstructured records (e.g., documents, PowerPoints, images) preserved in an enterprise content management (ECM) solution. Intellectual property records, regulatory documents, standard operating procedures, training records and other content are maintained for record retention, warehousing, archiving and lifecycle management.

The systems in Laboratory Process Execution are those interconnected systems supporting the requirements and workflow of individual laboratories and departments. These systems manage the process from experiment design to execution, from raw data collection through information creation, but are implemented to address the workflow needs of a particular domain. Technologies such as LIMS, ELN, CDS and SDMS exist in this level.

Figure 3 shows an example of the analytical laboratory where technologies are coupled to support the laboratory’s requirements, yet select datasets are posted to searchable repositories above.

There also are underlying databases and systems that span departments, different in scope from those at the knowledge retention level. These systems provide consistency in reference data to enforce a common language between systems in the upper levels, enabling communication. An example is a compound registration system. After registering a new chemical entity, the system returns a compound identification number. This number, and information like formula and mass, does not change over the entity’s life-span and is therefore static. When compounds are sent to high throughput screening (HTS) or drug metabolism, the compound number is used as a consistent identifier.

There are other attributes, such as project numbers, studies and formulation identifiers, which are required for effective integration across information silos. Without a common language, consolidation and searching at the knowledge retention and utilization levels are extremely difficult. A search engine cannot “reason” about the multitude of descriptors used uniquely by each department. If one group uses the concept of a “study” and another “protocol” to mean the same thing, then this has to be programmed into the logic of the engine. Further complicating matters, if the formats of attributes are quite different, a near-infinite number of permutations must be described. This is not only unrealistic, but also impossible to maintain over time.
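
As a rough illustration of what “programming this into the logic of the engine” involves, the Python sketch below maps department-specific field names to a single canonical term. The synonym table, field names and sample records are all hypothetical; a real table would need an entry for every term and spelling variant each department uses, which is the maintenance burden described above.

```python
# Hypothetical synonym table: each department's local term maps to one canonical term.
SYNONYMS = {
    "study": "study",
    "protocol": "study",
    "compound id": "compound_number",
    "cmpd no": "compound_number",
}


def to_canonical(record):
    """Rename a record's keys to canonical terms; collect keys with no known mapping."""
    canonical, unknown = {}, []
    for key, value in record.items():
        term = SYNONYMS.get(key.strip().lower())
        if term is None:
            unknown.append(key)
        else:
            canonical[term] = value
    return canonical, unknown


# Two departments describing the same study and compound in their own vocabulary.
analytical = {"Protocol": "P-17", "Cmpd No": "C-0042"}
pk = {"Study": "P-17", "Compound ID": "C-0042", "Spp.": "rat"}

analytical_canonical, analytical_unknown = to_canonical(analytical)
pk_canonical, pk_unknown = to_canonical(pk)
```

Here both records resolve to the same canonical keys, but the unmapped `"Spp."` field shows how quickly such a table falls behind when departments are free to coin their own descriptors.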

This need for common reference information across systems introduces another level beneath the laboratory process execution layer. This is known as Master Data Management (MDM) and provides a logical process of managing and describing reference data for systems used across the different laboratories. Master Data Management is the “discipline that focuses on the management of reference or master data that is shared by several disparate IT systems and groups.”2 It aids in establishing a common electronic record organization and description to simplify the requirements of patent protection, knowledge preservation, utilization and compliance.

Reference data are those elements used for classification and definition of records, comprising a standard list of possible choices for all the systems in the upper levels. For example, the definition of the term “project” with a master list of all currently available project numbers provides consistency. To create a new project number, a process of governance assures the validity of the definition.
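
The idea of a governed master list can be sketched in a few lines of Python. The class, method names and project number format below are assumptions made for illustration only; they do not describe any particular MDM product.

```python
class ProjectRegistry:
    """Hypothetical master list of project numbers with a simple approval step."""

    def __init__(self):
        self._approved = {}  # project number -> name of approver

    def approve(self, project_number, approver):
        """Governance step: a number enters the master list only with a named approver."""
        if not approver:
            raise ValueError("an approver is required to create a project number")
        if project_number in self._approved:
            raise ValueError(f"{project_number} already exists")
        self._approved[project_number] = approver

    def is_valid(self, project_number):
        """Check an upper-level system (e.g., an ELN template) could run before saving."""
        return project_number in self._approved


registry = ProjectRegistry()
registry.approve("PRJ-1001", approver="governance board")
```

Systems in the upper levels would then call something like `registry.is_valid(...)` rather than accepting free-text project numbers, so that every record is tagged from the same list of choices.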

In addition, analytical data types can be defined by MDM. A consistent use of significant figures and units for results and how assays are defined, for example, improves the quality and confidence of results. A governance process for assay definitions also can be considered under MDM; it is not uncommon to find a large organization with different definitions for an assay referring to the same set of results.
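
A minimal Python sketch of this kind of standardization follows. The unit table, the choice of mg/mL as the reporting unit and the default of three significant figures are illustrative assumptions, not a recommended standard.

```python
from math import floor, log10

# Hypothetical conversion factors to a single reporting unit (mg/mL).
UNIT_TO_MG_PER_ML = {"mg/mL": 1.0, "ug/mL": 0.001, "g/L": 1.0}


def round_sig(value, sig_figs):
    """Round a number to the given count of significant figures."""
    if value == 0:
        return 0.0
    decimals = sig_figs - int(floor(log10(abs(value)))) - 1
    return round(value, decimals)


def standardize(value, unit, sig_figs=3):
    """Convert a concentration to mg/mL and apply a consistent precision."""
    if unit not in UNIT_TO_MG_PER_ML:
        raise ValueError(f"unit {unit!r} is not in the master unit list")
    return round_sig(value * UNIT_TO_MG_PER_ML[unit], sig_figs)


# The same kind of result, reported by two groups in different units and precision.
result_a = standardize(1234.5, "ug/mL")
result_b = standardize(0.98765, "g/L")
```

Rejecting units that are not on the master list, rather than guessing a conversion, keeps the definition of acceptable units under the same governance as other reference data.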

In the example depicted in Figure 4, there are systems in the master data management layer for metadata standards, chemical and biological registration, a controlled vocabulary, and a project database. The controlled vocabulary provides the user with the company’s standard metadata descriptors for tagging a notebook record in an ELN template. There are distinct ELN and LIMS systems in level two to address the requirements of departments. A subset of structured data from the laboratory systems is posted to a warehouse. Documents, along with their metadata tags, are posted to the enterprise content management system. Enterprise search and business intelligence applications, accessed through a portal, merge content from the unstructured and structured knowledge repositories. Naturally, this is a simplistic example of the use of the informatics architecture model. The design is dependent on the size of the organization, legacy systems, workflow requirements, repurposing needs and individual philosophies.

In summary, when planning an informatics project for an individual department, the needs of the information providers and consumers must be considered in a value chain analysis. Examining cross-departmental use cases helps to identify the level of standardization, integration and functional capabilities required. Maintaining a vision of the reference informatics architecture model allows organizations to separate departmental requirements from the needs of the enterprise, maintaining flexibility to meet scientific demands. This flexibility must be balanced with a level of standardization to enable integration of structured and unstructured data for macro-level R&D benefits. Master Data Management requirements must be taken into consideration early in the project so that information integration can meet not only today’s challenges but tomorrow’s as well.

1. Michael H. Elliott, “What You Should Know Before Selecting an ELN: Electronic Laboratory Notebooks Have Evolved into Four Distinct Types,” Scientific Computing, June 2009.

Michael H. Elliott is CEO of Atrium Research & Consulting. He may be reached at