Long-term Storage of Chromatographic Data

Adrian Fergus

Assuring Compliance and Accessibility
The U.S. FDA has recognized the practical problems surrounding the long-term archiving of electronic records of chromatographic and other analytical data in its recently published Guidance for Maintenance of Electronic Records, Docket 00D-1539, Guidance for Industry on 21 CFR Part 11. The draft guidance, released on the public register on September 5th, 2002, focuses on the maintenance of electronic records, the 'required retention period' and record migration, though it also reaffirms the importance of data acquisition and processing in ensuring the integrity of a record such as that generated by a chromatography data system (CDS). To quote the docket: "If information is inaccurately or incompletely recorded, record maintenance practices will not compensate for these shortcomings." This article interprets the new guidance, considers its implications for vendors and operators of CDS, and looks at a solution for the long-term archival and retrieval of chromatographic data based on the migration approach to maintaining electronic records.

In the draft guidance, the FDA proposes two potential approaches to maintaining electronic records: a short-term solution and a long-term solution. Any short-term solution must ultimately roll over into the long-term solution, because the data must remain accessible for many decades.

The short-term approach is termed the "Time Capsule" approach. It involves preserving the exact computing environment, both hardware and software, including the computer and operating system as well as the vendor application software. Preserving the computing environment is expensive and impractical, even impossible, in the long term. Perhaps a more significant issue with this approach is that it can prevent laboratories from benefiting from advances in computing technology. It requires that the same software revision (and the associated operating system) be maintained for the life span of the records that it acquired. Chromatography is a key analytical technique and generates a significant proportion of a laboratory's analytical data. System users have come to expect software that is designed to evolve with the market and are keen to take advantage of future software upgrades and the latest functionality. Remaining on old technology for the sake of compliance would ultimately negate the majority of the benefits of purchasing a CDS and could affect a laboratory's profitability.

To address this flaw in electronic record keeping, the FDA proposes the "Data Migration" approach as a true long-term solution. In this approach, the instrument data is translated into a format that allows it to be migrated forward as computer and software technologies change. This is a much more viable alternative: it allows the laboratory to move with technology while still ensuring compliance with 21 CFR Part 11.

CDS functionality to support compliance
It is a mistake to address the issues of accuracy and completeness solely through archival of electronic records. The guidance document advises companies not to neglect how their CDS handle and track acquired results. It is clear that the guidance document, like 21 CFR Part 11 itself, applies to the life span of the data and its associated metadata, referred to in the document as its 'record retention period.' Certain criteria must be met to ensure that data integrity is maintained, and this can only be handled by the CDS that generated the data.

To take advantage of the data migration model, features must exist in the CDS to ensure data integrity throughout the migration of the system. Section 11.10(b) states "The ability to generate accurate and complete copies of records in both human readable and electronic form suitable for inspection, review, and copying by the agency." In addition, Section 11.10(e) states "Audit trail records must be retained for at least as long as that required for the corresponding electronic record." Modern CDS incorporate fully featured audit trail facilities that either are permanently linked to the electronic records or log events to an event log or a specified file. Ideally, audit trails should be inseparable from the data itself to ensure compliance. Features like built-in copy verification mechanisms and audit trails can ensure compliance throughout the record retention period.

Flowchart illustrating the conversion of archived GAML files of data (originating from disparate CDS) into a CFSML.xml file, which can be interpreted, visualized and compared within Atlas

The guidance emphasizes the importance of ensuring data integrity throughout the copy process. If a system does not have "a built-in copy verification mechanism, such as cyclic redundancy check (CRC)" the copy process itself must be validated to ensure that data integrity is maintained. System validation is often cumbersome, but the use of CRCs removes this onus from the CDS user because the system already has a built-in means of assuring data integrity. In leading CDS available today, cyclic redundancy checking is automated on significant files to detect illegal modifications. For example, the CDS will detect changes made using an external application, such as Microsoft Notepad or Microsoft Word. If the check finds an illegal modification, the data file is flagged as corrupted and therefore unusable.
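
As a rough illustration of the kind of copy verification the guidance describes, the following Python sketch computes a CRC-32 for a file before and after copying and refuses to accept a copy whose value differs. The function names are hypothetical and are not taken from any particular CDS; the sketch only assumes that records are stored as ordinary files.

```python
import shutil
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC-32 of a file, read in chunks to limit memory use."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so the result covers the whole file.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def copy_with_verification(source, destination):
    """Copy a file, then confirm the copy's CRC-32 matches the original."""
    expected = crc32_of_file(source)
    shutil.copyfile(source, destination)
    actual = crc32_of_file(destination)
    if actual != expected:
        raise IOError(
            f"CRC mismatch copying {source}: {expected:08x} != {actual:08x}"
        )
    return expected
```

Storing the returned value alongside the archived file would allow the same check to be repeated at any later point in the retention period. A CRC is designed to catch accidental corruption, so a system concerned with deliberate tampering might pair it with a cryptographic hash.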

The subject of altered copies is addressed in Section of the guidance. These will be compliant with 21 CFR Part 11 provided that all alterations are documented and compensations are made. The use of color codes is an example given in the guidance. Suppose, for instance, that an electronic record in an old CDS used a specific color for its chromatograms and accompanying text, and that a replacement CDS cannot replicate that color because it uses different colors to represent chromatographic data. To ensure that a reviewer or auditor can correctly interpret the information, the guidance document suggests creating a new electronic record to supplement the migrated electronic record, explaining the correlation between the old and new color representations. While this solution would comply with 21 CFR Part 11, it would appear to be potentially error-prone. A better approach would be to ensure that an accurate and complete representation of the electronic record can be migrated and archived along with the actual chromatographic data in its original format, which can be verified as secure at source. This constitutes a much simpler and more secure method of maintaining migrated electronic records.

What data to migrate?
While the data migration approach is clearly the appropriate long-term solution, it does raise the questions of how to translate the data and what format to translate it into. Exactly what information from the original file must be migrated forward? The other key issue is that the chosen file format must be platform-independent.

Providing the means to easily search, data-mine and retrieve any piece of chromatographic data from an archive for inspection, visualization and reprocessing is a formidable objective. This requires that the archival process saves and catalogues an accurate and complete record of the entire CDS workbook, including raw data, processed data and results, methods, reports, calibration details and audit data. An ideal solution for the data migration approach would ensure that metadata information (e.g. date/time created, owner, sample name) is extracted from the data and stored in the database to allow easy search and retrieval of the right data in the future.
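
The cataloguing step described above can be sketched in a few lines of Python using the standard `sqlite3` module. This is a minimal illustration only: the table name, column names and functions are assumptions made for the example, and a real archive solution would hold far more than the three metadata fields mentioned here.

```python
import sqlite3


def create_catalogue(connection):
    """Create a minimal catalogue table; column names are illustrative only."""
    connection.execute(
        """
        CREATE TABLE IF NOT EXISTS archive_catalogue (
            record_id   INTEGER PRIMARY KEY,
            file_name   TEXT NOT NULL,
            created_at  TEXT NOT NULL,   -- ISO 8601 date/time
            owner       TEXT NOT NULL,
            sample_name TEXT NOT NULL
        )
        """
    )


def catalogue_record(connection, file_name, created_at, owner, sample_name):
    """Store the metadata extracted from one archived record."""
    cursor = connection.execute(
        "INSERT INTO archive_catalogue (file_name, created_at, owner, sample_name) "
        "VALUES (?, ?, ?, ?)",
        (file_name, created_at, owner, sample_name),
    )
    connection.commit()
    return cursor.lastrowid


def find_by_sample(connection, sample_name):
    """Return the file names of all archived records for a given sample."""
    rows = connection.execute(
        "SELECT file_name FROM archive_catalogue "
        "WHERE sample_name = ? ORDER BY created_at",
        (sample_name,),
    )
    return [row[0] for row in rows]
```

Keeping the metadata in a database separate from the archived files means a search never has to open the files themselves, which matters once an archive holds many years of data.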

Data conversion for data migration
As a file format standard for the migration of electronic records, XML (eXtensible markup language) appears to be the most promising. The attributes of XML are well documented elsewhere; suffice it to say that being public domain and platform-neutral makes it attractive. It has certainly gained favor in a number of recent FDA initiatives and has been adopted by a number of recognized standards bodies, not only as a data interchange format but for data storage too. Quite understandably, the need for "yet another standard" has been questioned. A number of standardized data interchange formats, such as JCAMP and AnDI, already exist for transferring laboratory instrument data sets between software products. However, when viewed as potential formats for long-term storage and distributed access, these have serious deficiencies. This is in no way the fault of the design and implementation of these formats, but rather a consequence of the narrow role for which they were designed. To address this situation, some organizations have opted not to use the existing data storage standards at all, but instead to store graphical representations of a final report. Again, however, there are serious shortcomings with this approach, and the FDA recognizes this. The following is a section from DRUG GMP Report (FDANews.Com monthly journal Issue 113, December 2001, p7):

"Motise said the FDA thought that PDF file formats did not permit the processing of record information and thus would be problematic. PDF is a static format that does not allow reviewers to manipulate data to view subsets or generate analyses, tables or graphs. PDF formats had been supported under earlier FDA policies because they were difficult to alter and were widely used within government. However, the agency is now touting the extensible markup language (XML), which is a more dynamic format."

Furthermore, in section of the new draft guidance, the FDA asks that the ability to process information in electronic records (possible with accurate and complete records held as XML files) be preserved under the data migration approach. To quote:

"...the new computer system should enable you to search, sort and process information in the migrated electronic record at least at the same level as what you could attain in the old system (even though the new system may employ different hardware and software)."

GAML or generalized analytical markup language
Same chromatogram as a GAML file copy viewed in eRecordManager archival solution

Against the backdrop of industry and regulatory support for XML, an XML schema known as GAML (generalized analytical markup language) has been proposed for the 'normalization' of analytical data. The normalization of various CDS file formats into a common GAML format, now possible using the latest conversion technology (as available in archive solutions such as eRecordManager of Thermo LabSystems, UK), is significant news for chromatographers and IS managers administering CDS. For the first time, it will enable the comparison and reprocessing of archived chromatography data from disparate CDS using a single application. This could eliminate the need to retain and maintain legacy CDS, with their associated hardware and operating systems, simply to access archives of data usually stored in proprietary, binary file formats.

An example of data import of GAML files of CDS data
Using version 2002 release 2 of Atlas (Thermo LabSystems, UK), it is possible to import chromatographic data acquired from other CDS, provided it has previously been archived as a GAML file. A user achieves this by checking the original data source (i.e. the CDS) of an archived GAML file from within Atlas and selecting a suitable XSL (extensible stylesheet language) stylesheet to transform it into an XML schema that Atlas can fully interpret. The resulting CFSML (chromatography file system markup language) file can then be entered into an Atlas workbook and accessed. Provided file converters are available, users are able to view GAML files of data from other CDS from within Atlas. Reprocessing the data does, however, require the development and use of Atlas methods, but the import still allows the user to view the results, see baseline positions, and so forth. Perhaps most importantly, it also allows the laboratory to react more efficiently and confidently to auditor requests to see specific traces and peaks.
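
The general idea of transforming one XML vocabulary into another can be sketched in Python. The import described above relies on an XSL stylesheet; because the Python standard library has no XSLT processor, this sketch performs the equivalent mapping by hand with `xml.etree.ElementTree`. Every element and attribute name below is invented for the example, and none is taken from the actual GAML or CFSML schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified source document. The element names are
# invented for this sketch and are NOT taken from the GAML schema.
SOURCE_XML = """
<archive source="LegacyCDS">
  <trace name="Sample A">
    <point time="0.0" signal="1.5"/>
    <point time="0.1" signal="2.5"/>
  </trace>
</archive>
"""


def transform(source_xml):
    """Map the invented source vocabulary onto an invented target vocabulary."""
    source_root = ET.fromstring(source_xml)
    target_root = ET.Element(
        "workbook", origin=source_root.get("source", "unknown")
    )
    for trace in source_root.findall("trace"):
        chromatogram = ET.SubElement(
            target_root, "chromatogram", sample=trace.get("name", "")
        )
        for point in trace.findall("point"):
            # Carry each data point across unchanged, under new names.
            ET.SubElement(
                chromatogram, "datum", x=point.get("time"), y=point.get("signal")
            )
    return target_root


def data_points(target_root):
    """Read (time, signal) pairs back out of the transformed document."""
    return [
        (float(datum.get("x")), float(datum.get("y")))
        for datum in target_root.iter("datum")
    ]
```

Because the transformed document still carries the individual data points rather than a picture of them, the receiving application can redraw, compare or reprocess the trace.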

The next step in the development of this functionality is to enable the retrieval of instrument data from other CDS, along with all its associated metadata and method details, from within Atlas. This is being approached as each file converter is developed. When complete, this will represent a truly accurate and complete representation of the electronic record (of entire workbooks from all leading CDS), as required by the FDA in its proposed data migration approach to long-term electronic record maintenance.

With potentially millions of chromatographic analyses and components, from a multitude of instruments, projects and users, it is imperative that archived data and information are searchable in ways meaningful to each organization. To achieve this goal, customer-defined metadata is needed to describe archived records, and it must be stored in the archive solution's database. This allows complex search queries to be used to find data that may have been collected 20 or more years ago.
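
One way to support customer-defined metadata is to store each descriptor as a name and value pair attached to a record, so that each organization can choose its own descriptors without changing the database structure. The Python sketch below, using the standard `sqlite3` module, shows a search that combines several such descriptors with a date limit. The table layout, tag names and function names are all assumptions made for illustration and do not describe any particular archive product.

```python
import sqlite3


def build_archive(connection):
    """Create tables for records and their customer-defined metadata tags."""
    connection.executescript(
        """
        CREATE TABLE records (
            record_id   INTEGER PRIMARY KEY,
            file_name   TEXT NOT NULL,
            acquired_on TEXT NOT NULL      -- ISO 8601 date
        );
        CREATE TABLE record_tags (
            record_id INTEGER NOT NULL REFERENCES records(record_id),
            tag_name  TEXT NOT NULL,
            tag_value TEXT NOT NULL
        );
        """
    )


def add_record(connection, file_name, acquired_on, tags):
    """Insert one record together with its free-form metadata tags."""
    cursor = connection.execute(
        "INSERT INTO records (file_name, acquired_on) VALUES (?, ?)",
        (file_name, acquired_on),
    )
    record_id = cursor.lastrowid
    connection.executemany(
        "INSERT INTO record_tags (record_id, tag_name, tag_value) VALUES (?, ?, ?)",
        [(record_id, name, value) for name, value in tags.items()],
    )
    return record_id


def search(connection, tags, acquired_before):
    """Find records matching ALL given tags and acquired before a date."""
    query = "SELECT r.file_name FROM records AS r WHERE r.acquired_on < ?"
    parameters = [acquired_before]
    for name, value in tags.items():
        # One EXISTS clause per requested tag, so every tag must match.
        query += (
            " AND EXISTS (SELECT 1 FROM record_tags AS t"
            " WHERE t.record_id = r.record_id"
            " AND t.tag_name = ? AND t.tag_value = ?)"
        )
        parameters.extend([name, value])
    query += " ORDER BY r.acquired_on"
    return [row[0] for row in connection.execute(query, parameters)]
```

A call such as `search(conn, {"project": "P-12", "instrument": "GC-2"}, "2000-01-01")` would return only those records carrying both tags and acquired before the year 2000.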

In modern labs, CDS are among the most prolific generators of analytical raw data, methods and results. The inherent value of chromatographic data is extremely high: it is expensive to create, and its information content is high. It is vital in confirming compound identity, purity and quality. In considering potential vendors, operators of CDS should take into account how systems address the issues raised by the FDA's latest draft guidance on 21 CFR Part 11. Likewise, vendors need to become familiar with the document to ensure that their systems ultimately support the customer in achieving compliance.

There may be many advantages for industry if XML, through a schema such as GAML, becomes an industry standard file format for analytical data. It will ease compliant data migration in line with the FDA's guidance on maintaining electronic records. It will also make it much easier for companies to mine, share and compare historical chromatographic data, allowing companies to capitalize on the intellectual knowledge that may have been locked away in archives for years.

Adrian Fergus is employed by International Marketing at Thermo LabSystems. He may be contacted at