Data standards promise easier collaboration, data exchange and interoperability in the informatics world.
This article takes a deeper look at the Analytical Information Markup Language (AnIML), an emerging open format for scientific data, which is gaining momentum in the community as the standard is nearing completion.
One of the challenges in laboratory data management is the handling and exchange of experiment data. Many vendors provide excellent instruments, but most instruments produce data in their own proprietary formats. This leads to major difficulties for data processing, collaboration, instrument integration and archiving. There is little choice in software, as users are often tied to the tools that came with the instrument. The ASTM AnIML standardization effort addresses these problems by providing a neutral XML-based format for exchanging scientific data.
Started in 2003, AnIML has been developed by the ASTM E13.15 subcommittee on analytical data. This group brought together stakeholders from instrument vendors, end user organizations, government agencies and academia to ensure the format would be complete and widely applicable.
Specialist or generalist standardization approaches
AnIML is not the first effort to standardize a data format. Excellent work has been done by the makers of ANDI [1], JCAMP-DX [2], SpectroML [3] and mzML [4]. These earlier initiatives targeted specific analytical techniques and have achieved certain levels of adoption in their particular fields.
The creators of AnIML have chosen a broader approach. They designed AnIML to accommodate arbitrary analytical techniques. A well-defined growth path enables adding new techniques in the future without changes to existing software, allowing organizations to consolidate their data management efforts and protecting investments in AnIML technology.
This flexibility is achieved using a simple approach: a generic data container, called the AnIML Core, can store arbitrary scientific data. On top of that, so-called Technique Definitions define how to use the data container for a given analytical discipline. A Technique Definition can be thought of as a catalog of the data fields needed to record an experiment of a given technique. Technique Definitions are regular XML files, so new ones can be created at any time.
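The split between a generic container and technique-specific field catalogs can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `TECHNIQUE_DEFINITIONS` dictionary, the field names and the `validate_step` helper are invented for illustration and do not reflect the actual AnIML schema, where Technique Definitions are XML files rather than Python dictionaries.

```python
# Illustrative sketch only: names and fields are assumptions, not the AnIML schema.

# A "technique definition" modeled as a catalog of required fields and their types.
TECHNIQUE_DEFINITIONS = {
    "uv-vis": {"wavelength_nm": list, "absorbance": list},
    "chromatography": {"time_s": list, "signal": list},
}


def validate_step(technique, data, definitions=TECHNIQUE_DEFINITIONS):
    """Return a list of problems found when checking generic data against a catalog."""
    if technique not in definitions:
        return [f"unknown technique: {technique}"]
    problems = []
    for field_name, expected_type in definitions[technique].items():
        if field_name not in data:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(data[field_name], expected_type):
            problems.append(f"wrong type for field: {field_name}")
    return problems


# The generic container is just a mapping; the catalog gives it meaning.
# The numbers below are made-up example values.
spectrum = {"wavelength_nm": [400, 410, 420], "absorbance": [0.12, 0.15, 0.11]}
print(validate_step("uv-vis", spectrum))          # no problems expected
print(validate_step("chromatography", spectrum))  # reports the missing fields

# Supporting a new technique means adding a catalog entry, not changing validate_step.
TECHNIQUE_DEFINITIONS["microplate"] = {"well": list, "reading": list}
```

The point of the sketch is that the validation code never mentions a specific technique. All technique knowledge lives in the catalog, which mirrors the idea that new Technique Definitions can be added without software changes.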
Cross-technique data representation
Today, AnIML can handle data from all well-known and frequently used techniques, including spectroscopy, chromatography, imaging, bioassays and others. However, it is also possible to use it for custom or one-off experiments, microfluidic chips or special sensors, making these techniques first-class citizens in a data system. Over time, new analytical techniques and their corresponding Technique Definitions will be developed. This generic approach allows a system to use them without requiring modifications or software upgrades.
If you dive into the specification, you notice that AnIML is based on plain XML. This lowers the barrier to adoption, because a large number of tools are readily available to work with the format. XML is text-based, so AnIML documents can be read in a simple text editor without the need for specific software. While not necessarily convenient, this feature is critical for long-term data retention scenarios: even if we lose access to the software, we keep access to our data.
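As a rough illustration of what "readily available tools" means in practice, the following sketch reads a small XML document using only Python's standard library. The embedded document is a simplified stand-in: its element and attribute names are assumptions chosen for readability, it omits namespaces, and it should not be taken as a valid instance of the AnIML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative document: element and attribute names are assumptions
# and do not follow the complete AnIML schema.
DOCUMENT = """\
<AnIML>
  <SampleSet>
    <Sample sampleID="s1" name="Caffeine standard"/>
  </SampleSet>
  <ExperimentStepSet>
    <ExperimentStep experimentStepID="step1" name="UV/Vis spectrum"/>
    <ExperimentStep experimentStepID="step2" name="Peak table"/>
  </ExperimentStepSet>
</AnIML>
"""

# Parse the text with a general-purpose XML parser; no format-specific software is needed.
root = ET.fromstring(DOCUMENT)

# Collect the human-readable names of all samples and experiment steps.
sample_names = [s.get("name") for s in root.iter("Sample")]
step_names = [e.get("name") for e in root.iter("ExperimentStep")]

print(sample_names)
print(step_names)
```

Any language with an XML parser could do the same, which is the property that matters for long-term retention.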
Inside an AnIML document
Let’s dive into an AnIML document. Once you understand the building blocks, it is easy to find your way around.
The basic building blocks in AnIML are the Sample and the Experiment Step. An Experiment Step could be a UV/Vis spectrum, a chromatogram trace, a peak table or a microplate read. Each Experiment Step describes how a particular part of a laboratory workflow was performed, so it contains method and result information. It also indicates which Technique Definition was used and points to the Sample that was analyzed. Experiment Steps can also be linked. For example, a peak table step could be linked to the chromatogram step from which it was derived. Using these two primitives, Experiment Steps and Samples, you can build up fairly complex laboratory workflows if needed.
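The relationship between Samples, Experiment Steps and the links between steps can be sketched as a small data model. This is a hedged illustration in Python, not the AnIML object model: the class names, the `derived_from` attribute, the `lineage` helper and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    sample_id: str
    name: str


@dataclass
class ExperimentStep:
    step_id: str
    technique: str                      # which Technique Definition applies
    sample_id: Optional[str] = None     # the Sample that was analyzed, if any
    derived_from: Optional[str] = None  # step_id of the step this one builds on
    method: dict = field(default_factory=dict)
    result: dict = field(default_factory=dict)


def lineage(step_id, steps):
    """Follow derived_from links from a step back to the originating step."""
    chain = []
    seen = set()  # guards against accidental cycles in the links
    current = step_id
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = steps[current].derived_from
    return chain


# Made-up example: a chromatogram measured on a sample, and a peak table derived from it.
sample = Sample("s1", "Caffeine standard")
steps = {
    "chrom1": ExperimentStep("chrom1", "chromatography", sample_id="s1",
                             result={"time_s": [0, 1, 2], "signal": [0.0, 5.2, 0.1]}),
    "peaks1": ExperimentStep("peaks1", "peak-table", derived_from="chrom1",
                             result={"retention_time_s": [1]}),
}

print(lineage("peaks1", steps))
```

Walking the links from the peak table back to the chromatogram is what lets a reader of the data trace a result to the measurement it came from.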
Applying AnIML to real world problems
Once we manage to convert all our analytical data into the same format, a number of common lab informatics challenges become much easier to address. Adopting a standard such as AnIML doesn’t come for free, however. You will need conversion tools, a viewer, integration tools and a little know-how to get started. Putting together the business case starts with identifying a small scope scenario and evaluating AnIML against a traditional proprietary approach. This allows you to quantify the effort for either approach. Here are some use cases where standards-based solutions yield tangible benefits:
LIMS and ELN integration: Having data in AnIML format makes it easier to move it into other data systems, such as LIMS and ELN. No longer do we need a separate interface for each instrument type. By normalizing the data, a single interface can be used to pull in data from arbitrary instruments. This reduction of required interfaces reduces integration costs.
Collaboration: In many industries, organizations frequently collaborate with internal and external partners. Agreeing on a standard electronic deliverable makes it easier to exchange data with service labs and contract research organizations. Unlike traditional Excel spreadsheets and PDF reports, AnIML allows us to transmit full analytical data sets. This leads to increased data quality and boosts the downstream usability of partner data.
Long-term data retention: Especially in regulated environments, retaining access to analytical data for several decades is a requirement. It is not realistic to expect that we will be able to preserve our original software applications for such extreme periods of time. Converting the data to AnIML may be a way out. It reduces the number of software tools that need to be maintained, drastically reducing the total cost of ownership of a data system.
Data analysis and reporting: Often, acquiring and processing analytical data is only the first step. We are seeing increased adoption of data-driven workflows. These dive deep into the raw data and apply visualization, design of experiments, multivariate analysis and other statistics tools to the data sets. Providing data to such tools becomes much easier with a normalized data structure like AnIML.
AnIML took a long time to build. A number of technical and political challenges had to be addressed to satisfy the requirements of all stakeholders. The technical work was finished over a year ago, offering implementors a stable basis for adoption. We are now finalizing the standard documentation and submitting it to the balloting process at ASTM. This will turn AnIML into an open and public standard.
In the meantime, much effort has gone into building the necessary tools to deploy the standard. For end users, this includes viewers for the desktop, the Web and mobile platforms, integration tools, data converters, and others. For the vendor community, there are several software components which allow embedding AnIML functionality into their existing software.
Up to now, AnIML has been applied primarily in the pharmaceutical and environmental fields. We are looking forward to exploring other use cases and seeing the community around this emerging standard grow.
• AnIML Home Page: www.animl.org
• BSSN Software AnIML Resource Center: www.bssn-software.com/animl
1. ASTM E1947 – 98(2009) Standard Specification for Analytical Data Interchange Protocol for Chromatographic Data, DOI: 10.1520/E1947-98R09
2. McDonald, R.S.; Wilks, P.A. Applied Spectroscopy 1988, 42, 151-162; Davies, A.N.; Lampen, P. Applied Spectroscopy 1993, 47, 1093-1099; Lampen, P.; Hillig, H.; Davies, A.N.; Linscheid, M. Applied Spectroscopy 1994, 48, 1545-1552
3. Rühl, M.A.; Kramer, G.; Schäfer, R. SpectroML: A Markup Language for Molecular Spectrometry Data, Journal of Laboratory Automation December 2001 vol. 6 no. 6 76-82
4. Martens L et al: mzML — a Community Standard for Mass Spectrometry Data, Mol Cell Proteomics January 2011; 10(1): R110.000133, DOI: 10.1074/mcp.R110.000133
Burkhard Schaefer is the lead architect of the AnIML format and co-founder of BSSN Software, an AnIML tools company. He can be reached at firstname.lastname@example.org.