All that Big Data Is Not Going to Manage Itself: Part Two
My last blog post described some of the federal government initiatives that have driven data management requirements over the past 10 years or so. “Data management” is a hot job area right now, and if you tilt the digital stewardship universe in a certain direction, almost everything we do falls under the rubric of “data management.” Data management was addressed in the 2014 National Agenda for Digital Stewardship, and it will feature prominently in the 2015 National Agenda, to be released in conjunction with the Digital Preservation 2014 meeting.
With that in mind, the second part of our two-part series looks at some of the innovative tools and services that have appeared to meet the data management needs created by the federal requirements.
The tools and services generally fall into one of two buckets: data management plan assistance or data repository services.
For researchers, securing the grant is always the #1 priority, so they’re heavily focused on making sure that they get a properly formatted Data Management Plan to the funding agency in the first place. Academic libraries are now being regularly enlisted to aid researchers in preparing and managing plans, so there’s quite a bit of support material out there.
A number of libraries offer general DMP guidance. An example is the University of Montana site that provides a list of some of the major funding agencies and a brief description of what their DMPs entail. NDSA partner MIT Libraries also provides a detailed set of resources, going so far as to suggest data formats for researchers that support long-term access, while NDSA partner the Inter-university Consortium for Political and Social Research has a site that includes numerous links to resources on preparing DMPs.
Some libraries have gone further, however, and have created tools to help researchers write their plans. For example, the Columbia University Libraries Information Services not only links to DMP funder requirements and guidelines but has also created DMP templates for National Science Foundation plans (down to the NSF division level), as well as for the National Institutes of Health, the National Endowment for the Humanities, the National Oceanic and Atmospheric Administration and the Institute of Museum and Library Services.
Beyond templates, several organizations have created tools to manage the DMP-creation process. NDSA member the University of California Curation Center of the California Digital Library has created the DMPTool, which has a new release coming on May 29. The DMPTool helps researchers with step-by-step instructions and guidance for data management plans and helps users create ready-to-use data management plans for specific funding agencies.
There are tools to create DMPs, and then there are the tools to manage the data itself, including institutional data repositories and the technologies to support them. One trend is for universities to create their own digital repositories. For example, Penn State University has ScholarSphere and PASDA (for geospatial data) while the University of North Texas has the UNT Data Repository.
Perhaps more common are discipline-specific repositories such as DataONE or ICPSR, the mere tip of an iceberg of discipline repositories. NIH has an extensive list of data-sharing repositories, while Databib provides a searchable catalog of research data repositories. And while most of the activity is driven by federal government grant cycles, data management requirements have begun to appear at the state government level, leading to the development of state data portals such as Colorado’s.
The universe of tools to support the preparation, management and preservation of digital data is ever-evolving, so it’s a challenge to keep on top of it. The previously mentioned DataONE project has compiled a lengthy list of software tools to support data management activities, though it’ll take a little hunting and pecking to identify the tools appropriate for the job you want to do. Among more specialized tools, the California Digital Library offers DataUp, useful for parsing a variety of structured data formats to detect potential data management issues, while the Australian Researcher Enabling Environment has created a suite of services specifically designed to help researchers get their data into a repository and manage it once it’s there.
If “data management” proves a boon to the preservation of all kinds of digital data, there’s still the question of how to sustain the activity over time. An article in the Chronicle of Higher Education by digital preservation pioneer Francine Berman suggests a number of economic models to sustain the stewardship of data collections, while the Sloan Foundation has provided funds to the Association of Research Libraries to collaboratively build a coordinated framework for the long-term management and preservation of the results of academic research. Only time will tell which of these approaches proves sustainable.
This is part two of a two-part series. Part one, from May 27, 2014, is available here.