Sometimes Data is Best Served Cooked, Rather than Raw: Scholarly Publishing and the Web of Data
There is a great deal of interest in the sciences and humanities around how to manage "data." By "data," we usually mean content that has the formal and logical structure needed to meet the requirements of software processing. Data quality issues play an important role in shaping professional incentives to participate in data dissemination and in issues of trust and reliability around the use of shared data. As is the case with other areas of scholarly production, researchers need appropriate workflows to edit, review, and improve the quality of shared data. This presentation will explore how the transactional nature of data helps shape this workflow. Because the use of data is heavily mediated by software, datasets can be seen as an integral part of software. This thinking motivated us to experiment with software debugging and issue tracking tools to help organize collaborative work on editing data. Such tools, widely used to improve software quality, can play a similar role in the "debugging" of data.
Finally, such editorial workflows also need to take into account issues of context. To be more useful, datasets need to be understood and related to other information available on the Web. This is particularly true for archaeology, an inherently multidisciplinary domain with inputs from the humanities, history, and natural sciences. Beyond the research community, much information relevant to archaeology is routinely collected through government administrative processes relating to environmental impact regulations and historical preservation laws. "Linked Open Data" methods can help to better contextualize research data by relating it to other datasets, other forms of scholarly communication, and records maintained by government institutions.
Eric Kansa is director of the Open Context project, a service for the publication of research data in archaeology. Open Context works closely with the California Digital Library for long-term archiving and curation of digital data.