Grid-based Digital Libraries and Cheshire3
Recent research in designing and developing digital library services has focused on approaches to indexing and searching a steadily increasing range of genres and materials. An important aspect of this research is concerned with providing effective and scalable information retrieval services for digital libraries as these diverse collections grow to sizes measured in terabytes and petabytes. The Cheshire project has had a central research focus on large-scale digital library collections for more than a decade, with a current focus on supporting distributed digital libraries in a Grid environment. At the same time we have been prototyping systems for very long-term digital preservation, and examining how grid-scale information retrieval systems can interoperate with petabytes of diverse data stored over many years.
For Information Retrieval (IR) to work effectively in the evolving "Grid" parallel distributed computing environment, there must be a single flexible and extensible set of "Grid Services" with identifiable objects and a known API to handle the IR functions needed for Digital Libraries or other retrieval tasks. The Cheshire3 system builds on the work of the Cheshire project over the past decade to define and implement an easy-to-use set of IR objects with precisely defined roles that can effectively provide a Grid Service for IR. I will discuss how distributed storage technologies like the SRB (Storage Resource Broker) are being used in Cheshire3, and the issues of efficiency in such a computing environment. (This talk is based on recent submissions to SIGIR and to INFOSCALE.)
Student Presentations:
Libby Smith. "Mass Collection Digitization: Keeping Resources Trusted." Standards have been developed for trusted digital repositories. How do these attributes apply to mass digitization projects conducted by a commercial third party such as Google? Examined are such issues as quality, storage, and, most importantly, persistence of access. In other words, in the case of repositories like UCB, how can Google be trusted to provide an accurate and accessible collection forever?
Yun Kyun Jung. "Reasons for Voluntary Information Sharing in Korean Cyberspace: The Uses and Gratification Approach." I will study the various motivations and reasons for voluntary user participation and information sharing among groups of Koreans who share similar interests in cyberspace. As the primary method to analyze this topic I have chosen the "Uses and Gratifications" approach developed by communication researchers such as Katz and Blumler. In the traditional study of media, the main object of study is the media themselves. The Uses and Gratifications approach, developed in the 1970s, starts with people and explores how they use a particular medium to satisfy their needs. This theory was not designed for Internet media and may not apply well to the culture of Korean cyber society, and there might be other factors that are not covered by this theory. I will also look for reasons that have not yet been documented.