I call them “homeless” data – categories of information that are important for researchers but that traditional libraries and archives have trouble fitting into their workflows: websites, e-mails, social media, virtual worlds, games. The organizers of the LIBER curation workshop had invited Eric T. Meyer of the Oxford Internet Institute to plead the case for web content and social media, and he did so with vigour (slides here).
Meyer spoke from the perspective of a researcher who is not getting what he needs from current web archives. The Internet Archive and various national libraries are capturing static websites, more or less as pictures, but the huge amounts of data exchanged through new social media consist of interactive, dynamic content that is very difficult to harvest. To illustrate this, Meyer showed a diagram of a simple Facebook transaction.
What makes harvesting complicated is that there is no single manifestation of the content – every user sees his or her own personalized version of the site. Technically, libraries are still unable to archive all this in a manner that is useful to researchers. The Mellon Foundation recently awarded the LOCKSS consortium a research grant to start looking into these mechanisms, but it will be some time before such research yields practical results.
Meyer stressed that not even researchers are fully aware of the wealth of information on the web and in the social media that are increasingly replacing it. Just imagine what sociologists could do if they were able to monitor web traffic itself, or what psychologists could learn if they gained access to the transactional data generated in online gaming.
Is there a role for research libraries in all of this? “Libraries have become very successful at disappearing into the background”, Meyer observed. “They should get their stuff much more to the front.” This requires active and early participation by users; the researchers are essential partners in this. And it requires domain expertise.
Meyer’s recommendations for dealing with the Web in a Changing World:
- Immediate: set up ways for researchers to trigger collection, increase the frequency of crawls, etc.
- Developing: use triggers, such as RSS feeds, to find and archive changing content, particularly fast-changing emerging content.
- Long-term: develop algorithms that follow trends and trigger archiving automatically.
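To make the “developing” recommendation concrete, here is a minimal sketch of RSS-triggered archiving: poll a site’s feed and queue any not-yet-archived item links for a fresh crawl. This is my own illustration, not something from Meyer’s talk – the function name, the sample feed, and the idea of a crawler queue are all assumptions.

```python
import xml.etree.ElementTree as ET

def new_links(rss_xml: str, already_archived: set) -> list:
    """Return item links in an RSS feed that have not been archived yet.

    In a real workflow, each returned URL would be pushed to a crawler
    queue to trigger a fresh capture of the changed page.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in already_archived:
            fresh.append(link)
    return fresh

# A tiny illustrative feed with two items.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example site</title>
  <item><link>http://example.org/page-a</link></item>
  <item><link>http://example.org/page-b</link></item>
</channel></rss>"""

# page-a was captured earlier; only page-b should trigger a new crawl.
print(new_links(SAMPLE_FEED, {"http://example.org/page-a"}))
# → ['http://example.org/page-b']
```

A production version would, of course, poll feeds on a schedule and hand the fresh URLs to an actual harvester; the point here is only the trigger mechanism.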
Meyer also stressed: “There is a change from looking at past pages, to looking at collections, and collections of collections. There is a shift towards data analysis rather than reference; the data set rather than a reference collection; e-research, data sharing, linked data, APIs.” If you want to know more, check out Meyer’s slides and the recent reports mentioned therein.
To my mind, all of this implies a drastic change for research libraries – from looking at and selecting traces from the past to looking at a live, constantly changing environment, and picking and choosing content to preserve and provide access to long before it has proven its value to research. This will require fundamentally different selection techniques, as will be discussed in my next post from the LIBER workshop.
Reactions from the audience included a comment by William Kilbride (DPC): “I think in 10 or 20 years mainstream digital preservation will be about web archiving. That worries me.”
Barbara Sierman of the KB found out at last week’s meeting of the International Internet Preservation Consortium in Washington that many researchers do their own bit of web archiving. Unfortunately, these collections are not shared.
Eric Meyer’s concluding comment: “More and more of what we do is happening on the web.”