
The wiggly wobbly jelly: predicting costs of digital archiving #StF12 (4)

28 May 2012 · Published · 1 Comment

“Thanks for coming to the session about the stuff no-one wants to hear about,” were David Rosenthal’s words of welcome to the first masterclass of day two of Screening the Future 2012, on the costs of storage and preservation. Well, David, perhaps the audience was more in the mood for poetry and optimism, but there comes a time when our bosses and funders do want to know about the flip side of the coin. The organizers had invited an excellent set of speakers: David Rosenthal of LOCKSS, Stephen Abrams of the University of California Curation Center (UC3), and Matthew Addis from the University of Southampton IT Innovation and PrestoCentre. – by Inge Angevaare

From the left, Rosenthal, Abrams, Addis

All three have done extensive work on cost modeling – which basically means that you design formulas with variables in them (which you can change, according to changing circumstances like interest rates, but also according to your policies and management decisions) to predict your future costs. This is, of course, very difficult. It was Addis who used the term wiggly wobbly jelly when talking about predicting the future – especially in a quickly evolving field like technology. Exact figures cannot be given, if only because circumstances and policy choices vary greatly.

The three presentations were absolutely packed with information, so by all means watch the complete video recording which will appear on the conference website in due course. Here are some highlights:

There are different approaches to cost modeling, depending on the circumstances and the questions you’re asking.

Preservation is much more than just storage

Stephen Abrams represented the point of view of a “vendor”, a service provider for storage and curation, the California Digital Library’s University of California Curation Center (UC3) (slides here). His question is how to set prices for the services he provides. “People have funny notions about what things cost,” he said. “I have to explain to them why they should spend $16,000 [= total cost of ownership] on storing a Terabyte of data when they can buy a 1 Terabyte hard drive at the corner store for $100.” Abrams gave his audience some current market prices and, because everybody always wants figures, I will show them here. But be aware that they do not mean much, as there is no indication what you get for your money. Most of them just offer storage of one type or another.

The difference between these, of course, is in the service level, the liability. Curation and preservation are about so much more than mere storage:

The total cost of preservation can be calculated like this:

End of story.

Or is it? Abrams acknowledged that it is difficult to attribute concrete figures to the variables. “Our accounting practices were not designed for preservation.” Nevertheless, he said:

An important part of Abrams’s work is identifying types of costs: which are fixed, which are variable; which costs are incurred at the beginning of the lifecycle, which are incurred later. These distinctions will enable UC3 to develop different payment models for different clients. Institutions may want a “pay-as-you-go” model whereas researchers may want a “pay-once-preserve-forever” payment model, which will enable them to factor the costs of data management and preservation into project grants. More details about UC3’s work here.
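To make the difference between the two payment models concrete, here is a small sketch. All the numbers and the declining-cost assumption are mine, not UC3’s: a pay-as-you-go client pays an annual fee that falls as media get cheaper, while a pay-once-preserve-forever fee can be modeled as the present value of that same cost stream continued in perpetuity.

```python
# Hypothetical illustration (not UC3's actual pricing model): compare a
# pay-as-you-go stream with a single up-front "preserve forever" fee,
# computed as the net present value of a perpetual, declining cost stream.

def pay_as_you_go(annual_cost, years, cost_decline=0.10):
    """Total paid over `years`, with the annual fee falling by
    `cost_decline` each year as storage gets cheaper."""
    return sum(annual_cost * (1 - cost_decline) ** y for y in range(years))

def pay_once_forever(annual_cost, cost_decline=0.10, discount_rate=0.03):
    """One up-front fee: the present value of the perpetual cost stream.

    Each year's cost is annual_cost * (1 - cost_decline)**y, discounted
    at discount_rate. The geometric series has a closed form whenever
    (1 - cost_decline) / (1 + discount_rate) < 1.
    """
    ratio = (1 - cost_decline) / (1 + discount_rate)
    return annual_cost / (1 - ratio)

print(round(pay_as_you_go(100, 10), 2))  # ten years of declining annual fees
print(round(pay_once_forever(100), 2))   # a single endowment-style fee
```

The interesting point of the sketch is that the “forever” fee is finite at all: as long as costs decline (or the discount rate exceeds cost growth), an up-front endowment can cover preservation indefinitely.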

PrestoCentre’s motto

Preservation strategies for limited budgets

Matthew Addis of the University of Southampton is involved in a PrestoCentre project that poses a different question: how can I balance my limited budget against the risks of losing data? After all, as Abrams pointed out, we are experiencing “boom or bust” budget cycles – or rather, “bust and buster” budgets, and we have to spend our scarce money wisely. (Thanks, Matthew, for giving me your slides.)

In a single slide, Addis showed that there is a lot more to preservation than just putting data on a disk. He compressed 20 years of data management into one minute:

20 years of data management compressed into one minute …

Adding media (disk, tape) for new data is the most frequent activity you will have to perform – in that compressed one-minute timeline, 500 times per second! Checking the integrity of your data is a less frequent “must” for keeping your data safe, but in the compressed minute you still have to do it every three seconds. Etcetera.
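For readers who, like me, want to know what those rates mean in real time, here is a back-of-the-envelope conversion. It assumes (my reading of the slide) that the quoted rates apply to the compressed minute, i.e. 20 years of archive operation shown in 60 seconds:

```python
# Rough conversion, assuming the quoted rates refer to the compressed
# one-minute animation of 20 years of archive operation.

media_adds = 500 * 60            # 500/s for 60 s = 30,000 additions in 20 years
per_day = media_adds / (20 * 365.25)

integrity_checks = 60 / 3        # one every 3 s = 20 checks in 20 years
per_year = integrity_checks / 20

print(media_adds, round(per_day, 1))   # 30000 media additions ≈ 4.1 per real day
print(integrity_checks, per_year)      # 20 checks = 1 full check per real year
```

So even the “most frequent” activity works out to a handful of media additions per day, and a full integrity pass roughly once a year – busy, but not science fiction.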

Here’s an interesting comparison between different storage alternatives, their advantages and disadvantages:

Addis’s models plot all sorts of preservation actions against risks of data loss to find optimum solutions. I also learned a new word: “scrubbing”, which means running a fixity check and (automatically) repairing small errors (see Addis’s blog post Wash day for your data). Roughly: the more you scrub, the lower the risk of data loss, but the higher the cost. You can try out the financial planning tools yourself at http://prestoprime.it-innovation.soton.ac.uk/ (they’re not easy, though).
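For those unfamiliar with the idea, scrubbing can be sketched in a few lines of code. This is my own minimal illustration, not PrestoPrime’s implementation: verify each file against a stored checksum, and when the fixity check fails, repair from a replica.

```python
# A minimal scrubbing sketch (illustrative only): verify each file
# against a stored SHA-256 digest and repair corrupted copies from a
# replica that still holds a good copy.

import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def scrub(primary: Path, replica: Path, manifest: dict) -> list:
    """Fixity-check every file in `manifest` (name -> expected digest);
    repair failed or missing files from `replica`. Returns repaired names."""
    repaired = []
    for name, expected in manifest.items():
        target = primary / name
        if not target.exists() or sha256(target) != expected:
            shutil.copy2(replica / name, target)  # restore the good copy
            repaired.append(name)
    return repaired
```

A real archive would scrub on a schedule and log failures rather than silently repairing, but the trade-off Addis describes is visible even here: every scrub pass costs a full read of the data.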

Matthew Addis balancing risks against costs

In conclusion two more slides from Addis that are worth your attention. The first is about storage costs in comparison to access costs:

And here is the good news:


Does long-term cloud storage make economic sense?

One can always trust David Rosenthal to keep us on our toes by challenging assumptions that everybody takes for granted. He did so in 2009 when he argued that obsolescence of file formats was much less of a problem than everybody had thought so far. This time he looked at the economics of cloud storage (mind you, without saying anything about the cloud’s suitability for long-term preservation). Is cloud storage getting cheaper, the way everybody thinks?

More details about this presentation at Rosenthal’s blog. Here are a few conclusions:

  • The cost of cloud storage is not going down by 30% annually, but rather by about 3%.
  • Prices of cloud storage are not really based on costs, but on market value and Amazon’s dominant market position.
  • Storage costs are not just about the cost of storage media, but about a lot of other factors, such as energy consumption, labor, bandwidth charges.
  • When we look at large quantities of data, moving them from one medium/storage service to another becomes so time-consuming that you will think twice before doing this.
  • Engineers who are trying to build higher-density disks are running into physical limitations that will affect their success rates (see PASIG).
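The first bullet matters more than it may look. A quick calculation (my hypothetical numbers, not Rosenthal’s) shows how much a 3% annual price decline instead of 30% changes the long-term bill:

```python
# Why the rate of price decline matters so much (hypothetical figures):
# cumulative cost of keeping 1 TB for 10 years when the annual storage
# cost starts at $100 and declines by 30% vs. by 3% per year.

def cumulative_cost(start, decline, years=10):
    """Sum of annual costs, each year's cost falling by `decline`."""
    return sum(start * (1 - decline) ** y for y in range(years))

fast = cumulative_cost(100, 0.30)   # the decline everybody assumed
slow = cumulative_cost(100, 0.03)   # roughly the decline Rosenthal observed
print(round(fast, 2), round(slow, 2))
```

Under these assumptions the slow-decline world costs roughly two and a half times as much over a decade, and the gap keeps widening the longer you preserve.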

Rosenthal presented two new possible models for projecting costs (see blog) that I am not qualified to have an opinion about just yet, so I will leave those for now. His last slide read: “The future does not look good,” because of the technical issues mentioned above in combination with the fact that industry predicts insatiable demand (which raises prices) and the fact that IT budgets do not keep pace with storage demands.

When all projections fail, we still have selection to curb costs

The uncertainties in the presentations prompted Tate Gallery’s Pip Laurenson to ask the all-important question: “How can I possibly plan for preservation financially?” The masterclass teachers were unable to give Pip concrete advice, because much depends on organizations’ policies. This prompted Paul Conway of the University of Michigan to point to a way to manage costs that had not been described during the masterclass: “Selection. The records management approach – save less material for shorter time periods.” If you are uncertain about the future value of certain materials, you may want to decide to keep them for 10 years and then re-evaluate whether you want to go on preserving them.

“That will be a dramatic change for the movie industry,” someone noted, “but it must be done.”




