Preservation in the cloud? A first look at ‘Preservica’

Published 6 July 2012

On 5 July, Tessella Technology & Consulting organized a webinar to present their brand-new “Preservation as a Service” called Preservica. I attended because I was curious about this latest development by a company that has made a clear choice to invest a lot in archiving in the coming years – as it has done in the past. Here are my impressions – by Jeffrey van der Hoeven, KB/National Library of the Netherlands

Jeffrey van der Hoeven

As far as I know, Preservica is the first archiving solution to move into the cloud. It offers the same type of capabilities as Tessella’s current Safety Deposit Box (SDB) solution for stand-alone archiving systems. However, the cloud solution is said to be more flexible and does not require an on-site ICT infrastructure, as everything can be managed remotely.

The software specific to the archiving functionality is developed by Tessella while the underlying storage and computing infrastructure is delivered by Amazon’s cloud service. In one hour, the Tessella team gave a nice insight into the capabilities and workflows of Preservica and even dared to demonstrate it live to us.

In essence, Preservica works as follows: you ingest files simply by uploading them to Preservica. Each file is then automatically identified using the DROID tool (figure 1). Files can also be uploaded and identified in a batch process.

Figure 1. Source: Tessella
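To make the identification step a bit more concrete, here is a minimal Python sketch of signature-based format identification, the general idea behind tools such as DROID. It is not DROID's implementation and says nothing about how Preservica calls it: the signature table and the function names `identify` and `identify_batch` are assumptions made purely for illustration. Real tools rely on a far richer signature registry.

```python
# Hypothetical, heavily simplified signature table (leading "magic bytes").
# A real identification tool uses a much larger, curated registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
]


def identify(data: bytes) -> str:
    """Return a format name based on the first bytes of the file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_batch(files: dict) -> dict:
    """Identify many files at once: maps file name -> format name."""
    return {name: identify(data) for name, data in files.items()}
```

Calling `identify_batch` on a dictionary of file names and contents mirrors the batch mode mentioned above; anything without a known signature is reported as "unknown" rather than guessed.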

After that, a policy is attached to the file containing information about what should be done before storing the file for the long term. A technical registry keeps track of what should be done with a file in a particular format. For example, a format migration might be required (figure 2).

Figure 2. Source: Tessella
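The role of the technical registry can be sketched as a simple lookup from an identified format to the action the policy prescribes. The following Python fragment is only an illustration under assumptions: the `Policy` class, the `REGISTRY` contents and the chosen formats are hypothetical and do not come from Tessella's registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    action: str                          # "store" or "migrate"
    target_format: Optional[str] = None  # only set when action is "migrate"


# Hypothetical registry entries: format name -> what to do before storage.
REGISTRY = {
    "WordPerfect 5.1": Policy("migrate", "PDF/A"),
    "PDF/A": Policy("store"),
}

# Assumed fallback: formats without an entry are stored unchanged.
DEFAULT_POLICY = Policy("store")


def policy_for(fmt: str) -> Policy:
    """Look up the policy for a format, falling back to the default."""
    return REGISTRY.get(fmt, DEFAULT_POLICY)
```

With these entries, `policy_for("WordPerfect 5.1")` asks for a migration to PDF/A, while a format the registry does not list is simply stored as it is.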

For each package of information a SIP (Submission Information Package) is created. This can be done using a predefined template or from scratch, in which case you need to enter the metadata for that specific kind of package yourself. In this way, you can make collection-specific SIP files with their own metadata.
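The template idea can be illustrated with a small Python sketch that merges a metadata template with package-specific values and lists the files in the package. Everything here is an assumption for illustration only: the field names, the `build_sip` function and the use of SHA-256 checksums are not taken from Preservica, whose actual SIP format was not shown in detail.

```python
import hashlib

# Hypothetical collection-level template: fields every SIP in this
# collection should carry, with default values.
TEMPLATE = {"collection": "", "creator": "", "rights": "unknown"}


def build_sip(files: dict, template: dict, **overrides) -> dict:
    """Combine template metadata, overrides and a file manifest into one SIP."""
    unexpected = set(overrides) - set(template)
    if unexpected:
        raise KeyError(f"fields not in template: {sorted(unexpected)}")
    metadata = {**template, **overrides}
    manifest = [
        {
            "name": name,
            "size": len(data),
            # Checksum is an assumed addition, useful for later integrity checks.
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        for name, data in sorted(files.items())
    ]
    return {"metadata": metadata, "files": manifest}
```

Rejecting fields that the template does not define keeps all SIPs of one collection consistent, which is the main reason to work from a template rather than from scratch.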

Once all parameters are set, the ingest can start via an authenticated session. A whole series of automated steps then follows, which is shown very nicely in this overview (figure 3).

Figure 3. Source: Tessella

All uploaded SIPs can be browsed using the built-in file browser. It seems to me that this is a practical solution for small collections, but it will become difficult to navigate once we are talking about millions of objects. Nevertheless, it is available and delivers a basic front-end for your data.

Preservation plans & actions

One of the key features of Preservica is its ability to carry out preservation actions. This means that you can configure the system to respond when data are at risk of becoming obsolete. To do so, you need to compile a preservation plan, which can be scheduled to execute on certain sets of data (selected by file format, collection type, etc.). Tessella demonstrated this by converting JPEG2000 files into PDF/A on the fly, which worked nicely (figure 4).

Figure 4. Source: Tessella
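A preservation plan of this kind boils down to a selection rule plus an action. The Python sketch below shows that structure under stated assumptions: the class and function names are hypothetical, and the actual file conversion is replaced by a placeholder that only updates the recorded format. It is not how Preservica implements its plans.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class StoredObject:
    name: str
    fmt: str
    collection: str


@dataclass(frozen=True)
class PreservationPlan:
    source_format: str
    target_format: str
    collection: Optional[str] = None  # None means "any collection"

    def selects(self, obj: StoredObject) -> bool:
        """True if the plan applies to this object."""
        if obj.fmt != self.source_format:
            return False
        return self.collection is None or obj.collection == self.collection


def run_plan(plan: PreservationPlan, objects: List[StoredObject]) -> List[StoredObject]:
    """Apply the plan to every selected object; others pass through unchanged."""
    result = []
    for obj in objects:
        if plan.selects(obj):
            # Placeholder for a real conversion: only the format label changes.
            result.append(replace(obj, fmt=plan.target_format))
        else:
            result.append(obj)
    return result
```

A plan such as `PreservationPlan("JPEG2000", "PDF/A", collection="maps")` would then touch only JPEG2000 objects in the "maps" collection, matching the selection by file format and collection type described above.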

In conclusion

Overall, I think Tessella has created a good opportunity for small and mid-sized organisations to upgrade their long-term preservation activities by safeguarding their digital collections in a way that is much better than the way they are currently held. As such organisations may lack a scalable technical infrastructure, they now can use Preservica for preservation purposes. No other commercial vendor that I know of offers a similar out-of-the-box solution.

However, some words of caution are in order as well. As with any cloud solution, one should be aware of the risks, which I identified in a previous post. I confronted Tessella with my doubts about Amazon as a storage provider. In 2009, Amazon was struck by a major technical failure which not only resulted in a period of downtime, but even in data disappearing forever!

Tessella is of the opinion that Amazon is stable and secure now and that such events are very unlikely to happen again in the future. But for long-term preservation, “the future” is very, very long …

Then there are some legal issues. As Amazon is US-based, all data stored in Preservica are automatically uploaded to US servers, and thus fall under an American rather than a European legal regime.

Finally, once data are stored in Preservica, you cannot get them out easily. For now, there are no easy ways to migrate large amounts of content to other platforms, if any such platforms exist. As long as you are happy with Preservica this is no problem, of course, but if you run out of budget or foresee the need to download the data very quickly, I advise you to make a special arrangement beforehand.

Disclaimer. This post contains the personal opinions of the author and does not in any way reflect official positions of the NCDD or any of its member organizations.

