Question: Digital preservation: fixity checks/virus scans

Two women operating ENIAC (U.S. Army Photo)

We are passing along a request for help (one you may have yourself) and a few of the answers that were sent in. If you have other suggestions, please post them in the comments!

“I’m the archivist and part of a digital preservation team at a small Catholic academic library. We have about 1500 full-time students. Some colleagues and I attended a Digital POWRR preservation workshop last year (which I highly recommend if it makes its way to your area). As part of our action plan to strengthen our digital file preservation methods and storage capabilities, we’re wondering what software/tools other similarly sized academic libraries/archives are using for fixity checks and virus scans.

We learned about some tools at the workshop, but we’re not sure what’s best for our particular situation, and our campus IT department wants us to research what other benchmark institutions are using. We are NOT looking for a complete software package that includes hosting our files. Our files are hosted on the university server and will soon be backed up to a cloud service. We just need recommendations for fixity checks and virus scans.

If anyone is doing something similar, can you tell me more about which products you are using for fixity checks and virus scans, and whether you recommend them? Is there a one-time cost to implement the tools, or do we need to allocate money annually for digital preservation?

Anything else you would like to share about digital preservation, including written preservation plans, would also be helpful.

Thank you in advance!

Catherine

Hi Catherine,

AVPreserve has a tool that might do part of what you want, and it’s called, appropriately enough, fixity:

https://www.avpreserve.com/tools/fixity/

This tool runs on Windows and Mac, and the source code is available, so someone enterprising enough might be able to get it working on Linux as well.
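For readers new to the idea, a fixity check comes down to hashing every file, saving those checksums as a baseline, and re-hashing later to see what differs. The Python sketch below is not how the fixity tool itself works; it is a minimal standard-library illustration, and the function names (`sha256_of`, `snapshot`, `compare`), the choice of SHA-256, and the report categories are our own hypothetical choices (MD5 and SHA-1 are also common in preservation settings). It treats a file that vanishes from one path and reappears with the same checksum at another as moved rather than deleted.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Classify differences between a stored baseline and a fresh snapshot."""
    report: dict[str, list] = {
        "confirmed": [], "changed": [], "new": [], "moved": [], "removed": []
    }
    # Paths present both times: an identical checksum means the file is intact.
    for path in sorted(old.keys() & new.keys()):
        report["confirmed" if old[path] == new[path] else "changed"].append(path)

    gone = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())

    # Index vanished files by checksum so a reappearance elsewhere counts as a move.
    gone_by_hash: dict[str, list[str]] = {}
    for path in gone:
        gone_by_hash.setdefault(old[path], []).append(path)

    moved_from = set()
    for path in added:
        candidates = gone_by_hash.get(new[path])
        if candidates:
            source = candidates.pop(0)
            moved_from.add(source)
            report["moved"].append((source, path))
        else:
            report["new"].append(path)

    report["removed"] = [p for p in gone if p not in moved_from]
    return report
```

In practice you would save the `snapshot()` result (for example as JSON) alongside the collection, re-run `snapshot()` on a schedule, and pass both to `compare()`; anything listed under `changed` or `removed` needs investigating.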

As for antivirus, a lot of that will depend on how these files are being stored, what platform, etc. Would you be able to provide a little bit more info on what’s being used currently for long term storage and preservation?

**********

Hi Catherine,

A few years ago I did some similar research on fixity tools, asking on a few listservs and on the Digital Preservation Q&A website (http://qanda.digipres.org/332/what-tools-do-you-use-for-the-ongoing-monitoring-of-checksums). The Q&A site has a summary of the responses I got and details about the different software options. Hope that helps!
****************

Hi Catherine,

We have most recently been following a minimal-effort curation practice for our digital collections that uses a combination of Brunnhilde, Siegfried, ClamAV and BagIt.

The first three tools aggregate a number of smaller applications and work together to accomplish both a virus scan and some fixity information. BagIt gives us additional fixity information that we can audit as we move things to and from various storage and staging points.
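BagIt (specified in RFC 8493) keeps payload files under a `data/` directory and lists a checksum for each one in a manifest such as `manifest-sha256.txt`, one checksum-then-path line per file. That is what makes auditing after a move straightforward: re-hash the payload and compare against the manifest. The Python sketch below is a deliberately minimal illustration of that idea, not bagit.py, and it does not produce a complete, valid bag (no `bagit.txt`, no tag manifests, no percent-encoding of unusual filenames, and it does not flag extra files missing from the manifest). The function names are hypothetical; for real bags, use bagit.py as the respondents do.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

MANIFEST_NAME = "manifest-sha256.txt"


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Write one 'checksum  data/relative/path' line per payload file."""
    lines = []
    for p in sorted((bag_dir / "data").rglob("*")):
        if p.is_file():
            rel = p.relative_to(bag_dir).as_posix()  # e.g. "data/scans/img001.tif"
            lines.append(f"{file_sha256(p)}  {rel}")
    manifest = bag_dir / MANIFEST_NAME
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> list[str]:
    """Return manifest paths that are missing or whose checksum has changed."""
    problems = []
    text = (bag_dir / MANIFEST_NAME).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first run of whitespace so paths may contain spaces.
        expected, rel = line.split(maxsplit=1)
        target = bag_dir / rel
        if not target.is_file() or file_sha256(target) != expected:
            problems.append(rel)
    return problems
```

Running `verify_manifest()` at each staging point (after copying the whole directory, manifest included) gives an empty list when everything arrived intact.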

See the links below for those tools:

Brunnhilde: https://github.com/timothyryanwalsh/brunnhilde
Siegfried: http://www.itforarchivists.com/siegfried
ClamAV: https://www.clamav.net/
BagIt: https://github.com/LibraryOfCongress/bagit-java (specifically, we are using the bagit.py CLI here: https://github.com/LibraryOfCongress/bagit-python)

We are also currently piping collections to cloud storage as an interim solution, and looking forward to developments with repository solutions like Hyku (Hydra-in-a-Box): http://hydrainabox.projecthydra.org/.

**********
Hi Catherine,

I think the best answer to your question would be: Both! Knowing the type of files, their format and average size can be important, as some virus scanning tools (ClamAV being an example) will not fully scan files beyond a certain size, usually in the gigabytes. This is fine if you’re mostly preserving photos, digital documents or sound files, but moving images, born-digital project files and large research datasets can easily exceed these limits.

Having a general idea of the operating system and storage architecture/equipment being used will tell you which antivirus software will work best. If there’s a Linux underpinning, for example, I’d suggest ClamAV to start. A Windows environment has other options better suited to that OS, some of which are commercial/proprietary. And your IT department might already have a solution they’re happy with, and it might be just fine for your needs, too.
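Because size limits are the catch with large files, it can help to know ahead of time which files a scanner might skip or only partly scan. A minimal Python sketch, assuming you already know the limit your scanner is configured with: the `limit_bytes` value is yours to supply, and the function name is hypothetical. (For ClamAV's clamscan, options such as `--max-filesize` and `--max-scansize` control this; check the documentation for your version.) It simply lists files above that size so they can be handled some other way.

```python
from __future__ import annotations

from pathlib import Path


def oversized_files(root: Path, limit_bytes: int) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for files larger than limit_bytes.

    limit_bytes should match whatever size limit your scanner is actually
    configured with; this function does not know or check that value itself.
    """
    hits = []
    for p in root.rglob("*"):
        if p.is_file():
            size = p.stat().st_size
            if size > limit_bytes:
                hits.append((p.relative_to(root).as_posix(), size))
    # Largest first, since those are the most likely to need special handling.
    return sorted(hits, key=lambda item: item[1], reverse=True)
```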
****************

Hi Catherine:

We use ClamAV for virus checking.
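If you script ClamAV rather than running it by hand, one approach is to call its command-line scanner, `clamscan`, and read its exit status, which is documented as 0 when no virus is found, 1 when one or more viruses are found, and 2 when errors occur. A minimal Python sketch, assuming `clamscan` is installed and on the PATH; the helper names are hypothetical. It does not update virus signatures, which is normally done separately with `freshclam`.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# clamscan's documented return codes.
EXIT_MEANINGS = {0: "clean", 1: "infected", 2: "error"}


def interpret_exit_code(code: int) -> str:
    """Translate a clamscan return code into a short status word."""
    return EXIT_MEANINGS.get(code, "unknown")


def scan_directory(root: Path) -> tuple[str, str]:
    """Recursively scan root with clamscan; return (status, clamscan output)."""
    exe = shutil.which("clamscan")
    if exe is None:
        raise FileNotFoundError("clamscan is not installed or not on the PATH")
    # -r: recurse into subdirectories; -i: print only infected files.
    result = subprocess.run(
        [exe, "-r", "-i", str(root)],
        capture_output=True,
        text=True,
        check=False,  # a return code of 1 (virus found) is a result, not a crash
    )
    return interpret_exit_code(result.returncode), result.stdout
```

For example, `status, output = scan_directory(Path("/path/to/collection"))` returns `"infected"` together with clamscan's list of flagged files if anything is found.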

In addition, we use a product called Corz Checksum (http://corz.org/windows/software/checksum/). It integrates nicely with Windows Explorer, so you can work with checksums by simply pointing and right-clicking. The author also has a *NIX version that you can run on your servers. It is also very cheap.

There are other ways to do the checksum but this is probably the easiest option for Windows systems.