by Carol Kussmann
Digital Preservation Analyst
Archives document the lives and activities of record creators. Records include correspondence, annual reports, project documentation, and family papers. The Archives and Special Collections (ASC) units at the University of Minnesota Libraries often talk about the size of a collection using cubic feet – or number of boxes (1 box = 1.15 cubic feet). Some collections take up only part of a box while others fill 50-100 boxes. And that worked… for a while.
Today ASC continues to receive donations; however, not all donations come in a box. Think about the types of records…
- Correspondence: How many of us still write letters? Not many. We use email.
- Annual Reports: How many organizations still print/publish their annual reports? Not many. Information is shared using a PDF document.
- Project Documentation: How many people rely on paper to document their project? Not many. Computers help us with this.
- Family Papers: How many of us still print our photographs? Not many. Images are on SD cards, in the cloud, and on our phones and computers.
Paper records will continue to exist and be donated to the archives; however, archivists now need to be able to work with electronic records and have a way to quantify them. Electronic materials come to the archives on flash drives and external hard drives, as email attachments, and through Dropbox or Google Drive. Until we open or view the storage device, we have no idea how much ‘stuff’ is being offered for donation.
The storage space on commonly available flash drives ranges from 4 GB to 128 GB; 1 TB hard drives are commonly sold; and email and Dropbox can handle files of various sizes as well. The amount of ‘stuff’ also depends on what types of files are being donated. Images are often larger than Word document files, and audio and video are larger than images. The chart below documents how many files fit on an 8 GB memory card based on what type of image file is being created. The number of files ranges from 148 to 1,000.
So the question remains… How do we know how much ‘stuff’ we have?
Recently the University of Minnesota Libraries conducted a preliminary inventory of digital content with long-term preservation needs. This included the records available through various repositories such as the University Digital Conservancy, UMedia Archive, and Minnesota Reflections as well as materials for which online access is not necessarily available such as digital materials within the units of the Archives and Special Collections.
This preliminary inventory allows the Libraries to better understand the number of files in their holdings as well as the types of files and file formats. Knowing this information assists with the long-term preservation of and access to these materials.
The inventory identified different file formats and the number of files per format. In the inventory, the Archives and Special Collections materials are by far the most diverse by format, as expected, because the materials came from many different individuals over an extended period of time.
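For readers who want to try something similar on their own files, here is a minimal sketch of a format inventory in Python. It is not the tool the Libraries used; the function names are made up for illustration, and it assumes that a file's extension is a good enough stand-in for its format, which is not always true.

```python
import os
from collections import defaultdict
from pathlib import Path


def inventory_by_extension(root):
    """Walk `root` and tally file count and total bytes per extension.

    Assumption: the extension (lower-cased) is used as a rough proxy
    for the file format. Real format identification needs more than this.
    """
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            ext = path.suffix.lower().lstrip(".") or "(none)"
            try:
                size = path.stat().st_size
            except OSError:
                # Skip files that cannot be read (e.g. broken links).
                continue
            totals[ext]["files"] += 1
            totals[ext]["bytes"] += size
    return dict(totals)


def print_report(totals):
    """Print one line per extension, largest total size first."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    for ext, t in ranked:
        print(f"{ext:10} {t['files']:8d} files {t['bytes'] / 1e9:10.3f} GB")
```

Calling `print_report(inventory_by_extension("some/folder"))` lists each extension with its file count and total size, which are the two views of a collection this inventory looked at.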
As a whole, the inventory identified over 6 million individual files; 32,000 of those are part of the Archives and Special Collections holdings. The 32,000 files take up 1769 Gigabytes (or almost 2 Terabytes) of space.
The following charts visualize the ASC collections first by number of files and second by how much of the total space each file format uses.
Viewing the first chart… Jpg files (images) make up 20% of the ASC collections with over 6000 individual files. This is followed closely by html files (webpage files – 5700), Word documents (4900), and Wav files (audio recordings – 4430).
When we review the holdings by the amount of total space, the picture changes… 81% of the space is occupied by a single video format: just 425 mov files! The next three formats by space are also audio and video formats; however, their file counts are drastically different.
- Mov = 425
- Wav = 4430
- Mp4 = 503
- Mp3 = 2277
This shows how different file types take up different amounts of space. The JPG image files, which represented 20% of the total number of files, barely make this chart! It is important to understand what you have both in terms of file count and space. No counting of boxes for digital materials! We now have to understand how many bytes a collection takes up and have a place to store them.
Working with digital materials brings with it a new set of rules and challenges, which the Libraries’ Electronic Records Task Force is addressing with the Archives and Special Collections units as well as across the Libraries. A main challenge is being able to preserve digital content – no longer can we catalog materials, stick them in a box, and put the box on a shelf expecting the materials to be there when we want them. Digital content requires constant monitoring. The inventory is a first step and allows us to better understand the collections based on how much space they take up and their file sizes.
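To put rough numbers on that difference, the figures quoted above can be turned into average file sizes. This is only back-of-the-envelope arithmetic: the inputs are the rounded totals from this post (about 32,000 files, 1769 GB, 81% of space in 425 mov files), and it assumes 1 GB = 1000 MB.

```python
# Rounded figures taken from the inventory numbers quoted in this post.
total_files = 32_000
total_gb = 1769
mov_files = 425
mov_share = 0.81  # fraction of total space used by mov files

# Average size of one mov file, in GB.
avg_mov_gb = mov_share * total_gb / mov_files

# Average size of every other file, in MB (assuming 1 GB = 1000 MB).
other_files = total_files - mov_files
avg_other_mb = (1 - mov_share) * total_gb / other_files * 1000

print(f"Average mov file:   {avg_mov_gb:.2f} GB")
print(f"Average other file: {avg_other_mb:.1f} MB")
```

That works out to roughly 3.4 GB per mov file versus roughly 11 MB for the average of everything else, which is why so few video files can dominate the storage chart.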
- You need to understand how much space files take up because you need to know how much space you will need to be able to preserve them long-term!
- You need to know the file types to know what software is needed to view the files – or make them available.
- You need to know the file types to decide how to care for the files long-term – or to preserve them.
The Electronic Records Task Force and the Digital Preservation Repositories Technology department are working to develop policies and procedures to manage, preserve, and provide access to these electronic materials. The inventory was just the first step of many.