Personal Digital Archiving: Part Three – File Formats
This post is part of a series based on a three-hour hands-on workshop I offer on this topic. Be sure to check out the preceding posts:
Personal Digital Archiving: An Overview
Personal Digital Archiving: Part One – Strategy
Personal Digital Archiving: Part Two – Storage Options
File Formats
When you’re deciding how to store all of your information and content for future access, you’ll also want to decide which file formats to save your assets in. There are a few issues to consider:
- Compressed vs. Raw File Formats
.jpg vs. .tif
For an archival master copy, it is best practice to save the highest-quality, highest-resolution, uncompressed version of the file. For this reason many people save image files in .tif or .psd (Photoshop) format, which keeps layers and other information about the file available at the highest resolution. However, think carefully before saving files in a proprietary format such as .psd, because opening them again requires access to a specific application.
- Proprietary vs. Open File Formats
.doc vs. .odt
Save in open, non-proprietary formats whenever possible, provided the format has been widely adopted, so that the format is unlikely to become obsolete in the near future. Here are a few guidelines for choosing a file format for archival purposes:
- The file format is currently widely adopted
- The file format has a history of backward compatibility
- The file format has good metadata support
The PRONOM database provides detailed information on file formats, including migration pathways from old file formats to new ones.
- Normalization
In archiving, normalization is the practice of converting all files of the same type to a single file format and storing them that way. You may decide that this is a necessary first step in creating your personal digital archive, since it helps keep all of your information and data files preserved and available for future use.
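To make the idea concrete, here is a minimal Python sketch of how you might plan a normalization pass before converting anything. It only takes an inventory of the formats you hold and lists which files would change. The TARGET_FORMATS mapping and the function names are my own assumptions for illustration, not a standard. The actual conversion would still need a separate tool, and converting a .jpg to .tif will not restore detail that JPEG compression has already discarded.

```python
from collections import Counter
from pathlib import Path

# Hypothetical choices: one target format per file type.
# Adjust this mapping to match your own archive policy.
TARGET_FORMATS = {
    ".jpg": ".tif",
    ".jpeg": ".tif",
    ".psd": ".tif",
    ".tif": ".tif",
    ".doc": ".odt",
    ".odt": ".odt",
}


def inventory(paths):
    """Count how many files there are per (lowercased) extension."""
    return Counter(Path(p).suffix.lower() for p in paths)


def plan_normalization(paths, targets=TARGET_FORMATS):
    """Return (source, target) pairs for files not yet in their target format.

    Files whose extension is not in the mapping are left out of the plan.
    """
    plan = []
    for path in map(Path, paths):
        ext = path.suffix.lower()
        target_ext = targets.get(ext)
        if target_ext is not None and target_ext != ext:
            plan.append((path, path.with_suffix(target_ext)))
    return plan


if __name__ == "__main__":
    # Example file names, made up for this sketch.
    files = ["photos/beach.JPG", "photos/scan.tif", "letters/resume.doc", "notes.txt"]
    print(inventory(files))
    for source, target in plan_normalization(files):
        print(f"{source} would be converted to {target}")
```

Keeping the plan separate from the conversion lets you review what would change, and keep the original files, before committing to a format.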
Further Resources:
Cornell University Library. Digital Preservation Management Tutorial.
Up next: Developing Policy for your Personal Digital Archive


December 20th, 2011 at 9:12 am
Ellyssa,
I’m really enjoying this series, keep up the good work!
It may be useful for file format discussion to explain two things:
1. JPEG’s algorithm is lossy compression by design, whereas other forms of compression do not result in loss of data. Putting a bunch of TIFF files into a gzipped tarball would save space over uncompressed storage, but would not result in a loss of data, for instance.
2. DOC is a proprietary format, but is subject to the Microsoft Open Specification Promise. Since it’s a much more widely-implemented format, it might be a better choice for future access.
December 28th, 2011 at 2:50 pm
Wonderful Brad, thanks so much for the contribution!!