EDRM Data Calculator
The EDRM has a data calculator (set up in an Excel spreadsheet) posted to its site here. The data calculator can be used to determine how much a data set will be reduced by standard processing steps, and how much of it will be taken up by common file types.
It works by having you enter the total data set size, in GB, in Step 1. You then enter the percentages by which de-NISTing, de-duplication, search terms, and TAR are each expected to reduce the data set. Next, enter the expected percentages for structured and unstructured file types to see how much of the set will be composed of Word files, Excel spreadsheets, and so forth.
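The arithmetic behind this kind of estimate can be sketched in a few lines of Python. This is a minimal sketch, not the spreadsheet's actual formulas. It rests on two assumptions: each reduction percentage is applied to whatever data remains after the previous step, and the file-type percentages are applied to the reduced set. The function names `apply_reductions` and `file_type_breakdown` and all input figures are hypothetical.

```python
from typing import Dict, List, Tuple


def apply_reductions(total_gb: float,
                     reductions: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Apply each percentage reduction to the data left over from the previous step.

    Assumption: reductions compound sequentially (each percentage is taken
    from the remaining volume, not from the original total).
    Returns (step name, GB remaining after that step) for every step.
    """
    remaining = total_gb
    steps = []
    for name, pct in reductions:
        if not 0 <= pct <= 100:
            raise ValueError(f"{name}: percentage must be between 0 and 100")
        remaining *= 1 - pct / 100
        steps.append((name, remaining))
    return steps


def file_type_breakdown(set_gb: float, shares: Dict[str, float]) -> Dict[str, float]:
    """Split a data set (in GB) into file types by percentage share.

    Assumption: shares apply to the reduced set; any unlisted share is "other".
    """
    listed = sum(shares.values())
    if listed > 100:
        raise ValueError("file-type percentages add up to more than 100")
    breakdown = {ftype: set_gb * pct / 100 for ftype, pct in shares.items()}
    breakdown["other"] = set_gb * (100 - listed) / 100
    return breakdown


if __name__ == "__main__":
    # Hypothetical inputs, not figures taken from the EDRM spreadsheet.
    steps = apply_reductions(100.0, [("de-NISTing", 10), ("de-duplication", 30),
                                     ("search terms", 50), ("TAR", 40)])
    for name, gb in steps:
        print(f"after {name}: {gb:.2f} GB")
    final_gb = steps[-1][1]
    shares = {"Word": 30, "Excel": 15, "email": 40}
    for ftype, gb in file_type_breakdown(final_gb, shares).items():
        print(f"{ftype}: {gb:.2f} GB")
```

Compounding the reductions means a 10% de-NISTing cut and a 30% de-duplication cut remove 37% of the original volume, not 40%. If the spreadsheet instead takes every percentage from the original total, the remaining volume would shrink faster, so check its formulas before relying on either reading.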
