How many files in a GB?

John Tredinnick is a partner at a law firm and a well-known electronic discovery expert. Recently he posted the results of a study to his blog, which provide the kind of rough thumbnail sketch of ESI that every paralegal and litigation support analyst wants to have at his or her fingertips. See the posting here. Tredinnick, who works at Catalyst, used the company's large data sets to determine the average number of electronic files in a gigabyte. He first conducted this study in 2011, and determined that the average number of files per GB in data sets of that era was about 2500. In the 2014 and 2015 studies the averages increased to approximately 3500 - 4500 per GB. The rough breakdowns per file type were:

5000 - .msg files per GB

3500 to 4500 - .doc/.docx files per GB

1500 to 2500 - .xls/.xlsx files per GB

2000 - .pdf files per GB

500 - .ppt/.pptx files per GB

See the precise figures in the chart for Tredinnick's 2014 data set, and in the bar graph for the 2015 data set. Both images have been taken from the blog posting.

An important caveat is that .eml files (an email format) can throw off the overall per GB total. Catalyst collected very large numbers of these files from dozens of different cases, and there can be as many as 30,000 per GB.

In any event, it's easy to keep these rounded-off estimates for MS Office files in your head, and when quizzed by an attorney you should be able to impress them with the accuracy of your rough estimate.
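If you would rather not do the arithmetic in your head, the rounded figures above can be dropped into a few lines of Python. This is only a rough sketch: the dictionary keys, the function name and the sample collection are my own made-up illustrations, not anything from Tredinnick's study, and .eml files are left out because their counts vary so widely.

```python
# Rounded per-GB file counts from the list above, stored as (low, high).
# The keys are hypothetical labels chosen for this sketch.
FILES_PER_GB = {
    "msg": (5000, 5000),
    "doc": (3500, 4500),
    "xls": (1500, 2500),
    "pdf": (2000, 2000),
    "ppt": (500, 500),
}


def estimate_file_count(gb_by_type):
    """Return a (low, high) estimate of the total number of files.

    gb_by_type maps a file type label (see FILES_PER_GB) to the
    number of gigabytes of that type in the collection.
    """
    low = 0
    high = 0
    for file_type, gigabytes in gb_by_type.items():
        per_gb_low, per_gb_high = FILES_PER_GB[file_type]
        low += per_gb_low * gigabytes
        high += per_gb_high * gigabytes
    return round(low), round(high)


# Made-up example: 10 GB of .msg email, 4 GB of Word files, 2 GB of spreadsheets.
collection = {"msg": 10, "doc": 4, "xls": 2}
print(estimate_file_count(collection))  # prints (67000, 73000)
```

Keeping a low and a high figure, rather than a single number, reflects the fact that the post gives ranges for Word and Excel files.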

© 2015 by Sean O'Shea