Election Special: Yes, you can review 650,000 emails in eight days.
Donald Trump is many things, but he is certainly no certified electronic discovery specialist, and he definitely doesn't know how quickly an email review can be conducted. Since the FBI issued a statement saying that it did not find any new evidence against Hillary Clinton in the email data collected from Congressman Weiner's laptop, Trump has been claiming that it was impossible for the Bureau to do the review so quickly. He said, "She is being protected by a rigged system. It's a totally rigged system. I've been saying it for a long time. You can't review 650,000 new emails in eight days." See the video here. Like so much else that comes out of Donald Trump's mouth, this is complete nonsense, and the electronic discovery community is calling him out on it. Here are three reasons why.
1. Most of the emails are likely not new, but are duplicates of emails that were earlier collected and reviewed from other data sets. Hashing and comparison of email conversation IDs can easily determine which emails in the Weiner email archive are exact duplicates of others. There are almost certainly not 650,000 'new' emails.
2. Domain filtering can exclude a very high percentage of emails that come from sources clearly unrelated to Huma Abedin's work for the State Department. Like anyone, she must have a lot of junk email and personal email that the FBI could simply exclude by filtering on the domains used in email addresses in the author, recipients, and cc fields.
3. Keyword searching, as in any review, can quickly narrow the remaining emails to a smaller set of potentially relevant ones.
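To make the three steps above concrete, here is a minimal Python sketch of how such a culling funnel might work. It is an illustration only, not a description of the FBI's actual tools or workflow. The message layout (dictionaries with `id`, `from`, `to`, `cc`, `subject`, and `body` keys), the function names `domain_of` and `cull`, and the `.example` domains are all assumptions made up for this example.

```python
def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def cull(messages, reviewed_ids, work_domains, keywords):
    """Apply the three filters in order and return what is left for human review."""
    seen = set(reviewed_ids)
    survivors = []
    for msg in messages:
        # Step 1: skip anything already reviewed, or repeated within this collection.
        if msg["id"] in seen:
            continue
        seen.add(msg["id"])

        # Step 2: keep only messages that touch at least one work-related domain.
        addresses = [msg["from"], *msg["to"], *msg["cc"]]
        if not any(domain_of(a) in work_domains for a in addresses):
            continue

        # Step 3: keep only messages that mention at least one search term.
        text = f'{msg["subject"]} {msg["body"]}'.lower()
        if any(term.lower() in text for term in keywords):
            survivors.append(msg)
    return survivors


# Hypothetical sample data, invented for illustration.
sample = [
    {"id": "a0", "from": "aide@state.example", "to": ["boss@state.example"],
     "cc": [], "subject": "Old memo", "body": "Classified, already reviewed."},
    {"id": "a1", "from": "aide@state.example", "to": ["boss@state.example"],
     "cc": [], "subject": "Briefing", "body": "Marked classified."},
    {"id": "a2", "from": "deals@shop.example", "to": ["aide@home.example"],
     "cc": [], "subject": "Sale", "body": "Secret discounts!"},
    {"id": "a3", "from": "aide@state.example", "to": ["boss@state.example"],
     "cc": [], "subject": "Lunch", "body": "Noon?"},
]

left = cull(sample, reviewed_ids={"a0"}, work_domains={"state.example"},
            keywords=["secret", "classified"])
print([m["id"] for m in left])  # prints ['a1']
```

Each filter is cheap, and each one shrinks the pile before the next one runs, which is why only a small remainder ever needs to reach a human reviewer.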
Don't take my word for this. Take a look at what some of the top electronic discovery experts are saying. Joshua Neil Rubin is a partner at a New York law firm who has focused on electronic discovery since 1995. He posted on his blog on November 2, 2016 that, "The FBI should and could easily have finished its investigation of Hillary Clinton’s emails on Anthony Weiner’s laptop by no later than early Monday morning. And, consistent with its current policy, it should have disclosed the result of its review by no later than noon on Monday. It should be easy for the FBI to eliminate almost all of the 650,000 emails using only metadata."
Wired Magazine spoke to a former FBI employee who confirms the electronic discovery capabilities of the Bureau, "One former FBI forensics expert even tells WIRED he’s personally assessed far larger collections of data, far faster. 'You can triage a dataset like this in a much shorter amount of time,' says the former agent, who asked to remain anonymous to avoid any political backlash. 'We’d routinely collect terabytes of data in a search. I’d know what was important before I left the guy’s house.'" See the article here.
Craig Ball is one of the world's most widely recognized electronic discovery experts. He has served as a court appointed special master and testified as an expert in numerous cases. In his blog post of October 30, 2016, he confirms that the Bureau ought to have been able to exclude a large number of emails by comparing hash values:
"The Bureau has already painstakingly vetted tens of thousands of Clinton e-mails, permitting the Justice Department to conclude that no crime had been committed or, as Mr. Comey put it on July 5, 'we cannot find a case that would support bringing criminal charges on these facts' and 'no reasonable prosecutor would bring such a case.' Assuming the Bureau had the same metavalues (like Message IDs) from the tens of thousands of messages they’ve had for months and which they have scrutinized with excruciating exactitude, why have they not made a hash-based comparison of the comparable components of the messages to assess how much is new and how much is yesterday’s news?" Ball goes on to fault the FBI for not doing the review before releasing information about this investigation that went nowhere. "After nearly four weeks with the devices, Mr. Comey might have waited for the results of a mechanized analysis that would typically take minutes against a single custodian’s locally-stored e-mail."
Yesterday, Melinda Levitt, a partner specializing in electronic discovery issues at Foley & Lardner LLP, commented on Ball's posting, "Well, given the news that broke this Sunday afternoon, maybe the FBI discovered that it too can use the types of e-discovery tools that all of us reading and commenting on this post have used for years."
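The hash-based comparison Ball describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method any particular forensic tool uses: the function names `comparable_hash` and `split_new_from_known` are hypothetical, the choice of headers is arbitrary, and the sketch handles only single-part plain-text messages. The idea is to hash the parts of a message that should be identical across copies (sender, recipients, subject, date, body), ignoring transport headers that differ from one mailbox to another, and then check each hash against the set already reviewed.

```python
import hashlib
from email import message_from_string


def comparable_hash(raw):
    """Hash the comparable components of a raw message, ignoring transport headers."""
    msg = message_from_string(raw)
    fields = [msg.get(h, "") for h in ("From", "To", "Cc", "Subject", "Date")]
    # Assumption: single-part text messages only; multipart bodies are skipped here.
    body = "" if msg.is_multipart() else msg.get_payload()
    # Collapse whitespace and case so trivial formatting differences do not matter.
    normalized = "\n".join(" ".join(part.split()).lower() for part in [*fields, body])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_new_from_known(raw_messages, known_hashes):
    """Separate messages whose hash is already in the reviewed set from new ones."""
    new, known = [], []
    for raw in raw_messages:
        (known if comparable_hash(raw) in known_hashes else new).append(raw)
    return new, known
```

A set lookup per message is all the comparison costs once the hashes exist, which is consistent with Ball's point that a mechanized analysis of one custodian's mail is quick.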
On November 1, 2016, the Association of Certified E-Discovery Specialists hosted a discussion entitled, "FORENSIC EXPERTS WEIGHT IN ON LAPTOP/EMAIL TOOLS", including Ball; Sharon Nelson (the President of Sensei Enterprises, Inc. - a digital forensics and information security firm and the author of The Electronic Evidence and Discovery Handbook: Forms, Checklists and Guidelines (American Bar Association, 2006)); John Simek, the Vice-President of Sensei; David Cowen, the President of the Cowen Group - the leading staffing agency for electronic discovery professionals; and an anonymous corporate forensics expert. The conversation was hosted by Mary Mack, the executive director of ACEDS, and the author of the Process of Illumination, an e-discovery guide discussed on this blog back on January 12, 2016.
Cowen noted that, "The FBI CART [Computer Analysis Response Team]/RCFL[Regional Computer Forensics Laboratory] is still a FTK shop though and IEF [he's referring to this digital evidence tool] beyond webmail. The CART labs make use of a large FTK cluster known as LabNet for all their processing with the option to run a local instance. I know IEF has PST support now but I wouldn’t use it as my primary resource. The FBI and many law enforcement do make use of IEF but its mainly for internet evidence (browser based/p2p based applications) and webmail." Simek and Ball note that the FTK email tool lost its edge years ago and that Nuix's software is superior, while Cowen notes that FTK can help reveal deleted shadow copies. Either way, there's no question that FTK is a sophisticated email review tool, even if it is not the most up-to-date software of the kind the corporate world has come to prefer.
Back on October 31, 2016, the Wall Street Journal quoted experts and former FBI agents to confirm that domain filtering and keyword searching would be used to pare down the number of emails to review:
"The first steps can happen quickly: categorizing the messages by fields, like 'sender' and 'subject line,' that would be obvious to a layman as well as by metadata that identifies which of the emails passed through Mrs. Clinton’s email server. Such indexing can be done within a few hours, experts say. Within that pool of emails, agents would likely then use document-review software to search for keywords like 'secret' or 'classified,' a process that takes a few seconds. Other searches could include words like 'Benghazi,' which were relevant in earlier email disclosures from Mrs. Clinton, former FBI agents say."
When you go to the polls tomorrow, don't let how you vote be influenced by the nonsense Trump and his supporters are spreading about how long it takes to review thousands of emails. The length of time the Bureau took for the review is entirely reasonable.