On March 30, 2017, I participated in a webcast hosted by Exterro entitled "Office 365 & Your E-Discovery Process", which can be accessed here. The webcast was conducted by Mike Hamilton of Exterro; John Collins, Director of Information Governance for DTI; and Nishad Shevde, Exterro's Director of Strategic Operations. The presentation contained a lot of detailed information about what Microsoft's group of software and services can do in the area of electronic discovery. I recommend you watch it.
The chart below shows which Office 365 features have built-in electronic discovery tools.
PST archives larger than 10 GB cannot be exported. Only 10,000 mailboxes may be searched and placed on hold in a single eDiscovery search, and just 2 eDiscovery searches can be run simultaneously.
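To see what these limits mean for planning a large collection, here is a minimal Python sketch. It only does arithmetic on the figures quoted above (10,000 mailboxes per search, 2 simultaneous searches) and does not call Office 365 in any way. The function names `batch_mailboxes` and `plan_waves` are hypothetical, made up for this illustration.

```python
from typing import List

# Limits as quoted in the webcast; they may differ in current Office 365 plans.
MAX_MAILBOXES_PER_SEARCH = 10_000
MAX_CONCURRENT_SEARCHES = 2


def batch_mailboxes(mailboxes: List[str],
                    batch_size: int = MAX_MAILBOXES_PER_SEARCH) -> List[List[str]]:
    """Split a mailbox list into groups small enough for one search each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [mailboxes[i:i + batch_size]
            for i in range(0, len(mailboxes), batch_size)]


def plan_waves(batches: List[List[str]],
               concurrent: int = MAX_CONCURRENT_SEARCHES) -> List[List[List[str]]]:
    """Group the batches into 'waves' that can run at the same time."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    return [batches[i:i + concurrent]
            for i in range(0, len(batches), concurrent)]


if __name__ == "__main__":
    # Hypothetical organization with 25,000 custodians.
    custodians = [f"user{n}@example.com" for n in range(25_000)]
    batches = batch_mailboxes(custodians)
    waves = plan_waves(batches)
    print(len(batches), "searches needed, run in", len(waves), "waves")
```

Under those assumptions, an organization with 25,000 mailboxes would need three separate searches, run in two rounds.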
Image files such as TIFFs and non-searchable PDFs can't be indexed, and neither can Excel files over a certain size. The presentation didn't indicate a precise limit, but this posting by Microsoft indicates that it's only 4 MB. Microsoft's 'Unindexed items in Content Search in Office 365' manual also states that no attachments larger than 32 MB will be indexed; only the first 10 attachments to an email will be indexed; attachments to emails that are themselves attached to other 'parent' emails are not indexed, although the 'child' emails are; and no more than 2 million characters will be indexed from any one document.
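The indexing limits above can be written down as a simple pre-check, which is handy for estimating how much of a collection might end up unindexed. This is only a sketch built from the figures quoted in this post: it is not Microsoft's logic, the function names are hypothetical, and it assumes 1 MB means 1,024 × 1,024 bytes. It also can't tell whether a PDF is searchable, since that requires opening the file.

```python
import os
from typing import List

MB = 1024 * 1024                 # assumption: binary megabytes
EXCEL_LIMIT = 4 * MB             # Excel size limit quoted above
ATTACHMENT_LIMIT = 32 * MB       # attachment size limit quoted above
MAX_ATTACHMENTS = 10             # only the first 10 attachments are indexed
MAX_CHARS = 2_000_000            # characters indexed per document

IMAGE_EXTENSIONS = {".tif", ".tiff"}
EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def attachment_index_issues(name: str, size_bytes: int, position: int) -> List[str]:
    """Return the reasons an attachment may go unindexed (empty list if none).

    position is the 1-based order of the attachment within its email.
    """
    ext = os.path.splitext(name)[1].lower()
    issues = []
    if ext in IMAGE_EXTENSIONS:
        issues.append("image file type is not indexed")
    if ext in EXCEL_EXTENSIONS and size_bytes > EXCEL_LIMIT:
        issues.append("Excel file is over 4 MB")
    if size_bytes > ATTACHMENT_LIMIT:
        issues.append("attachment is larger than 32 MB")
    if position > MAX_ATTACHMENTS:
        issues.append("beyond the first 10 attachments")
    return issues


def unindexed_characters(char_count: int) -> int:
    """How many characters of a document fall past the 2 million mark."""
    return max(0, char_count - MAX_CHARS)


if __name__ == "__main__":
    print(attachment_index_issues("budget.xlsx", 5 * MB, 11))
    print(unindexed_characters(2_500_000))
```

A check like this only flags candidates for manual review; the actual list of unindexed items comes from the search results themselves.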
After the webcast, I did some further research on the Microsoft TechNet site. Microsoft provides detailed instructions on how to export .pst files in the Office 365 environment here. The compliance management section of MS Exchange includes an In-Place eDiscovery & Hold tool.
The eDiscovery PST export tool gives you the option to exclude duplicate messages and to include unsearchable attachments.