Finding near duplicates in Viewpoint
Elite Discovery's Viewpoint document review platform can assist with near duplicate detection. It provides a separate document viewer dedicated to reviewing near duplicate ('ND') similarity. As shown below, documents are organized into groups listed in the top left pane, and the individual documents in the selected group are shown in the pane to the right.
The bottom pane displays selected documents side by side in a 'difference viewer'. The top document in the list serves as the 'base', and any document selected below it is shown to its right for comparison.
The options to highlight differences or remove whitespace can be toggled on and off. Documents in the same family as the selected document are shown in the pane at the bottom left. The document list makes it easy to find similar documents that have been tagged inconsistently for responsiveness or privilege.
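The same consistency check can be sketched outside the platform. The Python example below is only an illustration and assumes review coding has been exported as a list of records. The field names (doc_id, nd_group, responsive) are hypothetical and are not Viewpoint's export schema.

```python
from collections import defaultdict


def find_inconsistent_groups(records: list[dict], tag_field: str) -> dict[str, set]:
    """Return ND groups whose documents carry more than one distinct tag value.

    Each record is assumed to hold a hypothetical 'nd_group' key plus the
    tag field being checked (for example 'responsive' or 'privileged').
    """
    values_by_group = defaultdict(set)
    for rec in records:
        # A missing tag is recorded as None, so untagged documents also
        # count as a difference within the group.
        values_by_group[rec["nd_group"]].add(rec.get(tag_field))
    return {group: vals for group, vals in values_by_group.items() if len(vals) > 1}
```

Calling find_inconsistent_groups(records, "responsive") returns only the groups that need a second look, which mirrors scanning the document list for mismatched tags.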
An admin creates ND groups in Viewpoint's View Manager, which is available in the Tools menu and can be used to create views based on several different criteria. A wizard builds near duplicate groups for the complete project. The default similarity threshold is 95, but this can be adjusted to generate groups of higher or lower similarity.
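To see how a threshold shapes the groups, here is a minimal Python sketch. It is not Viewpoint's algorithm, which is not described here. It assumes a 0 to 100 score derived from the standard library's difflib and a simple greedy grouping against each group's first document. The function names are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Illustrative 0-100 similarity score (an assumed metric, not Viewpoint's)."""
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def build_nd_groups(docs: dict[str, str], threshold: float = 95.0) -> list[list[str]]:
    """Greedy grouping: a document joins the first group whose base (first)
    document it matches at or above the threshold, else it starts a new group."""
    groups: list[list[str]] = []
    for doc_id, text in docs.items():
        for group in groups:
            base_text = docs[group[0]]
            if similarity(base_text, text) >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing group was similar enough.
            groups.append([doc_id])
    return groups
```

Lowering the threshold pulls more loosely related documents into each group, while raising it splits groups until only very close matches remain together.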
View Manager will also give you the option to exclude documents which are missing hash values or certain key metadata fields. Repeated content such as confidentiality footers can be excluded from the comparison by checking the option to 'Apply/Define text to be ignored when building ND', and adding different blocks of text.
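The effect of ignoring repeated text can be illustrated with a short Python sketch. It is an assumption about the general idea, not a description of how Viewpoint implements the option. The function name and the matching rules (case-insensitive, flexible whitespace) are hypothetical.

```python
import re


def strip_ignored_text(text: str, ignored_blocks: list[str]) -> str:
    """Remove each ignored block from the text before comparison.

    Blocks are matched case-insensitively, and any run of whitespace in the
    document is allowed where the block has a space, so a footer that wraps
    across lines is still removed.
    """
    for block in ignored_blocks:
        words = block.split()
        if not words:
            continue  # skip empty entries
        pattern = r"\s+".join(re.escape(word) for word in words)
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Collapse leftover whitespace so only the remaining content is compared.
    return " ".join(text.split())
```

Two emails that differ only by a confidentiality footer reduce to the same text after this step, so a similarity comparison would treat them as identical.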