SharePoint Syntex

Office 365 includes Microsoft SharePoint Syntex, a service that automatically extracts information from the contents of documents uploaded to SharePoint.

These are the steps to follow to use Syntex to extract dates from a set of documents uploaded to SharePoint. Firms manage large sets of documents, and it is important to be able to access the information they contain quickly. Syntex can not only classify different types of documents automatically, but also organize key data points from these documents and import them into a column-oriented database. It can process form-like documents, in which information appears repeatedly at a particular point, as well as unstructured documents.

Syntex contains extractor functions that can be automated to pull particular data points. To begin, click the option to train the extractor.

Review a few of the documents in your set and, in each one, select the text that you want to extract. Choose content that appears in more or less the same way in each document.

You should also locate outliers in the document set that do not contain the value you want to extract. For these, check the box that reads 'No label present'.

Syntex should prompt you to train the extractor when a sufficient number of documents have been selected.

The next step is to create an explanation for the extractor.

The explanation library contains patterns which can be used for various values that you may want to extract.
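Syntex explanations are configured in the SharePoint interface, not in code, but it can help to picture what a date pattern does. The following is a minimal Python sketch, assuming a simple numeric date format. The DATE_PATTERN expression and the find_dates helper are illustrative names, not part of Syntex, and the pattern is not the one Syntex uses internally.

```python
import re

# Hypothetical stand-in for a date pattern from an explanation library.
# Matches numeric dates such as 3/14/2021 or 03-14-22.
# This is an illustration only, not the pattern Syntex uses.
DATE_PATTERN = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-](?:\d{4}|\d{2})\b")


def find_dates(text):
    """Return every substring of text that looks like a numeric date."""
    return DATE_PATTERN.findall(text)


sample = "This agreement is dated 3/14/2021 and expires on 03-14-22."
print(find_dates(sample))  # ['3/14/2021', '03-14-22']
```

A pattern like this recognizes the shape of a value. It says nothing about which date in a document is the one you want, which is why the extractor is also trained on labeled examples.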

After the explanation is saved, Syntex will try to find values for each document. It will flag documents for which no value is found as mismatches. You can select the value for some of these documents to retrain the extractor.
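The mismatch check can be thought of as sorting documents into two groups: those where a value was found and those where none was. Here is a rough Python sketch of that idea, assuming documents are plain strings keyed by file name. The split_matches function and the sample file names are hypothetical, and the real service works on trained models instead of a single regular expression.

```python
import re

# Illustrative pattern for dates written as month/day/four-digit year.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def split_matches(documents):
    """Split documents into extracted values and a list of mismatches.

    documents maps a file name to its text. Returns (matched, mismatched),
    where matched maps file names to the first date found and mismatched
    lists the file names in which no date was found.
    """
    matched = {}
    mismatched = []
    for name, text in documents.items():
        hit = DATE_PATTERN.search(text)
        if hit:
            matched[name] = hit.group(0)
        else:
            mismatched.append(name)
    return matched, mismatched


# Hypothetical sample documents.
docs = {
    "lease.txt": "Signed on 4/1/2020 by both parties.",
    "memo.txt": "Effective the first of April, 2020.",
}
matched, mismatched = split_matches(docs)
print(matched)     # {'lease.txt': '4/1/2020'}
print(mismatched)  # ['memo.txt']
```

In this sketch memo.txt is a mismatch because its date is spelled out in words. Documents like that are the ones worth labeling by hand before retraining.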

The extractor can be modified to find values that appear after a particular prefix, for example a phrase that introduces the date.
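Anchoring on a prefix matters when a document contains several dates and only one of them is wanted. The Python sketch below illustrates the idea under the assumption that the wanted date sits next to a known phrase. The phrase "Effective Date:" and the extract_prefixed_date helper are hypothetical and are not taken from Syntex.

```python
import re

# Hypothetical prefix phrase. In practice it would come from your own documents.
PREFIX = "Effective Date:"

# Capture a numeric date only when it sits directly next to the prefix phrase.
PREFIXED_DATE = re.compile(re.escape(PREFIX) + r"\s*(\d{1,2}/\d{1,2}/\d{4})")


def extract_prefixed_date(text):
    """Return the date adjacent to PREFIX, or None if there is none."""
    match = PREFIXED_DATE.search(text)
    return match.group(1) if match else None


text = "Printed 1/2/2021. Effective Date: 2/15/2021. Reviewed 3/1/2021."
print(extract_prefixed_date(text))  # 2/15/2021
```

Without the prefix, a plain date pattern would return the first date in the text (1/2/2021 here) and miss the intended one.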

Test the model again after retraining, and it should be more successful at pulling the values you are targeting from each document.