Semi-structured data

June 18, 2020

Structured data is data that is contained in a field for a particular record, such as data in a spreadsheet or a relational database. Unstructured data is not organized into such fields. Most data is unstructured, such as electronic files, metadata, and web pages.
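As a rough illustration of the difference, here is a minimal sketch using Python's built-in `sqlite3` module. The `contacts` table and its values are made up for the example. Because the structured record keeps each value in a named field, a query can ask for it directly. The same facts written as free text have no fields, so a program can only search the text.

```python
import sqlite3

# Structured data: every record has the same named fields, as in a
# spreadsheet row or a relational table (hypothetical example table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [("Ada", "London", "555-0100"), ("Grace", "New York", "555-0199")],
)

# Because each value sits in a known field, a query can ask for it directly.
city = conn.execute(
    "SELECT city FROM contacts WHERE name = ?", ("Grace",)
).fetchone()[0]
print(city)  # New York

# Unstructured data: the same facts as free text. There is no field to
# query, so a program has to search the text itself.
note = "Met Grace at the conference; she lives in New York, call 555-0199."
print("New York" in note)  # True, but only because we knew what to look for
```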


Semi-structured data is data that is not organized in a relational database but contains tags that separate its elements. An object database uses semi-structured data. Instead of storing data in tables, it creates relationships between two data sets directly. Many-to-many relationships can be stored directly, whereas a relational database represents them indirectly, as two one-to-many relationships through a junction table. Object databases tend to work better with complex, nested data.
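To make the idea of tags concrete, the sketch below uses JSON, where the keys play the role of tags. The records are invented for illustration. They do not share a fixed schema: each one carries its own labels, and one of them nests an address inside the record. Python's standard `json` module parses it into dictionaries and lists.

```python
import json

# Semi-structured data: no fixed table schema, but tags (here, JSON keys)
# label each element, and elements can nest. Example records are hypothetical.
raw = """
[
  {"name": "Ada", "email": "ada@example.com"},
  {"name": "Grace", "phones": ["555-0199", "555-0142"],
   "address": {"city": "New York", "zip": "10001"}}
]
"""

records = json.loads(raw)

# Records do not have to share the same fields; the tags say what each value is.
for rec in records:
    print(sorted(rec.keys()))
# ['email', 'name']
# ['address', 'name', 'phones']

# Nested tags can be followed directly, with a default when a tag is absent.
cities = [rec.get("address", {}).get("city") for rec in records]
print(cities)  # [None, 'New York']
```

A relational table would force both records into the same columns, leaving empty values for the missing fields and pushing the phone list and address into separate tables.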


JSON, email, electronic data interchange (EDI) files, and XML files are examples of semi-structured files. The XML that holds the contents of a Word document uses hundreds of tags for character formatting, footnotes, and so on, nested at different levels. A .docx file consists of XML files inside a ZIP archive.
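The sketch below shows the ZIP-plus-XML layout with Python's standard `zipfile` and `xml.etree.ElementTree` modules. To stay self-contained it builds a tiny archive in memory holding only a heavily simplified `word/document.xml` (one paragraph, two text runs, one of them bold). A real .docx contains several more parts and far more tags, and this stand-in would not open in Word. It then reads the archive back, collects the text, and measures how deeply the tags nest.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main part of a .docx file.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Simplified stand-in for word/document.xml: one paragraph (w:p) with two
# runs (w:r); the second run is bold (w:rPr/w:b). Not a complete Word file.
DOCUMENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:r><w:t>Hello, </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

# Build the ZIP archive in memory, the same container format a .docx uses.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", DOCUMENT_XML)

# Read it back: open the archive, pull out the XML part, parse the tags.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    root = ET.fromstring(zf.read("word/document.xml"))


def depth(elem):
    """Nesting depth of an element tree (a leaf element counts as 1)."""
    return 1 + max((depth(child) for child in elem), default=0)


# Concatenate the text of every w:t element, in document order.
text = "".join(t.text for t in root.iter(f"{{{W}}}t"))
print(names)        # ['word/document.xml']
print(text)         # Hello, world
print(depth(root))  # 6: document > body > p > r > rPr > b
```

Renaming a real .docx to .zip and opening it shows the same pattern, with `word/document.xml` alongside other parts for styles, relationships, and content types.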





