Outline of Craig Ball's Electronic Discovery Workbook - Data Backups

Here's a continuation of my outline of the 2016 edition of Craig Ball's Electronic Discovery Workbook which I last posted about on January 13, 2018.

XV. Computer Back-up Systems

A. General
a. Movement of data to the cloud.
b. Growth in hard drive capacities.
c. Increased use of virtual machines.
d. Use of replication – D2D2T (Disk-to-Disk-to-Tape) – disk staging. Backups stay on disk for a day to a week before being copied to tape, then deleted.

B. Back-up Tapes
a. Full backups ignore software that can be reinstalled, focusing only on user-created data.
b. Incremental backups – focus only on data created since the last full or incremental backup.
c. Tape is cheap, durable, and portable, and is primarily used for disaster recovery. Tapes should be recycled regularly – tape rotation.
d. Many companies retain legacy tapes indefinitely.

C. Duplication, Replication and Backup
a. Duplication – a copy made to another medium.
b. Replication – duplication without discretion, e.g., RAID 1 mirroring.
c. Backup – alteration of data and logging of content by software that compresses and encrypts.

D. Back-up Systems
a. Drive imaging – collection of a bitstream in a single file or in chunks of data.
b. Full backups / changed-file (since last full) backups.
c. Incremental backups – based on the status of a file's archive bit.
d. Differential backups – based on a file's created and modified times.
e. Delta block – differences in a version of a file since the last back-up.
f. Back-up catalog – tracks the source and metadata of each file. Can facilitate single-instance backup of identical files.
g. Tape log – a list of backup events.
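The practical difference between incremental and differential backups is just the cutoff date used to decide which files get copied. A minimal sketch, using file modification times as a stand-in for the archive bit (which this sketch does not model); the function name and parameters are illustrative, not from the workbook:

```python
import os

def select_files(paths, last_full, last_any, mode):
    """Pick which files a backup job would copy.

    mode "incremental": files changed since the last backup of ANY kind.
    mode "differential": files changed since the last FULL backup.
    Timestamps are POSIX seconds; mtime approximates the archive bit.
    """
    cutoff = last_full if mode == "differential" else last_any
    return [p for p in paths if os.path.getmtime(p) > cutoff]
```

This is why differential sets grow until the next full backup (the cutoff never moves), while incremental sets stay small but must all be restored in sequence.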

E. Back-up Media
a. LTO-7 – 4-inch cartridges holding 6 TB, transferring at 300 MB per second.
i. Linear serpentine recording scheme.
b. SAIT-2 tape systems – 8 mm tape with 800 GB of storage. Sony stopped selling them in 2010.
i. Helical recording system.
c. eMag Solutions – specialists in back-up tapes who estimate that it takes twice as long to restore data from back-up tape as the stated capacity and transfer rates would suggest. For common tape data types the theoretical transfer time would be 1.5 to 3.5 hours, but real-world time would be 4 to 7.5 hours.
d. Disk backup intervals are currently on a par with tape rotation intervals. Tape is not used as often for disaster recovery, but usually only for long-term storage.
e. Virtual Tape Libraries (VTLs) – disk arrays emulating tape drives so that existing software and backup routines do not need to change.
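The gap between rated and real-world restore times is simple arithmetic. A sketch, assuming decimal units (1 GB = 1000 MB) and using eMag's roughly-2x rule of thumb as the overhead factor; the helper name is hypothetical:

```python
def restore_hours(capacity_gb, rate_mb_per_s, overhead=2.0):
    """Theoretical and estimated real-world restore times in hours.

    overhead=2.0 reflects the rule of thumb that tape restores take
    about twice as long as capacity and transfer rate would suggest.
    """
    seconds = capacity_gb * 1000 / rate_mb_per_s
    return seconds / 3600, seconds * overhead / 3600

# A full LTO-7 cartridge (6 TB at 300 MB/s):
theoretical, real_world = restore_hours(6000, 300)
```

For a full LTO-7 cartridge this works out to roughly 5.6 hours theoretical, so a real-world estimate of about 11 hours under the 2x rule.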

F. Compression
a. Use of computing power to express information in more compact ways.
b. Saves the time, tape, and money needed for backups.
c. Compression algorithms tend to be proprietary and require particular software.
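The savings come from redundancy in the data. A minimal sketch using Python's standard zlib module (not one of the proprietary tape-vendor algorithms the outline refers to) on a highly repetitive stream of the kind backups often contain:

```python
import zlib

# Repetitive data, typical of backup streams, compresses well.
text = b"backup " * 10_000          # 70,000 bytes of raw data
packed = zlib.compress(text, 9)     # level 9 = maximum compression
ratio = len(packed) / len(text)     # fraction of original size
```

The flip side of item (c): restoring compressed tape data later requires software that understands the same algorithm, which is part of what makes legacy tapes burdensome.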

G. Deduplication
a. Duplication from one backup to the next is often as high as 90%.
b. Vertical deduplication – deduping within a single custodian's email archives and electronic files.
c. Horizontal deduplication – deduping across multiple custodians.
d. In-line deduplication – a hash value is calculated for each file or data block; if it is already stored, it is not backed up again.
e. Post-process deduplication – all files are stored on the backup medium first, then culled.
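The in-line approach in item (d) can be sketched in a few lines: hash each block before writing, and skip the write when the hash is already in the store. A minimal illustration, assuming SHA-256 as the hash and a dict as the backup store (both illustrative choices, not from the workbook):

```python
import hashlib

def inline_dedupe(blocks, store):
    """Write each unique block once, keyed by its SHA-256 hash.

    Returns a list of hash references, one per input block, so the
    original sequence can be reconstructed from the store.
    """
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:   # already backed up? skip the write
            store[digest] = block
        refs.append(digest)
    return refs
```

Post-process deduplication (item e) would instead write everything first and run the same hashing pass over the stored data afterwards, trading extra media usage for a faster initial backup window.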

H. Data Restoration
a. The burden and cost of creating a restoration platform for backup data was the main reason it was judged not reasonably accessible.
b. New technology has eliminated the need to recreate the native computing environment to restore files.
c. Non-native restoration – eliminates the need to use the original backup software or recreate the native computing environment, and can extract specific files from back-up sets.
d. It can be cheaper to retrieve ESI from back-up tapes than from active data.

I. Sampling
a. Selecting the parts of a tape most likely to contain responsive information and using them as a basis to decide whether or not to restore more.
b. A selection of data snapshots rather than a selection of tapes.

J. Cloud Backups
a. Eliminates the need for user backups and occurs behind the scenes.
b. The distinction between inaccessible backups and accessible active data stores will soon be just a historical curiosity.