ImageTank Reference Manual

Why Snapshots?

Read "What are Snapshots?" first.

Snapshots are meant to address a number of pain points.

  • The desktop metaphor gets really stressed when you have a lot of data sets.
  • Time Machine backups and Dropbox-like services do not handle large datasets well, particularly when you move data between folders or rename large files.
  • It is really tricky to share larger data files. You end up sending around Dropbox links or e-mailing large objects back and forth.
  • You end up storing data in off-site locations, and universities and research organizations have to turn a blind eye to this.
  • Classical file servers are not a solution for sharing data outside a local network, and with more and more people working from home or on the road this becomes a major problem.
  • Reproducibility is hard when file names change or files get lost because they didn’t fit on your machine and you put them on that ‘black hard drive’.
  • File names are very short and not descriptive, and even if you have that awesome ‘readme.txt’ file that explains everything, it is hard to search.

Snapshots address these problems as follows:

  • Snapshots are based on a self-documenting file format that supports a very flexible data model.
  • Datasets are treated as immutable (read-only), with a metadata layer on top that can be changed.
  • The metadata is much richer than the standard metadata in a file system (typically just the file name, creation date, modification date and size).
  • Data is stored on a server that you control. You can use an S3-compatible service provider (search online for a full list; Backblaze works reasonably well) or run a server on your own local network (typically by contacting your local IT department).
  • Visual Data Tools (VDT) does run the discovery server, but data IDs and server names for private servers are encoded so that the VDT server has no way to see what data you have or what the name of the private server is.
  • Because the data lives in cloud storage, you can remove files from your local drive and any existing scripts will still work (the data is simply downloaded as needed).
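
The split between read-only data and editable metadata can be illustrated with a small Python sketch. This is not the ImageTank file format or API: the names `Dataset`, `SnapshotEntry` and `search` are hypothetical, and deriving the data ID from a SHA-256 hash of the payload is an assumption made only to show that editing metadata leaves the ID untouched.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Read-only payload (hypothetical stand-in for a stored dataset)."""
    payload: bytes

    @property
    def data_id(self) -> str:
        # Assumption: the ID depends only on the bytes, never on a file name.
        return hashlib.sha256(self.payload).hexdigest()


@dataclass
class SnapshotEntry:
    """An immutable dataset plus a metadata layer that may be edited."""
    dataset: Dataset
    metadata: dict = field(default_factory=dict)


def search(entries, text):
    """Return the entries whose metadata values contain `text` (case-insensitive)."""
    needle = text.lower()
    return [
        entry for entry in entries
        if any(needle in str(value).lower() for value in entry.metadata.values())
    ]


entry = SnapshotEntry(
    Dataset(b"\x00\x01\x02"),
    {"title": "Wing tip scan", "notes": "run 12, sample B"},
)
id_before = entry.dataset.data_id

# Editing the description does not touch the data or its ID.
entry.metadata["title"] = "Wing tip scan (calibrated)"
id_after = entry.dataset.data_id

hits = search([entry], "calibrated")
```

Because the identifier in this sketch never depends on a file name or folder, renaming or re-describing a dataset cannot break a script that refers to it by ID, and the descriptive text stays searchable.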
