Cluster: What is backed up (and not) on the Duke Compute Cluster storage?


Cluster for analysis, not storage

The Duke Compute Cluster is a data analysis tool. Its storage resources are an essential part of the installation, but storage is not its purpose.

Most of the storage capacity is a shared resource accessible through the /work directory. Users’ home directories are best used for scripts, software, small data sets, and results.

Only users’ home directories are backed up. The /work directory is not backed up and should be treated as scratch space that can be cleared without notice.

 Backed up                      NOT backed up
 home directories (/hpchome)    /work directory

Data deleted from the /work directory cannot be recovered.
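The rule above can be checked mechanically before important output is written somewhere. The following is a minimal sketch, assuming only the two locations named in this article (/hpchome and /work); the function name backup_status is hypothetical, and the check is purely textual, so it does not resolve symbolic links or ".." components.

```python
from pathlib import PurePosixPath

# Locations taken from the table above; any other path is reported as unknown.
BACKED_UP_ROOTS = (PurePosixPath("/hpchome"),)
SCRATCH_ROOTS = (PurePosixPath("/work"),)


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path is the root itself or lies somewhere below it."""
    return path == root or root in path.parents


def backup_status(path: str) -> str:
    """Classify an absolute path as 'backed up', 'not backed up', or 'unknown'."""
    p = PurePosixPath(path)
    if any(_is_under(p, root) for root in BACKED_UP_ROOTS):
        return "backed up"
    if any(_is_under(p, root) for root in SCRATCH_ROOTS):
        return "not backed up"
    return "unknown"
```

Comparing whole path components rather than string prefixes keeps a directory such as /workspace from being mistaken for /work.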

Data retention — details on the backups

Files in home directories on the Duke Compute Cluster are backed up daily, and up to six versions of each file are kept. These incremental versions are retained for up to 90 days. If you change a file more than six times in a 90-day period, only the six most recent versions remain in the backup.
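To make the version limit concrete, here is a small sketch of the retention rule as described in this article. It is a simplified model under stated assumptions, not the behavior of the actual backup system: it assumes a version survives only if it is among the six most recent and no older than 90 days, and it does not model details such as how the current version of an existing file is treated. The function name retained_versions is hypothetical.

```python
from datetime import datetime, timedelta

MAX_VERSIONS = 6               # "up to six versions" in the text above
MAX_AGE = timedelta(days=90)   # "up to 90 days" in the text above


def retained_versions(backup_times, now):
    """Return the backup timestamps still kept under the model, newest first."""
    newest_first = sorted(backup_times, reverse=True)
    recent = newest_first[:MAX_VERSIONS]          # apply the version-count limit
    return [t for t in recent if now - t <= MAX_AGE]  # apply the age limit


# Example: a file changed (and backed up) on each of the last ten days.
now = datetime(2024, 1, 31)
daily_backups = [now - timedelta(days=d) for d in range(10)]
kept = retained_versions(daily_backups, now)
# Under this model only the six newest daily versions remain; the four
# oldest are gone even though they are well within 90 days.
```

Under this reading, a file edited every day can only be rolled back about six days, so work that needs a longer history is better kept under version control or copied elsewhere.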

Deleted files are retained in backups for 180 days.

In some cases, backups must run less frequently because a directory is so large, or contains so many files, that a backup cannot be completed within 24 hours. The schedule is then lengthened so that each backup finishes before the next one starts.

To have files restored from the backup system, contact rescomputing@duke.edu for assistance.

The home directories are backed up by OIT’s storage team, using TSM. See the TSM FAQ for details on this system.

Do not retain irreplaceable data on the cluster.

Users of the cluster should retain a copy of their irreplaceable data at a separate location, and they should remove results from the system as soon as they can. Temporary and ephemeral data sets that are not essential should be deleted from cluster storage so that other users can use the capacity.
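One way to find candidates for copying elsewhere or deleting is to list files that have not been modified for some time. The following is a minimal sketch, not an official cluster tool: the function name stale_files and the 30-day default are assumptions chosen for illustration, and the script only reports files, it never deletes anything.

```python
import os
import stat
import time


def stale_files(root, min_age_days=30, now=None):
    """Yield (path, size_in_bytes) for regular files under root whose last
    modification is more than min_age_days before now."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # 86400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks and other special files
            if info.st_mtime < cutoff:
                yield path, info.st_size
```

Calling stale_files on your own directory under /work and sorting the results by size gives a short list to review. Copy anything irreplaceable to a separate location before removing it, since data deleted from /work cannot be recovered.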

Remember that storage on the Duke Compute Cluster must not be used to store sensitive information. If your research uses sensitive information, please contact the IT Security Office or Duke Research Computing for help arranging resources with adequate protection. Other IT resources are available that may meet your requirements.

Contact rescomputing@duke.edu for assistance in retrieving data from backups.