How to stop temporary files filling up your /home directory on Gadi
Background
Many applications, particularly those designed around a desktop environment, tend to place their caches and other temporary data in a user's /home directory. Unfortunately, on Gadi, this is the last place it should go. Not only is /home quota limited to 10GB, but /home is also the slowest global filesystem on Gadi. The use of a user's /home directory as temporary space in desktop-oriented applications is generally motivated by the assumption that /home is on the largest partition of the system. Furthermore, $TMPDIR is not used because it is assumed that it will be cleared on every reboot, and desktops tend to have irregular reboot cycles. All of these assumptions are false on Gadi.
Applications that use the /home directory for temporary storage tend not to allow users to configure this, so a bit more work is needed to move this temporary storage away from /home on Gadi. The conventional way of doing this is to create a symlink in place of the directories containing temporary files within your /home directory. The following sections detail how to find those directories, and how to move them to /scratch.
Finding large directories on /home
To check your /home quota on Gadi, run the following command:
$ quota -s
Disk quotas for user dr4292 (uid 19147):
Filesystem space quota limit grace files quota limit grace
gadi-home-fas.gadi.nci.org.au:/home
4067M 10240M 10240M 87809 4295m 4295m
This post will use my /home directory as an example. To get a breakdown of the size of every file and directory within the top level of your /home directory, run the following command:
$ du -csh $( ls -A )
4.0K .ICEauthority
40K .Xauthority
4.0K .ash_history
8.0K .astropy
...
31M test_venv
344K um_output
7.1M umui_runs
0 v45_gdata
4.1G total
The above options for du cause it to give the total size for each argument (-s, i.e. show totals for directories, do not show their contents), create a total entry for the sum of the sizes of all arguments (-c) and use human-readable output (-h). The $( ls -A ) as the final argument for du tells the shell to run the ls -A command and use its output as the remainder of the arguments to du.
Note
It is important to use the output of ls -A in du, as running du -csh * will not show hidden files or directories (those starting with .) by default.
If you have a lot of files in your home directory, it may be useful to pipe the output of du to sort -h, which is able to parse the human-readable sizes from du:
$ du -csh $( ls -A ) | sort -h
0 .pbs_qmgr_history
0 v45_gdata
4.0K .ICEauthority
4.0K .ash_history
...
386M .singularity
807M .cache
904M mdss_test_dir
1.3G cylc-run
4.1G total
The above shows the importance of listing hidden files, as over a quarter of my /home usage is in hidden directories. Based on this, the .cache directory is a good candidate for moving to /scratch. It takes up a significant portion of my /home usage, and as it is a cache, it is unlikely to cause applications to fail if its contents are expired.
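One caveat with du -csh $( ls -A ) is that the shell splits the output of ls on whitespace, so a file or directory name containing a space is passed to du as several separate arguments. A minimal sketch of an alternative, assuming the GNU versions of find, du and sort (the home_usage function name is hypothetical, not something provided on Gadi), passes NUL-separated names instead:

```shell
# Hypothetical helper: list the size of every top-level entry in a directory
# (hidden entries included), smallest first, with a grand total.
# Assumes GNU find, du and sort.
# Usage: home_usage [DIR]    (DIR defaults to $HOME)
home_usage() {
    dir="${1:-$HOME}"
    # Run in a subshell so the caller's working directory is left unchanged
    (
        cd "$dir" || exit 1
        # -print0 and --files0-from=- exchange NUL-separated names, so names
        # containing spaces or newlines reach du intact
        find . -mindepth 1 -maxdepth 1 -print0 \
            | du -csh --files0-from=- \
            | sort -h
    )
}
```

Running home_usage with no argument reports on $HOME. Entries are printed with a leading ./ because find is run from inside the directory.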
Moving directories to different filesystems without changing their path
To migrate the .cache directory to /scratch, I will run the following from my /home directory (note: substitute your project and username in place of mine):
$ cp -a .cache /scratch/v45/dr4292/tmp
$ rm -rf .cache
$ ln -s /scratch/v45/dr4292/tmp/.cache
Note
It is best to run this while you have no PBS jobs or other background processes running, as there is a chance that a cache directory could be recreated in the time between the rm command and the ln command.
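The first command makes a copy of .cache in your /scratch directory (every NCI user has a tmp directory created in their /scratch/$PROJECT/$USER directory for their default login project). The .cache directory is then removed from /home and a symlink is put in its place with the same name. Because ln -s is given no link name, it creates the symlink in the current directory and names it after the last component of the target, .cache.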
Note
Use cp -a to preserve all permissions on the directory and its contents being copied. In general, /scratch directories are configured to be more permissive than /home directories. Using cp -a ensures that any more restrictive permissions inherited from /home are retained when the directory is copied.
This means that every application that attempts to write to or read from /home/563/dr4292/.cache will actually be accessing /scratch/v45/dr4292/tmp/.cache (note the trailing / in the second ls command, which lists the directory contents instead of the symlink info):
$ ls -l .cache
lrwxrwxrwx 1 dr4292 v45 30 Jan 27 14:59 .cache -> /scratch/v45/dr4292/tmp/.cache
$ ls -l .cache/
total 32
drwxr-sr-x 3 dr4292 v45 4096 Nov 1 10:46 conda
drwxr-sr-x 2 dr4292 v45 4096 Dec 1 16:12 fontconfig
drwxr-sr-x 3 dr4292 v45 4096 Nov 8 11:41 jedi
drwxr-sr-x 2 dr4292 v45 4096 Oct 13 14:39 matplotlib
drwxr-sr-x 54 dr4292 v45 4096 Jan 25 17:27 numba
drwx--S--- 5 dr4292 v45 4096 Oct 31 17:08 pip
drwxr-sr-x 3 dr4292 v45 4096 Nov 9 13:42 scikit-image
drwxr-sr-x 3 dr4292 v45 4096 Nov 3 13:02 yarn
After this, my /home quota and disk usage look like this:
$ quota -s
Disk quotas for user dr4292 (uid 19147):
Filesystem space quota limit grace files quota limit grace
gadi-home-fas.gadi.nci.org.au:/home
3299M 10240M 10240M 32051 4295m 4295m
$ du -csh $( ls -A ) | sort -h
0 .cache
0 .pbs_qmgr_history
0 v45_gdata
4.0K .ICEauthority
...
386M .singularity
904M mdss_test_dir
1.3G cylc-run
3.3G total
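The three migration commands from this section can be wrapped in a small function so the same steps are easy to repeat for other directories. This is only a sketch: relocate_dir is a hypothetical name, and the checks it performs (refusing to touch an existing symlink or overwrite an existing destination) are my own additions rather than part of the procedure above.

```shell
# Hypothetical helper: copy a directory elsewhere and leave a symlink behind.
# Usage: relocate_dir SOURCE_DIR DEST_PARENT
# DEST_PARENT should be an absolute path, otherwise the symlink will not resolve.
relocate_dir() {
    src="${1%/}"                 # strip any trailing slash
    dest_parent="${2%/}"
    name=$(basename "$src")

    # Only act on a real directory, never on a symlink from an earlier run
    if [ -L "$src" ] || [ ! -d "$src" ]; then
        echo "relocate_dir: $src is not a real directory" >&2
        return 1
    fi
    # Refuse to copy on top of something that already exists
    if [ -e "$dest_parent/$name" ]; then
        echo "relocate_dir: $dest_parent/$name already exists" >&2
        return 1
    fi

    mkdir -p "$dest_parent" || return 1
    cp -a "$src" "$dest_parent/" || return 1   # -a preserves permissions
    rm -rf "$src" || return 1                  # only reached if the copy succeeded
    ln -s "$dest_parent/$name" "$src"          # same path, new location
}
```

For the example in this post, the call would be relocate_dir "$HOME/.cache" "/scratch/$PROJECT/$USER/tmp". The same caution applies as for the manual commands: run it while no jobs or background processes are writing to the directory.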
/home quota and ARE
If your /home quota is filled up for any reason, ARE jobs and Jupyter notebooks will not be able to start, and will show the following error:
"Disk quota exceeded @ dir_s_mkdir - /home/.../ondemand/data/sys/dashboard/batch_connect/sys/jupyter/ncigadi/output/b8f7d971-9b27-4701-b97a-d418e5d2f0d8"
This is because the ondemand directory, where ARE places the temporary files it uses to run jobs, is always placed in a user's /home directory. Unfortunately, it is not possible to use the symlink method to move the ondemand directory, as the ARE head-node does not reside on Gadi, and therefore does not have access to /scratch or /g/data. The only way to restore access to ARE jobs is to clean other files and directories out of your /home directory.
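Since a full /home stops ARE sessions from starting, it can help to check how close you are to the limit before launching one. The sketch below is a rough check only: usage_percent is a hypothetical name, and it measures usage with du rather than reading the quota system, so its figure may differ slightly from the one reported by quota -s, which remains the authoritative source.

```shell
# Hypothetical helper: print the disk usage of DIR as a whole-number
# percentage of LIMIT_MIB (a limit in MiB, e.g. 10240 for a 10240M quota).
# Usage: usage_percent DIR LIMIT_MIB
usage_percent() {
    dir="$1"
    limit_mib="$2"
    # du -sm prints the total size of DIR in MiB, then a tab, then the path
    used_mib=$(du -sm "$dir" | cut -f1)
    # Bail out if du produced nothing (e.g. DIR does not exist)
    [ -n "$used_mib" ] || return 1
    echo $(( $used_mib * 100 / $limit_mib ))
}
```

With the 10240M limit shown in the quota output earlier, usage_percent "$HOME" 10240 prints your current usage as a percentage of that limit.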