Introducing the new hh5 conda environment#
Background#
The hh5 python/conda environment has grown vastly in scope since it was first installed many years ago. What started as a place to put important python packages missing from NCI’s installations has since become an agile, community-driven, monolithic installation serving almost 600 researchers. Currently, there are 261 packages listed in the file used to construct and update the environment. Once all of the dependencies of these packages are factored in, there are 1,015 total python packages installed in the current analysis3-unstable environment. As a result of the work put in by my predecessors at CLEX CMS, an update to the analysis3-unstable environment, from a researcher requesting a package through cws_help@nci.org.au to integration and testing of that package, can be completed in under 1 hour.
That being said, there are issues that need to be addressed going forward:
- The size of the environment: At 1,015 packages, the current analysis3-unstable environment contains 289,524 files, directories and symlinks. At approximately 10GB in size, this corresponds to an average file size of around 36kB, which is not suitable for the Lustre file systems that the conda environments are installed on.
- The lack of integration with NCI systems: As conda is designed to create integrated environments on single-user systems, it installs packages like MPI and SSH that are not configured correctly for NCI’s systems, causing issues such as MPI not working across multiple nodes, or ssh host-based authentication failing.
The new hh5 conda environment, herein referred to as conda_concept, is our attempt to address these issues while retaining all the strengths of the current conda/analysis3 environments. It is a combined container- and squashfs-based solution, though the use of a container is only a byproduct of Singularity’s ability to manage squashfs filesystems. It combines ideas from Singularity Registry HPC (Rioux et al., PEARC ‘20: Practice and Experience in Advanced Research Computing, July 2020, pp. 72–76) and the current analysis3 environments.
In short#
- Same packages as the current analysis3 environments.
- Addresses issues with the installation and maintenance of the current analysis3 environment.
- Will be maintained in parallel with the standard hh5 conda for at least 3 months.
- Will retain the analysis3-stable/unstable structure and the (roughly) quarterly update schedule.
- New module name - conda_concept - to use on Gadi, e.g.
$ module use /g/data/hh5/public/modules
$ module load conda_concept/analysis3-unstable
$ python3
- analysis3/22.07, analysis3/22.10 and analysis3/23.01 environments are available - they are exact duplicates of the corresponding existing analysis3 environments.
- Some catches:
  - Can’t use the direct path to the interpreter; use the path to the ‘scripts’ directory instead if a full path is needed (e.g. in #!/g/data/hh5/.../python3 shebang lines in scripts). See the example after this list.
  - conda activate does not work; the module must be loaded instead.
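For example, after loading the module, the python3 found on your PATH is a launcher script in the ‘scripts’ area rather than an interpreter inside the conda environment itself. The output below is indicative only; the exact launcher path may differ:

$ module use /g/data/hh5/public/modules
$ module load conda_concept/analysis3-unstable
$ which python3
/g/data/hh5/public/apps/cms_conda_scripts/analysis3-unstable.d/bin/python3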
For any problems or questions, contact cws_help@nci.org.au, or leave an issue on the GitHub page.
Usage#
Usage of the new conda_concept/analysis3 environments is very similar to the current conda/analysis3 environments. On the Gadi command line, or in PBS scripts, run
$ module use /g/data/hh5/public/modules
$ module load conda_concept/analysis3
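As a concrete illustration, a minimal PBS job script using the new environment might look like the sketch below. The project code, queue, resource requests and script name are placeholders; substitute your own, and add any further storage directives your job needs:

#!/bin/bash
# Placeholder project, queue and resources - adjust these for your own work.
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=00:10:00
#PBS -l storage=gdata/hh5

# Load the new environment exactly as on the command line.
module use /g/data/hh5/public/modules
module load conda_concept/analysis3

# Run an analysis script with the environment's python3 (placeholder script name).
python3 my_analysis.py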
The version naming scheme is not changing. The analysis3-unstable environment will always be an alias for the most recent environment and will continue to be updated at roughly 3-monthly intervals. analysis3 will remain the most recent “stable” environment.
$ module load conda_concept
will load the “stable” environment. Specific versions can also be loaded, e.g.
$ module load conda_concept/analysis3-23.01
The typical analysis3 workflow comprising python scripts or Jupyter notebooks should seamlessly carry over to the new conda_concept environments. If your workflow does not, please contact cws_help@nci.org.au or leave an issue on the GitHub page.
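As a quick sanity check that an existing workflow picks up the new environment, something like the following can be run on a login node (xarray is used purely as an example of a package you would expect to find; substitute any package from your own workflow):

$ module use /g/data/hh5/public/modules
$ module load conda_concept/analysis3
$ python3 -c "import xarray; print(xarray.__version__)"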
Key differences between conda_concept and conda#
The key difference in the approach used by the conda_concept environments is the use of Singularity’s ability to manage squashfs file systems. A squashfs can be thought of as something like a tar file that can be accessed as if it were an entirely new file system. The obvious advantage of placing a conda environment in a squashfs is that it reduces the file count of the entire environment on /g/data to one.
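As a generic illustration of the idea (this is not how the hh5 environments are actually built, and my_conda_env is a placeholder directory name):

$ mksquashfs ./my_conda_env my_conda_env.squashfs   # pack the whole directory tree into one file
$ ls -lh my_conda_env.squashfs                      # a single file, however many files it contains
$ unsquashfs -l my_conda_env.squashfs | wc -l       # the contents can still be listed without extracting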
The drawback of using Singularity to manage the squashfs is that containers have a number of restrictions placed on them for security reasons (e.g. the newgrp, switchproj and qcat commands cannot be run from inside a container). Because of this, once the module is loaded, the user is kept out of the containerised environment unless a command that exists inside the container is run. This is accomplished by the use of a ‘launcher’ script that runs singularity and executes the command from within the container. For more details, see the CMS Wiki page. This approach leads to the known issues listed below.
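As a rough sketch of what such a launcher can look like (simplified and not the actual hh5 implementation; the container, squashfs and bind paths are placeholders, and the real details are on the CMS Wiki page):

#!/bin/bash
# Illustrative launcher sketch only - the real hh5 launchers differ.
# Each command in the environment (python3, ipython, ...) is a symlink to a
# script like this, so $0 tells us which command the user actually asked for.
cmd=$(basename "$0")

# Attach the squashfs'd conda environment to a minimal container and run the
# requested command inside it, passing through the user's arguments.
exec singularity exec \
    --overlay /path/to/analysis3.sqsh:ro \
    --bind /g/data --bind /scratch \
    /path/to/base_container.sif \
    "$cmd" "$@"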
Known Issues#
- The path to the python executable within a conda environment cannot be used as the shebang (e.g. #!/g/data/hh5/public/apps/cms_conda/envs/analysis3/bin/python3 on the first line of a script), as the executable does not exist outside of the container. Instead, use the launcher script symlink #!/g/data/hh5/public/apps/cms_conda_scripts/analysis3.d/bin/python3, which launches the container and runs the script from inside it (see the example after this list).
- conda env commands do not work. This is because the conda command runs outside of Singularity and does not have any visibility into the environments.
- conda activate does not work. Loading the module is equivalent to running conda activate for a given environment; however, running conda deactivate and then conda activate to load a different environment after loading the module will not work.
- Very occasionally, a package may fail to import with IOError. This is an issue with the underlying file system. To aid in diagnosis, please submit the PBS jobid and/or the node you were working on, and the time at which the import failed.
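As a worked example of the first point (my_script.py is a placeholder name), a script whose first line is the launcher symlink can be made executable and run directly; the launcher starts the container and runs the script inside it with the environment’s python3:

$ head -1 my_script.py
#!/g/data/hh5/public/apps/cms_conda_scripts/analysis3.d/bin/python3
$ chmod +x my_script.py
$ ./my_script.py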
If you encounter any problem not listed here that is fixed when you revert to the standard conda/analysis3-unstable environment, please contact cws_help@nci.org.au, or leave an issue on the GitHub page.