Python Tuesday: NetCDF Python library overview
Scott Wales, CLEX CMS
Let’s take a look at some of the libraries available in the CMS Conda environment for loading NetCDF files.
There are three main libraries available: xarray, netCDF4 and iris. Each lets you load a file and work with variables as if they were numpy arrays, but each has its own features that can be helpful when working with climate datasets.
For the examples I’ll be using the following dataset from NCI’s CMIP5 archive:
sampledata = 'http://dapds00.nci.org.au/thredds/dodsC/rr3/CMIP5/output1/CSIRO-BOM/ACCESS1-0/amip/mon/atmos/Amon/r1i1p1/latest/tas/tas_Amon_ACCESS1-0_amip_r1i1p1_197901-200812.nc'
Xarray
http://xarray.pydata.org/en/stable/
Xarray is my favourite library for working with NetCDF files - it makes it easy to filter data by coordinate value, rather than having to work out array indices yourself. In combination with the Dask library it also lets you work with very large datasets without having to load everything into memory all at once.
Because Xarray also works with file formats other than NetCDF, NetCDF-specific features such as compression settings can be inconvenient to set.
import xarray
# Open a file
data = xarray.open_dataset(sampledata)
# Variables can be accessed either as properties or as a dict
surface_temperature = data.tas
surface_temperature = data['tas']
print("Variable:\n", surface_temperature)
# Same for attributes
units = surface_temperature.units
units = surface_temperature.attrs['units']
print()
print("Attribute:\n", units)
# Variables can be indexed numpy-style or pandas-style
d = surface_temperature[0, 0:10, 0:10]
d = surface_temperature.isel(time=0, lat=slice(0,10), lon=slice(0,10))
d = surface_temperature.sel(time='19790116T1200', lat=slice(-90,-80), lon=slice(0,20))
# Data can be saved to a new file easily
data.to_netcdf('data.nc')
Variable:
<xarray.DataArray 'tas' (time: 360, lat: 145, lon: 192)>
[10022400 values with dtype=float32]
Coordinates:
* time (time) datetime64[ns] 1979-01-16T12:00:00 1979-02-15 ...
* lat (lat) float64 -90.0 -88.75 -87.5 -86.25 -85.0 -83.75 -82.5 ...
* lon (lon) float64 0.0 1.875 3.75 5.625 7.5 9.375 11.25 13.12 15.0 ...
height float64 ...
Attributes:
standard_name: air_temperature
long_name: Near-Surface Air Temperature
units: K
cell_methods: time: mean
cell_measures: area: areacella
history: 2012-02-17T05:21:51Z altered by CMOR: Treated scalar d...
associated_files: baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation...
Attribute:
K
netCDF4
http://unidata.github.io/netcdf4-python/
The netCDF4 library is a bare-bones library for working with NetCDF data. It doesn’t have the bells and whistles of Xarray, but unlike Xarray it’s a dedicated NetCDF library, so features like compression and scale-and-offset packing are simpler to access.
import netCDF4
data = netCDF4.Dataset(sampledata)
# Variables can be accessed like a dict
surface_temperature = data['tas']
surface_temperature = data.variables['tas']
print("Variable:\n", surface_temperature)
# Attributes are accessed as properties of a variable
units = surface_temperature.units
print("Attribute:\n", units)
# Variables can be indexed numpy-style
# (use a new name so the open dataset in 'data' isn't overwritten)
d = surface_temperature[0, 0:10, 0:10]
# Data can't be copied to a new file easily
Variable:
<class 'netCDF4._netCDF4.Variable'>
float32 tas(time, lat, lon)
standard_name: air_temperature
long_name: Near-Surface Air Temperature
units: K
cell_methods: time: mean
cell_measures: area: areacella
history: 2012-02-17T05:21:51Z altered by CMOR: Treated scalar dimension: 'height'. 2012-02-17T05:21:51Z altered by CMOR: replaced missing value flag (-1.07374e+09) with standard missing value (1e+20).
coordinates: height
missing_value: 1e+20
_FillValue: 1e+20
associated_files: baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_ACCESS1-0_amip_r0i0p0.nc areacella: areacella_fx_ACCESS1-0_amip_r0i0p0.nc
unlimited dimensions: time
current shape = (360, 145, 192)
filling off
Attribute:
K
Iris
While Xarray and netCDF4 work similarly, the Iris library works a bit differently. Rather than accessing variables like a dictionary, Iris uses a list with a special function to get a variable by name. It also prefers CF standard names, so some special trickery is required to get a variable by its name in the file.
Iris also keeps the file-level attributes with each of the variables. You can see below that it lists things like the title and metadata conventions.
import iris
data = iris.load(sampledata)
# Variables can be accessed like a list
surface_temperature = data[0]
# Iris prefers to use the standard_name to identify variables
surface_temperature = data.extract_strict('air_temperature')
# Getting variables by their own name can be done, but is complicated
surface_temperature = data.extract_strict(iris.Constraint(cube_func = lambda c: c.var_name == 'tas'))
print("Variable:\n", surface_temperature)
# Attributes can be accessed as properties
units = surface_temperature.units
print()
print("Attribute:\n", units)
# Variables can be indexed numpy-style or by special constraint objects
data = surface_temperature[0, 0:10, 0:10]
data = surface_temperature.extract(iris.Constraint(latitude=lambda x: 0 < x < 20))
# Data can be saved to a new file
iris.save(data, 'data.nc')
/local/swales/conda/analysis3/lib/python3.6/site-packages/iris/fileformats/cf.py:798: UserWarning: Missing CF-netCDF measure variable 'areacella', referenced by netCDF variable 'tas'
warnings.warn(message % (variable_name, nc_var_name))
/local/swales/conda/analysis3/lib/python3.6/site-packages/iris/fileformats/_pyke_rules/compiled_krb/fc_rules_cf_fc.py:1813: FutureWarning: Conversion of the second argument of issubdtype from `str` to `str` is deprecated. In future, it will be treated as `np.str_ == np.dtype(str).type`.
if np.issubdtype(cf_var.dtype, np.str):
/local/swales/conda/analysis3/lib/python3.6/site-packages/iris/fileformats/_pyke_rules/compiled_krb/fc_rules_cf_fc.py:1813: FutureWarning: Conversion of the second argument of issubdtype from `str` to `str` is deprecated. In future, it will be treated as `np.str_ == np.dtype(str).type`.
if np.issubdtype(cf_var.dtype, np.str):
Variable:
air_temperature / (K) (time: 360; latitude: 145; longitude: 192)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Scalar coordinates:
height: 1.5 m
Attributes:
Conventions: CF-1.4
DODS_EXTRA.Unlimited_Dimension: time
associated_files: baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_ACCESS1-0_amip_r0i0p0.nc...
branch_time: 0.0
cmor_version: 2.8.0
contact: The ACCESS wiki: http://wiki.csiro.au/confluence/display/ACCESS/Home. Contact...
creation_date: 2012-02-17T05:21:53Z
experiment: AMIP
experiment_id: amip
forcing: GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113,...
frequency: mon
history: 2012-02-17T05:21:51Z altered by CMOR: Treated scalar dimension: 'height'....
initialization_method: 1
institute_id: CSIRO-BOM
institution: CSIRO (Commonwealth Scientific and Industrial Research Organisation, Australia),...
model_id: ACCESS1-0
modeling_realm: atmos
parent_experiment: N/A
parent_experiment_id: N/A
parent_experiment_rip: r1i1p1
physics_version: 1
product: output
project_id: CMIP5
realization: 1
references: See http://wiki.csiro.au/confluence/display/ACCESS/ACCESS+Publications
source: ACCESS1-0 2011. Atmosphere: AGCM v1.0 (N96 grid-point, 1.875 degrees EW...
table_id: Table Amon (01 February 2012) 01388cb4507c2f05326b711b09604e7e
title: ACCESS1-0 model output prepared for CMIP5 AMIP
tracking_id: 7cfe11fc-5b1c-457d-812b-e95f45e7def4
version_number: v20120115
Cell methods:
mean: time
Attribute:
K
/local/swales/conda/analysis3/lib/python3.6/site-packages/iris/fileformats/netcdf.py:1573: FutureWarning: Conversion of the second argument of issubdtype from `str` to `str` is deprecated. In future, it will be treated as `np.str_ == np.dtype(str).type`.
if np.issubdtype(coord.points.dtype, np.str):