Converting between coordinate systems with Pyproj
Scott Wales, CLEX CMS
Pyproj is a helpful tool when you want to change your data’s coordinate system.
import xarray
import pyproj
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import numpy
Sample data in polar projection
The sample dataset here is MEaSUREs BedMachine Antarctica, Version 2.
data = xarray.open_dataset('/g/data/v45/pas561/bedmachineant/BedMachineAntarctica_2020-07-15_v02.nc', chunks={'x': 5000, 'y': 5000})
The dataset is quite large, 13333 by 13333 points. To make it easier to work with, I’ll subset the data to every 100th row and column.
subset = data.isel(x=slice(None, None, 100), y=slice(None, None, 100))
subset.surface.plot()
<matplotlib.collections.QuadMesh at 0x7fa9fc5aba00>
As you can see, the dataset is in a polar projection. Rather than latitude and longitude coordinates it has ‘x’ and ‘y’ values, both in units of metres.
subset.x[:5]
<xarray.DataArray 'x' (x: 5)>
array([-3333000, -3283000, -3233000, -3183000, -3133000], dtype=int32)
Coordinates:
  * x        (x) int32 -3333000 -3283000 -3233000 -3183000 -3133000
Attributes:
    long_name:      Cartesian x-coordinate
    standard_name:  projection_x_coordinate
    units:          meter
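The 50 km step between the subset's x values follows from the file's 500 m grid spacing and the stride of 100. As a quick check, here is a minimal numpy sketch that rebuilds the full x axis from the `xmin`, `spacing` and `nx` values in the file's global attributes; the names `full_x` and `sub_x` are just for illustration.

```python
import numpy

# Rebuild the full x axis from the file attributes: xmin, spacing and nx
full_x = -3333000 + 500 * numpy.arange(13333)

# Same stride as isel(x=slice(None, None, 100)), applied to a plain array
sub_x = full_x[::100]

print(len(sub_x))            # number of points left after subsetting
print(sub_x[:3], sub_x[-1])  # first few values and the last one
```

Because 13333 is not a multiple of 100, the last subset point stops slightly short of the far edge of the full grid.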
There is information about the projection in the file metadata. The data is in a ‘Polar Stereographic South’ projection.
Also note the ‘proj4’ attribute. proj4 (now called PROJ) is a standard library for working with cartographic projections. The attribute says that the projection’s proj4 identifier is ‘epsg:3031’ (EPSG codes are standard identifiers for coordinate reference systems).
subset.attrs
{'Conventions': 'CF-1.7',
'Title': 'BedMachine Antarctica',
'Author': 'Mathieu Morlighem',
'version': '15-Jul-2020 (v2.0)',
'nx': 13333.0,
'ny': 13333.0,
'Projection': 'Polar Stereographic South (71S,0E)',
'proj4': '+init=epsg:3031',
'sea_water_density (kg m-3)': 1027.0,
'ice_density (kg m-3)': 917.0,
'xmin': -3333000,
'ymax': 3333000,
'spacing': 500,
'no_data': -9999.0,
'license': 'No restrictions on access or use',
'Data_citation': 'Morlighem M. et al., (2019), Deep glacial troughs and stabilizing ridges unveiled beneath the margins of the Antarctic ice sheet, Nature Geoscience (accepted)',
'Notes': 'Data processed at the Department of Earth System Science, University of California, Irvine'}
Converting coordinates with Pyproj
‘pyproj’ is a Python interface to proj4. We can use it to convert between different coordinate systems. The EPSG code for basic lat-lon coordinates is ‘epsg:4326’.
To convert between coordinate systems you create a ‘Transformer’, then ‘transform’ the coordinate values.
source_crs = 'epsg:3031' # Coordinate system of the file
target_crs = 'epsg:4326' # Global lat-lon coordinate system
polar_to_latlon = pyproj.Transformer.from_crs(source_crs, target_crs)
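One thing to watch is axis order. With ‘epsg:4326’ as the target, pyproj follows the axis order declared by the CRS, so `transform` returns latitude first and longitude second. If you would rather always work in (x, y) / (lon, lat) order, `from_crs` accepts an `always_xy=True` argument. The sketch below uses an arbitrary test point 1000 km from the pole; the name `to_lonlat` is only for illustration.

```python
import pyproj

# Same CRS pair as above, but force (x, y) input and (lon, lat) output order
to_lonlat = pyproj.Transformer.from_crs('epsg:3031', 'epsg:4326', always_xy=True)

# Illustrative point: on the positive y axis, 1000 km from the South Pole
lon, lat = to_lonlat.transform(0.0, 1_000_000.0)

print(lon, lat)
```

The rest of this post keeps the default ordering, which is why results are unpacked as `lat, lon`.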
If you pass the 1d coordinates straight from the file, you get a somewhat odd pattern:
lat, lon = polar_to_latlon.transform(subset.x, subset.y)
plt.plot(lon, lat)
[<matplotlib.lines.Line2D at 0x7fa9fc42cdc0>]
What’s happening is clearer on a polar projection: the plot is a diagonal line. This is because pyproj pairs up the x and y values element by element, rather than creating a grid.
ax = plt.axes(projection=ccrs.SouthPolarStereo())
ax.plot(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fa9fc3b5d60>
Since the latitude and longitude values are going to be 2d coordinates, you need to use ‘numpy.meshgrid’ to fill out the axes before doing the conversion.
X, Y = numpy.meshgrid(subset.x, subset.y)
lat, lon = polar_to_latlon.transform(X, Y)
plt.pcolormesh(lon)
<matplotlib.collections.QuadMesh at 0x7fa9fc0d3640>
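To see what meshgrid does, here is a small sketch with made-up coordinate values. Each output array has one row per y value and one column per x value, which matches the (y, x) dimension order of the data variables.

```python
import numpy

x = numpy.array([-100, 0, 100])   # 3 illustrative x values
y = numpy.array([50, -50])        # 2 illustrative y values

# Two 1d arrays only describe the axes of the grid.
# meshgrid expands them so that every grid point has its own (x, y) pair.
X, Y = numpy.meshgrid(x, y)

print(X.shape)   # (number of y values, number of x values)
print(X)         # x repeated along each row
print(Y)         # y repeated along each column
```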
You can add the coordinates to the dataarray by adding to the .coords dictionary. This takes a tuple of (dimensions, data).
subset.coords['lat'] = (subset.surface.dims, lat)
subset.coords['lon'] = (subset.surface.dims, lon)
subset.surface
<xarray.DataArray 'surface' (y: 134, x: 134)>
dask.array<getitem, shape=(134, 134), dtype=float32, chunksize=(50, 50), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int32 -3333000 -3283000 -3233000 ... 3217000 3267000 3317000
  * y        (y) int32 3333000 3283000 3233000 ... -3217000 -3267000 -3317000
    lat      (y, x) float64 -48.46 -48.75 -49.03 -49.31 ... -49.22 -48.93 -48.65
    lon      (y, x) float64 -45.0 -44.57 -44.13 -43.68 ... 135.9 135.4 135.0
Attributes:
    long_name:      ice surface elevation
    standard_name:  surface_altitude
    units:          meters
    grid_mapping:   mapping
    source:         REMA (Byrd Polar and Climate Research Center and the Pola...
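The (dimensions, data) tuple is what tells xarray which axes a 2d coordinate lies along. Below is a minimal sketch on a tiny made-up dataset; the `distance` coordinate and its values are purely illustrative stand-ins for lat and lon.

```python
import numpy
import xarray

# Tiny stand-in dataset with the same (y, x) layout as the subset above
ds = xarray.Dataset(
    {'surface': (('y', 'x'), numpy.zeros((2, 3)))},
    coords={'x': [-100.0, 0.0, 100.0], 'y': [50.0, -50.0]},
)

# Any 2d field on the same grid; distance from the origin is just an example
X, Y = numpy.meshgrid(ds.x.values, ds.y.values)
distance = numpy.hypot(X, Y)

# (dimension names, values): the names say which axes the array lies along
ds.coords['distance'] = (ds.surface.dims, distance)

print(ds.surface.coords['distance'].dims)
```

Using `ds.surface.dims` rather than typing the names by hand keeps the coordinate in step with the variable's own dimension order.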
Plotting with the new coordinates
Plotting the data against the new coordinates shows they are right, but the image is very messy.
plt.figure(figsize=(15,3))
subset.surface.plot.pcolormesh('lon', 'lat', add_colorbar=False)
<matplotlib.collections.QuadMesh at 0x7fa9fc17fd00>
This is because the data wraps around in longitude. Using a cartopy lat-lon projection on the plot cleans the image up.
plt.figure(figsize=(15,3))
ax = plt.axes(projection=ccrs.PlateCarree())
subset.surface.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), add_colorbar=False)
ax.coastlines()
/g/data/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/cartopy/mpl/geoaxes.py:1702: UserWarning: The input coordinates to pcolormesh are interpreted as cell centers, but are not monotonically increasing or decreasing. This may lead to incorrectly calculated cell edges, in which case, please supply explicit cell edges to pcolormesh.
X, Y, C, shading = self._pcolorargs('pcolormesh', *args,
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fa9fc0f1e80>
Note the blank areas in the corner though: there’s a gap where the longitudes wrap around. The gap is still there when plotting the lat-lon data in a polar projection. It’s a side effect of working in lat-lon coordinates at the poles.
ax = plt.axes(projection=ccrs.SouthPolarStereo())
subset.surface.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), add_colorbar=False)
ax.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fa9fc19eee0>
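A rough way to see where the gap comes from, using only numpy. For this grid the longitude of a point appears to be atan2(x, y) in degrees. That relation is an assumption here rather than something taken from pyproj, although it does reproduce the corner longitudes printed earlier (for example -45.0 at x=-3333000, y=3333000). Along a row with negative y, neighbouring columns jump from about -179 to about +179 degrees, so there is no sensible cell to draw across that seam.

```python
import numpy

# Subset x coordinate as in the notebook: 134 points, 50 km apart
x = -3333000 + 50000 * numpy.arange(134)
y = -3317000  # last row of the subset

# Assumed relation for a south polar stereographic grid centred on 0E
lon = numpy.degrees(numpy.arctan2(x, y))

# Largest change in longitude between neighbouring columns
jump = numpy.abs(numpy.diff(lon)).max()

print(lon[65:69])  # values either side of x = 0
print(jump)
```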
Converting from lat-lon to data coordinates
To avoid these singularities, sometimes it’s better to convert the other way around, from lat-lon to the data coordinates. Say we want to draw a lat-lon box on the polar view, created using this function:
def lon_lat_box(lon_bounds, lat_bounds, refinement=2):
    """
    Coordinates along the boundary of a rectangle in lat-lon coordinates

    Args:
        lon_bounds: (min, max) lon values
        lat_bounds: (min, max) lat values
        refinement: number of points to draw along each edge

    Returns: (lons, lats)
    """
    lons = []
    lats = []
    # Edge at the first lat bound, from the first lon bound to the last
    lons.append(numpy.linspace(lon_bounds[0], lon_bounds[-1], num=refinement))
    lats.append(numpy.linspace(lat_bounds[0], lat_bounds[0], num=refinement))
    # Edge at the last lon bound, from the first lat bound to the last
    lons.append(numpy.linspace(lon_bounds[-1], lon_bounds[-1], num=refinement))
    lats.append(numpy.linspace(lat_bounds[0], lat_bounds[-1], num=refinement))
    # Edge at the last lat bound, back to the first lon bound
    lons.append(numpy.linspace(lon_bounds[-1], lon_bounds[0], num=refinement))
    lats.append(numpy.linspace(lat_bounds[-1], lat_bounds[-1], num=refinement))
    # Edge at the first lon bound, closing the loop at the starting corner
    lons.append(numpy.linspace(lon_bounds[0], lon_bounds[0], num=refinement))
    lats.append(numpy.linspace(lat_bounds[-1], lat_bounds[0], num=refinement))
    return numpy.concatenate(lons), numpy.concatenate(lats)
First let’s look at the box in lat-lon coordinates.
boxlon, boxlat = lon_lat_box([100,140],[-75,-60], refinement=100)
ax = plt.axes(projection=ccrs.PlateCarree())
subset.surface.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), add_colorbar=False)
ax.plot(boxlon, boxlat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_extent([90,150,-85,-50])
Now the same box in data coordinates, converting the other way around from what we did before.
latlon_to_polar = pyproj.Transformer.from_crs(target_crs, source_crs)
boxx, boxy = latlon_to_polar.transform(boxlat, boxlon)
ax = plt.axes()
subset.surface.plot(ax=ax)
ax.plot(boxx, boxy)
[<matplotlib.lines.Line2D at 0x7fa9f674ff40>]
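This is also why the box was built with `refinement=100` rather than just its four corners: an edge of constant latitude is a circular arc in the polar coordinates, not a straight line. The sketch below illustrates this for the lat = -60 edge of the box. Its three points all sit at the same distance from the pole, while the midpoint of the straight line between the two corners is noticeably closer. The variable names are only for illustration.

```python
import numpy
import pyproj

# Default axis order for epsg:4326 is (lat, lon), as used above
latlon_to_polar = pyproj.Transformer.from_crs('epsg:4326', 'epsg:3031')

# Two corners and the midpoint of the box edge at lat = -60
lats = numpy.array([-60.0, -60.0, -60.0])
lons = numpy.array([100.0, 120.0, 140.0])
x, y = latlon_to_polar.transform(lats, lons)

# Distance of each transformed point from the pole, in metres
radius = numpy.hypot(x, y)

# Distance from the pole of the midpoint of the straight corner-to-corner line
chord_radius = numpy.hypot((x[0] + x[2]) / 2, (y[0] + y[2]) / 2)

print(radius)
print(chord_radius)
```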
You can use this method to select a specific region using the box’s bounds in data coordinates (also see the regionmask library), and more generally to work with datasets too large to convert from the file coordinates to lat-lon. Converting the entire dataset here with pyproj takes a long time, which is why we’re only plotting every 100th point.
region = subset.sel(x=slice(boxx.min(), boxx.max()), y=slice(boxy.max(), boxy.min()))
ax = plt.axes()
region.surface.plot(ax=ax)
ax.plot(boxx, boxy)
[<matplotlib.lines.Line2D at 0x7fa9f669dbe0>]
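Note that the y slice in the selection runs from `boxy.max()` to `boxy.min()`. The y coordinate in this file is stored from positive to negative, and label-based slices have to follow the direction of the coordinate. A minimal sketch with a made-up descending coordinate:

```python
import numpy
import xarray

# Small stand-in for the file's y coordinate, which runs from positive to negative
y = numpy.array([200, 100, 0, -100, -200])
da = xarray.DataArray(numpy.arange(5), coords={'y': y}, dims='y')

# Slice bounds must follow the coordinate's own direction
descending = da.sel(y=slice(100, -100))   # max first, matching the coordinate
ascending = da.sel(y=slice(-100, 100))    # min first, against the coordinate

print(descending.y.values)
print(ascending.size)
```

The second selection comes back empty rather than raising an error, so a reversed slice is easy to miss.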
3d coordinates
A way to get around singularities and wrapping issues is to work with the data in 3d coordinates. This needs a raw proj4 name of '+proj=geocent', and also a Z input value (which can just be zero if everything’s on the same level).
polar_to_cart = pyproj.Transformer.from_crs(source_crs, '+proj=geocent')
cX, cY, cZ = polar_to_cart.transform(X, Y, 0*X)
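As a sanity check on what the geocentric values mean, here is a small sketch that converts a single point, the origin of the polar grid, with a height of zero. The outputs are Cartesian X, Y, Z in metres measured from the centre of the Earth, so the South Pole should land on the negative Z axis at roughly the polar radius. The variable names are illustrative, and the exact value depends on the ellipsoid PROJ assumes for '+proj=geocent'.

```python
import pyproj

polar_to_cart = pyproj.Transformer.from_crs('epsg:3031', '+proj=geocent')

# Grid origin (the South Pole) at zero height
pole_x, pole_y, pole_z = polar_to_cart.transform(0.0, 0.0, 0.0)

print(pole_x, pole_y, pole_z)
```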
import matplotlib
plt.figure(figsize=(15,10))
# Make a 3d plot
ax = plt.axes(projection='3d')
# Use the values to colour the surface
cmap = plt.cm.ScalarMappable(cmap='viridis')
colors = cmap.to_rgba(subset.surface)
ax.plot_surface(cX, cY, cZ, facecolors=colors)
# Camera position
ax.elev = -60
ax.azim = 135