You can run this notebook in a live session on Binder or view it on GitHub.


Computation with Xarray

In this lesson, we discuss how to do scientific computations with xarray objects. Our learning goals are as follows. By the end of the lesson, we will be able to:

  • Apply basic arithmetic and numpy functions to xarray DataArrays / Datasets.

  • Use Xarray’s label-aware reduction operations (e.g. mean, sum), including weighted reductions.

  • Apply arbitrary functions to Xarray data via apply_ufunc.

  • Use Xarray’s broadcasting to compute on arrays of different dimensionality.

  • Perform “split / apply / combine” workflows in Xarray using groupby, including

    • reductions within groups

    • transformations on groups

  • Use the resample, rolling and coarsen functions to manipulate data.

[1]:
import expectexception
import numpy as np
import xarray as xr
from matplotlib import pyplot as plt

Example Dataset

First we load a dataset. We will use the NOAA Extended Reconstructed Sea Surface Temperature (ERSST) v5 product, a widely used and trusted gridded compilation of historical data going back to 1854.

Since the data is provided via an OPeNDAP server, we can load it directly without downloading anything:

[2]:
### NOTE: If hundreds of people connect to this server at once and download the same dataset,
###       things might not go so well! We recommend using the Google Cloud copy instead.

# url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/noaa.ersst.v5/sst.mnmean.nc"
# # drop an unnecessary variable which complicates some operations
# ds = xr.open_dataset(url, drop_variables=["time_bnds"])
# # will take a minute or two to complete
# ds = ds.sel(time=slice("1960", "2018")).load()
# ds
[3]:
import gcsfs

fs = gcsfs.GCSFileSystem(token="anon")
ds = xr.open_zarr(
    fs.get_mapper("gs://pangeo-noaa-ncei/noaa.ersst.v5.zarr"), consolidated=True
).load()
ds
[3]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
Data variables:
    sst      (time, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan
Attributes:
    Conventions:                     CF-1.6, ACDD-1.3
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.ncdc.noaa.gov/data-access/ma...
    cdm_data_type:                   Grid
    citation:                        Huang et al, 2017: Extended Reconstructe...
    climatology:                     Climatology is based on 1971-2000 SST, X...
    comment:                         SSTs were observed by conventional therm...
    creator_name:                    Boyin Huang (original)
    creator_url_original:            https://www.ncei.noaa.gov
    data_modified:                   2020-07-07
    dataset_title:                   NOAA Extended Reconstructed SST V5
    date_created:                    2017-06-30T12:18:00Z (original)
    description:                     In situ data: ICOADS2.5 before 2007 and ...
    geospatial_lat_max:              89.0
    geospatial_lat_min:              -89.0
    geospatial_lat_units:            degrees_north
    geospatial_laty_max:             89.0
    geospatial_laty_min:             -89.0
    geospatial_lon_max:              359.0
    geospatial_lon_min:              -1.0
    geospatial_lon_units:            degrees_east
    history:                         created 07/2017 by PSD data using NCEI's...
    institution:                     This version written at NOAA/ESRL PSD: o...
    instrument:                      Conventional thermometers
    keywords:                        Earth Science > Oceans > Ocean Temperatu...
    keywords_vocabulary:             NASA Global Change Master Directory (GCM...
    license:                         No constraints on data access or use
    metadata_link:                   :metadata_link = https://doi.org/10.7289...
    original_publisher_url:          http://www.ncdc.noaa.gov
    platform:                        Ship and Buoy SSTs from ICOADS R3.0 and ...
    processing_level:                NOAA Level 4
    product_version:                 Version 5
    project:                         NOAA Extended Reconstructed Sea Surface ...
    source:                          In situ data: ICOADS R3.0 before 2015, N...
    source_comment:                  SSTs were observed by conventional therm...
    standard_name_vocabulary:        CF Standard Name Table (v40, 25 January ...
    summary:                         ERSST.v5 is developed based on v4 after ...
    title:                           NOAA ERSSTv5 (in situ only)

Let’s do some basic visualizations of the data, just to make sure it looks reasonable.

[4]:
ds.sst[0].plot(vmin=-2, vmax=30)
[4]:
<matplotlib.collections.QuadMesh at 0x7f0e40059e80>
../_images/scipy-tutorial_03_computation_with_xarray_6_1.png

Basic Arithmetic

Xarray dataarrays and datasets work seamlessly with arithmetic operators and numpy array functions.

For example, imagine we want to convert the temperature (given in Celsius) to Kelvin:

[5]:
sst_kelvin = ds.sst + 273.15
sst_kelvin
[5]:
<xarray.DataArray 'sst' (time: 708, lat: 89, lon: 180)>
array([[[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
...
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]]],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

The dimensions and coordinates were preserved following the operation.

Warning: Although many xarray datasets have a units attribute, which is used in plotting, Xarray does not inherently understand units. However, work is underway to integrate xarray with pint, which will provide full unit-aware operations.

We can apply more complex functions, including numpy ufuncs, to Xarray objects. Imagine we wanted to compute the following expression as a function of SST (\(\Theta\)) in Kelvin:

\[f(\Theta) = 0.5 \ln(\Theta^2)\]
[6]:
f = 0.5 * np.log(sst_kelvin ** 2)
f
[6]:
<xarray.DataArray 'sst' (time: 708, lat: 89, lon: 180)>
array([[[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
...
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]]],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

Applying Arbitrary Functions

It’s awesome that we can call np.log(ds) and have it “just work”. However, not all third party libraries work this way.

In this example, we will use functions from the Gibbs Seawater Toolkit, a package for the thermodynamics of seawater. This package provides ufuncs that operate on numpy arrays.

[7]:
import gsw

# an example function
# http://www.teos-10.org/pubs/gsw/html/gsw_t90_from_t68.html
gsw.t90_from_t68?
[8]:
gsw.t90_from_t68(ds.sst)  # -> returns a numpy array
[8]:
<xarray.DataArray 'sst' (time: 708, lat: 89, lon: 180)>
array([[[-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        [-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        [-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        ...,
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan],
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan],
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan]],

       [[-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        [-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        [-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
...
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan],
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan],
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan]],

       [[-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        [-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        [-1.799568, -1.799568, -1.799568, ..., -1.799568, -1.799568,
         -1.799568],
        ...,
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan],
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan],
        [      nan,       nan,       nan, ...,       nan,       nan,
               nan]]], dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

It would be nice to keep our dimensions and coordinates. We can accomplish this with xr.apply_ufunc.

[9]:
xr.apply_ufunc(gsw.t90_from_t68, ds.sst)
[9]:
<xarray.DataArray 'sst' (time: 708, lat: 89, lon: 180)>
array([[[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
...
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]]])
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

Note: apply_ufunc is a powerful and mysterious function. It has many options for doing more complicated things. Unfortunately, we don’t have time to go into more depth here. Please consult the Xarray docs for more details.
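As a taste of those extra options, here is a small sketch (using a synthetic array, not the tutorial dataset, and a made-up demean function) of how input_core_dims and output_core_dims let apply_ufunc wrap a function that assumes a particular numpy axis:

```python
import numpy as np
import xarray as xr

# A hypothetical numpy-only function that assumes the reduction
# axis is the last one.
def demean(arr, axis=-1):
    return arr - arr.mean(axis=axis, keepdims=True)

da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("space", "time"),
    coords={"time": [10, 20, 30]},
)

# input_core_dims asks apply_ufunc to move "time" to the last axis
# before calling demean; output_core_dims says the result keeps it.
out = xr.apply_ufunc(
    demean,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
)
out  # same dims and coords as da, with the "time" mean removed
```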

Reductions

Just like in numpy, we can reduce xarray DataArrays along any number of axes:

[10]:
sst = ds.sst
sst.mean(axis=0)
[10]:
<xarray.DataArray 'sst' (lat: 89, lon: 180)>
array([[-1.7993844, -1.7993953, -1.7994034, ..., -1.799569 , -1.799457 ,
        -1.799379 ],
       [-1.7994268, -1.7993473, -1.799294 , ..., -1.7998592, -1.7996504,
        -1.7995317],
       [-1.7999867, -1.7998204, -1.7997179, ..., -1.799865 , -1.7999135,
        -1.7999635],
       ...,
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan]], dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
[11]:
sst.mean(axis=(1, 2))
[11]:
<xarray.DataArray 'sst' (time: 708)>
array([13.447333 , 13.538178 , 13.560617 , 13.449021 , 13.469315 ,
       13.454981 , 13.72113  , 13.916064 , 13.822409 , 13.53989  ,
       13.323178 , 13.364261 , 13.491084 , 13.624221 , 13.623529 ,
       13.525916 , 13.48518  , 13.5291395, 13.710998 , 13.902636 ,
       13.786833 , 13.500918 , 13.322204 , 13.324784 , 13.4614725,
       13.554375 , 13.570085 , 13.467039 , 13.464627 , 13.515269 ,
       13.7487135, 13.86036  , 13.797006 , 13.499739 , 13.294295 ,
       13.312595 , 13.420587 , 13.541743 , 13.535124 , 13.477131 ,
       13.4698715, 13.525112 , 13.6962595, 13.918255 , 13.764984 ,
       13.465238 , 13.324522 , 13.340322 , 13.438337 , 13.514883 ,
       13.488945 , 13.367167 , 13.350077 , 13.449749 , 13.60579  ,
       13.718753 , 13.616768 , 13.327129 , 13.127264 , 13.148032 ,
       13.36128  , 13.472328 , 13.458139 , 13.388378 , 13.31617  ,
       13.415058 , 13.603073 , 13.795105 , 13.720209 , 13.420439 ,
       13.248442 , 13.274323 , 13.424941 , 13.506327 , 13.517883 ,
       13.4329195, 13.372339 , 13.450058 , 13.671843 , 13.779454 ,
       13.685076 , 13.430487 , 13.293945 , 13.326498 , 13.416443 ,
       13.515367 , 13.532816 , 13.443419 , 13.459111 , 13.521452 ,
       13.673624 , 13.855462 , 13.68534  , 13.458884 , 13.242759 ,
       13.240546 , 13.329314 , 13.464171 , 13.442738 , 13.397366 ,
...
       13.623425 , 13.612804 , 13.739813 , 13.865803 , 13.857512 ,
       13.776316 , 13.768099 , 13.893071 , 14.132943 , 14.323578 ,
       14.175996 , 13.839251 , 13.622542 , 13.59652  , 13.73716  ,
       13.874577 , 13.845826 , 13.794755 , 13.80382  , 13.927139 ,
       14.200779 , 14.419651 , 14.300908 , 13.951674 , 13.712137 ,
       13.726637 , 13.808819 , 13.926258 , 13.912461 , 13.84714  ,
       13.827132 , 13.905337 , 14.194412 , 14.374818 , 14.2309265,
       13.904327 , 13.720607 , 13.692019 , 13.826905 , 13.943245 ,
       13.953634 , 13.921505 , 13.908049 , 14.04417  , 14.274524 ,
       14.4880495, 14.345023 , 13.980439 , 13.755713 , 13.76705  ,
       13.899665 , 13.994961 , 14.024579 , 13.983573 , 13.978161 ,
       14.071658 , 14.313522 , 14.5173025, 14.437494 , 14.145153 ,
       13.922181 , 13.925177 , 14.107267 , 14.163773 , 14.169422 ,
       14.069178 , 14.0240755, 14.15859  , 14.405118 , 14.586667 ,
       14.463736 , 14.150087 , 13.865754 , 13.8558   , 14.032689 ,
       14.1451   , 14.133999 , 14.04923  , 13.981656 , 14.096308 ,
       14.37745  , 14.55437  , 14.387506 , 14.084339 , 13.8478365,
       13.800663 , 13.930675 , 14.03369  , 14.038775 , 13.991124 ,
       13.960315 , 14.027888 , 14.255452 , 14.474634 , 14.393512 ,
       14.140695 , 13.8931875, 13.890292 ], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
[12]:
sst.mean()
[12]:
<xarray.DataArray 'sst' ()>
array(13.746263, dtype=float32)

However, rather than performing reductions on axes (as in numpy), we can perform them on dimensions. This turns out to be a huge convenience, particularly in complex calculations when you can’t easily remember which axis corresponds to which dimension:

[13]:
sst.mean(dim="time")
[13]:
<xarray.DataArray 'sst' (lat: 89, lon: 180)>
array([[-1.7993844, -1.7993953, -1.7994034, ..., -1.799569 , -1.799457 ,
        -1.799379 ],
       [-1.7994268, -1.7993473, -1.799294 , ..., -1.7998592, -1.7996504,
        -1.7995317],
       [-1.7999867, -1.7998204, -1.7997179, ..., -1.799865 , -1.7999135,
        -1.7999635],
       ...,
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan]], dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0

All of the standard numpy reductions (e.g. min, max, sum, std, etc.) are available.
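For instance, on a tiny synthetic DataArray (not the SST data), a few of these reductions look like:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    dims=("time", "lat"),
)

da.max(dim="time")  # the maximum over time at each latitude
da.std(dim="lat")   # the standard deviation across latitude
da.sum()            # the grand total over all dimensions
```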

Exercise

Take the mean of sst in both longitude and latitude. Make a simple timeseries plot:

[14]:
# your code here

Broadcasting

Broadcasting refers to the alignment of arrays with different numbers of dimensions. Numpy’s broadcasting rules, based on array shape, can sometimes be difficult to understand and remember. Xarray does broadcasting by dimension name, rather than array shape. This is a huge convenience.

Let’s now create two arrays with some dimensions in common. For this example, we will create a “weights” array proportional to cosine of latitude. Modulo a normalization, this is the correct area-weighting factor for data on a regular lat-lon grid.

[15]:
weights = np.cos(np.deg2rad(ds.lat))
weights.dims
[15]:
('lat',)

If we multiply this by SST, it “just works,” and the arrays are broadcast properly:

[16]:
(ds.sst * weights).dims
[16]:
('time', 'lat', 'lon')

Warning: If the arrays being broadcast share a dimension name but have different coordinates, they will first be aligned using Xarray’s default align settings (including filling missing values with NaNs). If that’s not what you want, it’s best to call align explicitly before broadcasting.
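A minimal sketch of this pitfall, with two synthetic arrays whose x coordinates only partially overlap (note that plain arithmetic aligns with an inner join, keeping only the shared labels):

```python
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="x", coords={"x": [1, 2, 3]})

# Arithmetic aligns on the shared "x" coordinate first; by default
# only the overlapping labels (x = 1 and x = 2) survive.
prod = a * b

# To control alignment explicitly, align before operating; an outer
# join keeps all four labels and introduces NaNs at x = 0 and x = 3.
a2, b2 = xr.align(a, b, join="outer")
```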

Weighted Reductions

We could imagine computing the weighted spatial mean of SST manually, like this:

[17]:
sst_mean = (ds.sst * weights).sum(dim=("lon", "lat")) / weights.sum(dim="lat")
sst_mean.plot()
plt.title("This is wrong!")
[17]:
Text(0.5, 1.0, 'This is wrong!')
../_images/scipy-tutorial_03_computation_with_xarray_32_1.png

That would be wrong, however, because the denominator (weights.sum(dim='lat')) needs to be expanded to include the lon dimension and modified to account for the missing values (land points).
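On a tiny synthetic field (not the ERSST data), the corrected manual computation would look roughly like this: mask the weights wherever the data is missing, and sum the denominator over both dimensions. It agrees with the built-in weighted reduction introduced below.

```python
import numpy as np
import xarray as xr

# Tiny synthetic "SST" field with one missing (land) point.
sst = xr.DataArray(
    [[1.0, np.nan], [3.0, 4.0]],
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [0.0, 2.0]},
)
weights = np.cos(np.deg2rad(sst.lat))

# Zero out the weights wherever the data is missing, then sum the
# denominator over both lat and lon, matching the numerator.
masked_weights = weights * sst.notnull()
manual_mean = (sst * weights).sum(("lat", "lon")) / masked_weights.sum(("lat", "lon"))

# Xarray's built-in weighted reduction does the same bookkeeping.
builtin_mean = sst.weighted(weights).mean(("lat", "lon"))
```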

In general, weighted reductions on multidimensional arrays are complicated. To make it a bit easier, Xarray provides a mechanism for weighted reductions. It does this by creating a special intermediate DataArrayWeighted object, to which different reduction operations can be applied.

[18]:
sst_weighted = ds.sst.weighted(weights)
sst_weighted
[18]:
DataArrayWeighted with weights along dimensions: lat
[19]:
sst_weighted.mean(dim=("lon", "lat")).plot()
plt.title("Correct Global Mean SST")
[19]:
Text(0.5, 1.0, 'Correct Global Mean SST')
../_images/scipy-tutorial_03_computation_with_xarray_35_1.png

Groupby

Xarray copies Pandas’ very useful groupby functionality, enabling the “split / apply / combine” workflow on xarray DataArrays and Datasets.

To provide a physically motivated example, let’s examine a timeseries of SST at a single point.

[20]:
ds.sst.sel(lon=300, lat=50).plot()
[20]:
[<matplotlib.lines.Line2D at 0x7f0e390229a0>]
../_images/scipy-tutorial_03_computation_with_xarray_37_1.png

As we can see from the plot, the timeseries at any one point is totally dominated by the seasonal cycle. We would like to remove this seasonal cycle (called the “climatology”) in order to better see the long-term variations in temperature. We can accomplish this using groupby.

Before moving forward, we note that xarray correctly parsed the time index, resulting in a Pandas datetime index on the time dimension.

[21]:
ds.time
[21]:
<xarray.DataArray 'time' (time: 708)>
array(['1960-01-01T00:00:00.000000000', '1960-02-01T00:00:00.000000000',
       '1960-03-01T00:00:00.000000000', ..., '2018-10-01T00:00:00.000000000',
       '2018-11-01T00:00:00.000000000', '2018-12-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
Attributes:
    _ChunkSizes:      1
    actual_range:     [19723.0, 80505.0]
    avg_period:       0000-01-00 00:00:00
    axis:             T
    delta_t:          0000-01-00 00:00:00
    long_name:        Time
    prev_avg_period:  0000-00-07 00:00:00
    standard_name:    time

The syntax of Xarray’s groupby is almost identical to Pandas.

[22]:
ds.groupby?

The most important argument is group: this defines the unique values we will use to “split” the data for grouped analysis. We can pass either a DataArray or the name of a variable in the dataset. Let’s first use a DataArray. Just like with Pandas, we can use the time index to extract specific components of dates and times. Xarray uses a special .dt syntax for this, called the DatetimeAccessor.

[23]:
ds.time.dt
[23]:
<xarray.core.accessor_dt.DatetimeAccessor at 0x7f0e39018550>
[24]:
ds.time.dt.month
[24]:
<xarray.DataArray 'month' (time: 708)>
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
       12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,
        5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9,
       10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,
        8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,
        1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
       12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,
...
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,
        8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,
        1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
       12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,
        5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9,
       10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,
        8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,
        1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
Coordinates:
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

Similarly, ds.time.dt.year extracts the year of each timestamp.

We can use these arrays in a groupby operation:

[25]:
gb = ds.groupby(ds.time.dt.month)
gb
[25]:
DatasetGroupBy, grouped over 'month'
12 groups with labels 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Xarray also offers a more concise syntax when the variable you’re grouping on is already present in the dataset. This is identical to the previous line:

[26]:
gb = ds.groupby("time.month")
gb
[26]:
DatasetGroupBy, grouped over 'month'
12 groups with labels 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Now that the data are split, we can manually iterate over the groups. The iterator returns the key (group name) and the value (the actual dataset corresponding to that group) for each group.

[27]:
for group_name, group_ds in gb:
    # stop iterating after the first loop
    break
print(group_name)
group_ds
1
[27]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 59)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1961-01-01 ... 2018-01-01
Data variables:
    sst      (time, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan
Attributes:
    Conventions:                     CF-1.6, ACDD-1.3
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.ncdc.noaa.gov/data-access/ma...
    cdm_data_type:                   Grid
    citation:                        Huang et al, 2017: Extended Reconstructe...
    climatology:                     Climatology is based on 1971-2000 SST, X...
    comment:                         SSTs were observed by conventional therm...
    creator_name:                    Boyin Huang (original)
    creator_url_original:            https://www.ncei.noaa.gov
    data_modified:                   2020-07-07
    dataset_title:                   NOAA Extended Reconstructed SST V5
    date_created:                    2017-06-30T12:18:00Z (original)
    description:                     In situ data: ICOADS2.5 before 2007 and ...
    geospatial_lat_max:              89.0
    geospatial_lat_min:              -89.0
    geospatial_lat_units:            degrees_north
    geospatial_laty_max:             89.0
    geospatial_laty_min:             -89.0
    geospatial_lon_max:              359.0
    geospatial_lon_min:              -1.0
    geospatial_lon_units:            degrees_east
    history:                         created 07/2017 by PSD data using NCEI's...
    institution:                     This version written at NOAA/ESRL PSD: o...
    instrument:                      Conventional thermometers
    keywords:                        Earth Science > Oceans > Ocean Temperatu...
    keywords_vocabulary:             NASA Global Change Master Directory (GCM...
    license:                         No constraints on data access or use
    metadata_link:                   :metadata_link = https://doi.org/10.7289...
    original_publisher_url:          http://www.ncdc.noaa.gov
    platform:                        Ship and Buoy SSTs from ICOADS R3.0 and ...
    processing_level:                NOAA Level 4
    product_version:                 Version 5
    project:                         NOAA Extended Reconstructed Sea Surface ...
    source:                          In situ data: ICOADS R3.0 before 2015, N...
    source_comment:                  SSTs were observed by conventional therm...
    standard_name_vocabulary:        CF Standard Name Table (v40, 25 January ...
    summary:                         ERSST.v5 is developed based on v4 after ...
    title:                           NOAA ERSSTv5 (in situ only)

Now that we have groups defined, it’s time to “apply” a calculation to the group. Like in Pandas, these calculations can either be:

  • aggregation: reduces the size of the group

  • transformation: preserves the group’s full size

At the end of the apply step, xarray will automatically combine the aggregated / transformed groups back into a single object.

The most fundamental way to apply is with the .map method.

[28]:
gb.map?
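
To see the mechanics in isolation, here is a minimal sketch of `groupby` + `.map` on a small synthetic DataArray (the data and names are illustrative, not part of the ERSST dataset):

```python
import pandas as pd
import xarray as xr

# toy series: six quarterly values spanning two years
da = xr.DataArray(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    dims="time",
    coords={"time": pd.date_range("2000-01-01", periods=6, freq="QS")},
)

# group by year, reduce each group, and let xarray recombine along "year"
gb_toy = da.groupby("time.year")
means = gb_toy.map(lambda g: g.mean())
print(means)  # one value per year: 2.5 for 2000, 5.5 for 2001
```

Each group is passed to the function as its own DataArray; because the function reduces each group to a scalar, the combined result has a new `year` dimension.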

Aggregations

.map accepts a function as its argument. We can pass an existing function:

[29]:
gb.map(np.mean)
[29]:
<xarray.Dataset>
Dimensions:  (month: 12)
Coordinates:
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (month) float32 13.66 13.77 13.76 13.68 ... 13.98 13.69 13.51 13.53

Because we specified no extra arguments (such as axis), the function was applied over all space and time dimensions. This is not what we wanted. Instead, we can define a custom function. This function takes a single argument (the group dataset) and returns a new dataset to be combined:

[30]:
def time_mean(a):
    return a.mean(dim="time")


gb.map(time_mean)
[30]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, month: 12)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (month, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan

Like Pandas, xarray’s groupby object has many built-in aggregation operations (e.g. mean, min, max, std, etc):

[31]:
# this does the same thing as the previous cell
ds_mm = gb.mean(dim="time")
ds_mm
[31]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, month: 12)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (month, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan

So we did what we wanted to do: calculate the climatology at every point in the dataset. Let’s look at the data a bit.

Climatology at a specific point in the North Atlantic

[32]:
ds_mm.sst.sel(lon=300, lat=50).plot()
[32]:
[<matplotlib.lines.Line2D at 0x7f0e38f6adc0>]
../_images/scipy-tutorial_03_computation_with_xarray_61_1.png

Zonal Mean Climatology

[33]:
ds_mm.sst.mean(dim="lon").plot.contourf(x="month", levels=12, vmin=-2, vmax=30)
[33]:
<matplotlib.contour.QuadContourSet at 0x7f0e38f4b820>
../_images/scipy-tutorial_03_computation_with_xarray_63_1.png

Difference between January and July Climatology

[34]:
(ds_mm.sst.sel(month=1) - ds_mm.sst.sel(month=7)).plot(vmax=10)
[34]:
<matplotlib.collections.QuadMesh at 0x7f0e38e94400>
../_images/scipy-tutorial_03_computation_with_xarray_65_1.png

Transformations

Now we want to remove this climatology from the dataset to examine the residual, called the anomaly, which is the interesting part from a climate perspective. Removing the seasonal climatology is a perfect example of a transformation: it operates over a group, but doesn't change the size of the dataset. Here is one way to code it:

[35]:
def remove_time_mean(x):
    return x - x.mean(dim="time")


ds_anom = ds.groupby("time.month").map(remove_time_mean)
ds_anom
[35]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
Data variables:
    sst      (time, lat, lon) float32 9.537e-07 9.537e-07 9.537e-07 ... nan nan

Xarray makes these sorts of transformations easy by supporting groupby arithmetic. This concept is most easily explained with an example:

[36]:
gb = ds.groupby("time.month")
ds_anom = gb - gb.mean(dim="time")
ds_anom
[36]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
    month    (time) int64 1 2 3 4 5 6 7 8 9 10 11 ... 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (time, lat, lon) float32 9.537e-07 9.537e-07 9.537e-07 ... nan nan

Now we can view the climate signal without the overwhelming influence of the seasonal cycle.

Timeseries at a single point in the North Atlantic

[37]:
ds_anom.sst.sel(lon=300, lat=50).plot()
[37]:
[<matplotlib.lines.Line2D at 0x7f0e38dbb4f0>]
../_images/scipy-tutorial_03_computation_with_xarray_71_1.png

Difference between Jan. 1 2018 and Jan. 1 1960

[38]:
(ds_anom.sel(time="2018-01-01") - ds_anom.sel(time="1960-01-01")).sst.plot()
[38]:
<matplotlib.collections.QuadMesh at 0x7f0e38d24e50>
../_images/scipy-tutorial_03_computation_with_xarray_73_1.png

Groupby-Related: Resample, Rolling, Coarsen

Resample in xarray is nearly identical to Pandas. It is effectively a group-by operation, and uses the same basic syntax. It can be applied only to time-indexed dimensions. Here we compute the five-year mean.

[39]:
resample_obj = ds_anom.resample(time="5Y")
resample_obj
[39]:
DatasetResample, grouped over '__resample_dim__'
13 groups with labels 1960-12-31, ..., 2020-12-31.
[40]:
ds_anom_resample = resample_obj.mean(dim="time")
ds_anom_resample
[40]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 13)
Coordinates:
  * time     (time) datetime64[ns] 1960-12-31 1965-12-31 ... 2020-12-31
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
Data variables:
    sst      (time, lat, lon) float32 -0.0006173 -0.0006063 ... nan nan
[41]:
ds_anom.sst.sel(lon=300, lat=50).plot()
ds_anom_resample.sst.sel(lon=300, lat=50).plot(marker="o")
[41]:
[<matplotlib.lines.Line2D at 0x7f0e38c80f40>]
../_images/scipy-tutorial_03_computation_with_xarray_77_1.png

Note: resample only works with proper datetime indexes.

Rolling is also similar to pandas, but can be applied along any dimension. It works with logical coordinates.

[42]:
ds_anom_rolling = ds_anom.rolling(time=12, center=True).mean()
ds_anom_rolling
[42]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
    month    (time) int64 1 2 3 4 5 6 7 8 9 10 11 ... 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
[43]:
ds_anom.sst.sel(lon=300, lat=50).plot(label="monthly anom")
ds_anom_resample.sst.sel(lon=300, lat=50).plot(
    marker="o", label="5 year resample"
)
ds_anom_rolling.sst.sel(lon=300, lat=50).plot(label="12 month rolling mean")
plt.legend()
[43]:
<matplotlib.legend.Legend at 0x7f0e38d00280>
../_images/scipy-tutorial_03_computation_with_xarray_80_1.png
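
Because rolling operates on logical (positional) windows, it is not limited to time: here is a minimal sketch of a rolling mean along a spatial dimension, on synthetic data not taken from the ERSST dataset:

```python
import numpy as np
import xarray as xr

# synthetic 1-D field along a spatial dimension
da = xr.DataArray(
    np.arange(10.0),
    dims="lon",
    coords={"lon": np.arange(0, 20, 2)},
)

# centered 3-point rolling mean along lon; the endpoints become NaN
smoothed = da.rolling(lon=3, center=True).mean()
print(smoothed)
```

The window is defined by the number of elements (3 points), not by a physical distance in degrees.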

coarsen does something similar to resample, but without being aware of time. It operates on logical coordinates only but can work on multiple dimensions at a time.

[44]:
ds_anom_coarsen_time = ds_anom.coarsen(time=12).mean()

ds_anom_rolling.sst.sel(lon=300, lat=50).plot(label="12 month rolling mean")
ds_anom_coarsen_time.sst.sel(lon=300, lat=50).plot(
    marker="^", label="12 item coarsen"
)
plt.legend()
[44]:
<matplotlib.legend.Legend at 0x7f0e38e146a0>
../_images/scipy-tutorial_03_computation_with_xarray_82_1.png
[45]:
%%expect_exception
ds_anom_coarsen_space = ds_anom.coarsen(lon=4, lat=4).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-f690a99950fa> in <module>
----> 1 ds_anom_coarsen_space = ds_anom.coarsen(lon=4, lat=4).mean()

~/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/rolling.py in wrapped_func(self, **kwargs)
    826             reduced = {}
    827             for key, da in self.obj.data_vars.items():
--> 828                 reduced[key] = da.variable.coarsen(
    829                     self.windows, func, self.boundary, self.side, **kwargs
    830                 )

~/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/variable.py in coarsen(self, windows, func, boundary, side, keep_attrs, **kwargs)
   2014             _attrs = None
   2015
-> 2016         reshaped, axes = self._coarsen_reshape(windows, boundary, side)
   2017         if isinstance(func, str):
   2018             name = func

~/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/variable.py in _coarsen_reshape(self, windows, boundary, side)
   2048             if boundary[d] == "exact":
   2049                 if n * window != size:
-> 2050                     raise ValueError(
   2051                         "Could not coarsen a dimension of size {} with "
   2052                         "window {}".format(size, window)

ValueError: Could not coarsen a dimension of size 89 with window 4
[46]:
ds_anom_coarsen_space = (
    ds_anom.isel(lat=slice(0, -1)).coarsen(lon=4, lat=4).mean()
)
ds_anom_coarsen_space
[46]:
<xarray.Dataset>
Dimensions:  (lat: 22, lon: 45, time: 708)
Coordinates:
  * lat      (lat) float32 85.0 77.0 69.0 61.0 53.0 ... -59.0 -67.0 -75.0 -83.0
  * lon      (lon) float32 3.0 11.0 19.0 27.0 35.0 ... 331.0 339.0 347.0 355.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
    month    (time) int64 1 2 3 4 5 6 7 8 9 10 11 ... 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (time, lat, lon) float32 -0.000229 -0.02238 -0.05696 ... nan nan
[47]:
ds_anom_coarsen_space.sst.isel(time=0).plot()
[47]:
<matplotlib.collections.QuadMesh at 0x7f0e38c17ac0>
../_images/scipy-tutorial_03_computation_with_xarray_85_1.png
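
The ValueError above arose because the window (4) did not evenly divide the dimension size (89), and we worked around it by slicing off one row. Alternatively, coarsen accepts a boundary argument that handles leftover elements for you. A minimal sketch on synthetic data (the names are illustrative):

```python
import numpy as np
import xarray as xr

# a dimension of size 9, which a window of 4 does not evenly divide
da = xr.DataArray(np.arange(9.0), dims="lat", coords={"lat": np.arange(9)})

# boundary="trim" silently drops the leftover element instead of raising
trimmed = da.coarsen(lat=4, boundary="trim").mean()
print(trimmed)  # two coarse cells: mean(0..3) = 1.5, mean(4..7) = 5.5
```

`boundary="pad"` is the other option, padding the dimension with NaN so every window is full.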

Exercise

Load the following “basin mask” dataset, and use it to take a weighted average of SST in each ocean basin. Figure out which ocean basins are the warmest and coldest.

Hint: you will first need to align this dataset with the SST dataset. Use what you learned in the “indexing and alignment” lesson.

[48]:
basin = xr.open_dataset(
    "http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NODC/.WOA09/.Masks/.basin/dods"
)
basin
[48]:
<xarray.Dataset>
Dimensions:  (X: 360, Y: 180, Z: 33)
Coordinates:
  * Z        (Z) float32 0.0 10.0 20.0 30.0 50.0 ... 4e+03 4.5e+03 5e+03 5.5e+03
  * X        (X) float32 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
  * Y        (Y) float32 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
Data variables:
    basin    (Z, Y, X) float32 ...
Attributes:
    Conventions:  IRIDL