You can run this notebook in a live session on Binder or view it on GitHub.


Computation with Xarray

In this lesson, we discuss how to do scientific computations with xarray objects. Our learning goals are as follows. By the end of the lesson, we will be able to:

  • Apply basic arithmetic and numpy functions to xarray DataArrays / Datasets.

  • Use Xarray’s label-aware reduction operations (e.g. mean, sum), including weighted reductions.

  • Apply arbitrary functions to Xarray data via apply_ufunc.

  • Use Xarray’s broadcasting to compute on arrays of different dimensionality.

  • Perform “split / apply / combine” workflows in Xarray using groupby, including

    • reductions within groups

    • transformations on groups

  • Use the resample, rolling and coarsen functions to manipulate data.

[1]:
import expectexception
import numpy as np
import xarray as xr
from matplotlib import pyplot as plt

Example Dataset

First we load a dataset. We will use the NOAA Extended Reconstructed Sea Surface Temperature (ERSST) v5 product, a widely used and trusted gridded compilation of historical data going back to 1854.

Since the data is provided via an OPeNDAP server, we can load it directly without downloading anything:

[2]:
### NOTE: If hundreds of people connect to this server at once and download the same dataset,
###       things might not go so well! Recommended to use the Google Cloud copy instead.

# url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/noaa.ersst.v5/sst.mnmean.nc"
# # drop an unnecessary variable which complicates some operations
# ds = xr.open_dataset(url, drop_variables=["time_bnds"])
# # will take a minute or two to complete
# ds = ds.sel(time=slice("1960", "2018")).load()
# ds
[3]:
import gcsfs

fs = gcsfs.GCSFileSystem(token="anon")
ds = xr.open_zarr(
    fs.get_mapper("gs://pangeo-noaa-ncei/noaa.ersst.v5.zarr"), consolidated=True
).load()
ds
[3]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
Data variables:
    sst      (time, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan
Attributes:
    Conventions:                     CF-1.6, ACDD-1.3
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.ncdc.noaa.gov/data-access/ma...
    cdm_data_type:                   Grid
    citation:                        Huang et al, 2017: Extended Reconstructe...
    climatology:                     Climatology is based on 1971-2000 SST, X...
    comment:                         SSTs were observed by conventional therm...
    creator_name:                    Boyin Huang (original)
    creator_url_original:            https://www.ncei.noaa.gov
    data_modified:                   2020-07-07
    dataset_title:                   NOAA Extended Reconstructed SST V5
    date_created:                    2017-06-30T12:18:00Z (original)
    description:                     In situ data: ICOADS2.5 before 2007 and ...
    geospatial_lat_max:              89.0
    geospatial_lat_min:              -89.0
    geospatial_lat_units:            degrees_north
    geospatial_laty_max:             89.0
    geospatial_laty_min:             -89.0
    geospatial_lon_max:              359.0
    geospatial_lon_min:              -1.0
    geospatial_lon_units:            degrees_east
    history:                         created 07/2017 by PSD data using NCEI's...
    institution:                     This version written at NOAA/ESRL PSD: o...
    instrument:                      Conventional thermometers
    keywords:                        Earth Science > Oceans > Ocean Temperatu...
    keywords_vocabulary:             NASA Global Change Master Directory (GCM...
    license:                         No constraints on data access or use
    metadata_link:                   :metadata_link = https://doi.org/10.7289...
    original_publisher_url:          http://www.ncdc.noaa.gov
    platform:                        Ship and Buoy SSTs from ICOADS R3.0 and ...
    processing_level:                NOAA Level 4
    product_version:                 Version 5
    project:                         NOAA Extended Reconstructed Sea Surface ...
    source:                          In situ data: ICOADS R3.0 before 2015, N...
    source_comment:                  SSTs were observed by conventional therm...
    standard_name_vocabulary:        CF Standard Name Table (v40, 25 January ...
    summary:                         ERSST.v5 is developed based on v4 after ...
    title:                           NOAA ERSSTv5 (in situ only)

Let’s do some basic visualizations of the data, just to make sure it looks reasonable.

[4]:
ds.sst[0].plot(vmin=-2, vmax=30)
[4]:
<matplotlib.collections.QuadMesh at 0x7fbd5c599d30>
../_images/scipy-tutorial_03_computation_with_xarray_6_1.png

Basic Arithmetic

Xarray DataArrays and Datasets work seamlessly with arithmetic operators and numpy array functions.

For example, imagine we want to convert the temperature (given in Celsius) to Kelvin:

[5]:
sst_kelvin = ds.sst + 273.15
sst_kelvin
[5]:
<xarray.DataArray 'sst' (time: 708, lat: 89, lon: 180)>
array([[[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
...
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]],

       [[271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        [271.35, 271.35, 271.35, ..., 271.35, 271.35, 271.35],
        ...,
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan],
        [   nan,    nan,    nan, ...,    nan,    nan,    nan]]],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

The dimensions and coordinates were preserved following the operation.

Warning: Although many xarray datasets have a units attribute, which is used in plotting, Xarray does not inherently understand units. However, work is underway to integrate xarray with pint, which will provide full unit-aware operations.
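One consequence worth knowing: by default, arithmetic drops attrs (including units), precisely so that a stale label cannot survive a unit-changing operation. A minimal sketch with a synthetic stand-in for ds.sst, showing the default behavior and the keep_attrs opt-in:

```python
import numpy as np
import xarray as xr

# a tiny stand-in for ds.sst, carrying a units attribute
sst = xr.DataArray(
    np.array([10.0, 12.0], dtype="float32"),
    dims="time",
    attrs={"units": "degC"},
)

# by default, arithmetic drops attrs, so no stale "degC" label survives
print((sst + 273.15).attrs)  # -> {}

# opt in to keeping attrs; updating the label is then your responsibility
with xr.set_options(keep_attrs=True):
    sst_kelvin = sst + 273.15
sst_kelvin.attrs["units"] = "K"
```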

We can apply more complex functions, including numpy ufuncs, to Xarray objects. Imagine we wanted to compute the following expression as a function of SST (\(\Theta\)) in Kelvin:

\[f(\Theta) = 0.5 \ln(\Theta^2)\]
[6]:
f = 0.5 * np.log(sst_kelvin ** 2)
f
[6]:
<xarray.DataArray 'sst' (time: 708, lat: 89, lon: 180)>
array([[[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
...
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]],

       [[5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        [5.60341, 5.60341, 5.60341, ..., 5.60341, 5.60341, 5.60341],
        ...,
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan],
        [    nan,     nan,     nan, ...,     nan,     nan,     nan]]],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

Applying Arbitrary Functions

It’s awesome that we can call np.log(ds) and have it “just work”. However, not all third-party libraries work this way.

In this example, we will use functions from the Gibbs Seawater Toolkit, a package for the thermodynamics of seawater. This package provides ufuncs that operate on numpy arrays.

[7]:
import gsw

# an example function
# http://www.teos-10.org/pubs/gsw/html/gsw_t90_from_t68.html
gsw.t90_from_t68?
[8]:
gsw.t90_from_t68(ds.sst)  # -> returns a numpy array
[8]:
array([[[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       ...,

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]]])

It would be nice to keep our dimensions and coordinates. We can accomplish this with xr.apply_ufunc.

[9]:
xr.apply_ufunc(gsw.t90_from_t68, ds.sst)
[9]:
<xarray.DataArray 'sst' (time: 708, lat: 89, lon: 180)>
array([[[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
...
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]],

       [[-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        [-1.79956806, -1.79956806, -1.79956806, ..., -1.79956806,
         -1.79956806, -1.79956806],
        ...,
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan]]])
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

Note: apply_ufunc is a powerful and mysterious function. It has many options for doing more complicated things. Unfortunately, we don’t have time to go into more depth here. Please consult the Xarray docs for more details.
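As one small taste of those options, input_core_dims lets apply_ufunc wrap functions that consume a whole axis. Here is a sketch with a hypothetical range_along_axis helper on a small synthetic array (not the full dataset):

```python
import numpy as np
import xarray as xr

# a plain-numpy reduction that knows nothing about dimension names
def range_along_axis(arr, axis=-1):
    return arr.max(axis=axis) - arr.min(axis=axis)

da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=["time", "lat"],
    coords={"lat": [10.0, 20.0, 30.0]},
)

# input_core_dims names the dimension the function consumes; apply_ufunc
# moves it to the last axis and drops it from the result
result = xr.apply_ufunc(
    range_along_axis,
    da,
    input_core_dims=[["time"]],
    kwargs={"axis": -1},
)
result.dims  # ('lat',)
```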

Reductions

Just like in numpy, we can reduce xarray DataArrays along any number of axes:

[10]:
sst = ds.sst
sst.mean(axis=0)
/home/travis/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[10]:
<xarray.DataArray 'sst' (lat: 89, lon: 180)>
array([[-1.7993844, -1.7993953, -1.7994034, ..., -1.799569 , -1.799457 ,
        -1.799379 ],
       [-1.7994268, -1.7993473, -1.799294 , ..., -1.7998592, -1.7996504,
        -1.7995317],
       [-1.7999867, -1.7998204, -1.7997179, ..., -1.799865 , -1.7999135,
        -1.7999635],
       ...,
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan]], dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
[11]:
sst.mean(axis=(1, 2))
[11]:
<xarray.DataArray 'sst' (time: 708)>
array([13.447333 , 13.538178 , 13.560617 , 13.449021 , 13.469315 ,
       13.454981 , 13.72113  , 13.916064 , 13.822409 , 13.53989  ,
       13.323178 , 13.364261 , 13.491084 , 13.624221 , 13.623529 ,
       13.525916 , 13.48518  , 13.5291395, 13.710998 , 13.902636 ,
       13.786833 , 13.500918 , 13.322204 , 13.324784 , 13.4614725,
       13.554375 , 13.570085 , 13.467039 , 13.464627 , 13.515269 ,
       13.7487135, 13.86036  , 13.797006 , 13.499739 , 13.294295 ,
       13.312595 , 13.420587 , 13.541743 , 13.535124 , 13.477131 ,
       13.4698715, 13.525112 , 13.6962595, 13.918255 , 13.764984 ,
       13.465238 , 13.324522 , 13.340322 , 13.438337 , 13.514883 ,
       13.488945 , 13.367167 , 13.350077 , 13.449749 , 13.60579  ,
       13.718753 , 13.616768 , 13.327129 , 13.127264 , 13.148032 ,
       13.36128  , 13.472328 , 13.458139 , 13.388378 , 13.31617  ,
       13.415058 , 13.603073 , 13.795105 , 13.720209 , 13.420439 ,
       13.248442 , 13.274323 , 13.424941 , 13.506327 , 13.517883 ,
       13.4329195, 13.372339 , 13.450058 , 13.671843 , 13.779454 ,
       13.685076 , 13.430487 , 13.293945 , 13.326498 , 13.416443 ,
       13.515367 , 13.532816 , 13.443419 , 13.459111 , 13.521452 ,
       13.673624 , 13.855462 , 13.68534  , 13.458884 , 13.242759 ,
       13.240546 , 13.329314 , 13.464171 , 13.442738 , 13.397366 ,
...
       13.623425 , 13.612804 , 13.739813 , 13.865803 , 13.857512 ,
       13.776316 , 13.768099 , 13.893071 , 14.132943 , 14.323578 ,
       14.175996 , 13.839251 , 13.622542 , 13.59652  , 13.73716  ,
       13.874577 , 13.845826 , 13.794755 , 13.80382  , 13.927139 ,
       14.200779 , 14.419651 , 14.300908 , 13.951674 , 13.712137 ,
       13.726637 , 13.808819 , 13.926258 , 13.912461 , 13.84714  ,
       13.827132 , 13.905337 , 14.194412 , 14.374818 , 14.2309265,
       13.904327 , 13.720607 , 13.692019 , 13.826905 , 13.943245 ,
       13.953634 , 13.921505 , 13.908049 , 14.04417  , 14.274524 ,
       14.4880495, 14.345023 , 13.980439 , 13.755713 , 13.76705  ,
       13.899665 , 13.994961 , 14.024579 , 13.983573 , 13.978161 ,
       14.071658 , 14.313522 , 14.5173025, 14.437494 , 14.145153 ,
       13.922181 , 13.925177 , 14.107267 , 14.163773 , 14.169422 ,
       14.069178 , 14.0240755, 14.15859  , 14.405118 , 14.586667 ,
       14.463736 , 14.150087 , 13.865754 , 13.8558   , 14.032689 ,
       14.1451   , 14.133999 , 14.04923  , 13.981656 , 14.096308 ,
       14.37745  , 14.55437  , 14.387506 , 14.084339 , 13.8478365,
       13.800663 , 13.930675 , 14.03369  , 14.038775 , 13.991124 ,
       13.960315 , 14.027888 , 14.255452 , 14.474634 , 14.393512 ,
       14.140695 , 13.8931875, 13.890292 ], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
[12]:
sst.mean()
[12]:
<xarray.DataArray 'sst' ()>
array(13.746263, dtype=float32)

However, rather than performing reductions on axes (as in numpy), we can perform them on dimensions. This turns out to be a huge convenience, particularly in complex calculations when you can’t easily remember which axis corresponds to which dimension:

[13]:
sst.mean(dim="time")
[13]:
<xarray.DataArray 'sst' (lat: 89, lon: 180)>
array([[-1.7993844, -1.7993953, -1.7994034, ..., -1.799569 , -1.799457 ,
        -1.799379 ],
       [-1.7994268, -1.7993473, -1.799294 , ..., -1.7998592, -1.7996504,
        -1.7995317],
       [-1.7999867, -1.7998204, -1.7997179, ..., -1.799865 , -1.7999135,
        -1.7999635],
       ...,
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [       nan,        nan,        nan, ...,        nan,        nan,
               nan]], dtype=float32)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0

All of the standard numpy reductions (e.g. min, max, sum, std, etc.) are available.
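A quick illustration on a tiny synthetic array (the real sst above works identically); note that skipna=True is the default, so NaN land points are ignored:

```python
import numpy as np
import xarray as xr

# tiny stand-in with a NaN "land" point
da = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, np.nan]]),
    dims=["time", "lat"],
)

da.max(dim="time")   # -> [3., 2.]; skipna=True by default, NaN ignored
da.std(dim="lat")    # standard deviation at each time
da.min()             # scalar minimum over all dimensions
```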

Exercise

Take the mean of sst in both longitude and latitude. Make a simple timeseries plot:

[14]:
# your code here

Broadcasting

Broadcasting refers to the alignment of arrays with different numbers of dimensions. Numpy’s broadcasting rules, based on array shape, can sometimes be difficult to understand and remember. Xarray does broadcasting by dimension name, rather than array shape. This is a huge convenience.

Let’s now create two arrays with some dimensions in common. For this example, we will create a “weights” array proportional to cosine of latitude. Modulo a normalization, this is the correct area-weighting factor for data on a regular lat-lon grid.

[15]:
weights = np.cos(np.deg2rad(ds.lat))
weights.dims
[15]:
('lat',)

If we multiply this by SST, it “just works,” and the arrays are broadcast properly:

[16]:
(ds.sst * weights).dims
[16]:
('time', 'lat', 'lon')

Warning: If the arrays being broadcast share a dimension name but have different coordinates, they will first be aligned using Xarray’s default align settings (including filling missing values with NaNs). If that’s not what you want, it’s best to call align explicitly before broadcasting.
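A minimal sketch of what that alignment looks like, using two hypothetical arrays whose lat coordinates only partially overlap:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3.0), dims="lat", coords={"lat": [10.0, 20.0, 30.0]})
b = xr.DataArray(np.arange(3.0), dims="lat", coords={"lat": [20.0, 30.0, 40.0]})

# arithmetic aligns automatically; only the shared labels survive
(a * b).lat.values  # -> [20., 30.]

# call align explicitly for different behavior, e.g. an outer join,
# which fills the non-overlapping labels with NaN
a2, b2 = xr.align(a, b, join="outer")
a2.lat.values       # -> [10., 20., 30., 40.]
```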

Weighted Reductions

We could imagine computing the weighted spatial mean of SST manually, like this:

[17]:
sst_mean = (ds.sst * weights).sum(dim=("lon", "lat")) / weights.sum(dim="lat")
sst_mean.plot()
plt.title("This is wrong!")
/home/travis/miniconda/envs/xarray/lib/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py:256: MatplotlibDeprecationWarning:
The epoch2num function was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
  base = dates.epoch2num(dt.asi8 / 1.0e9)
[17]:
Text(0.5, 1.0, 'This is wrong!')
../_images/scipy-tutorial_03_computation_with_xarray_32_2.png

That would be wrong, however, because the denominator (weights.sum(dim='lat')) needs to be expanded to include the lon dimension and modified to account for the missing values (land points).
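To see why, here is the fix sketched manually on a small synthetic field (an illustration, not the recommended API): the correct denominator sums the weights only where data exist, broadcast over both dimensions, and it matches xarray’s built-in weighted mean.

```python
import numpy as np
import xarray as xr

# small stand-in field with a NaN "land" point
sst = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, np.nan]]),
    dims=["lat", "lon"],
    coords={"lat": [0.0, 60.0]},
)
weights = np.cos(np.deg2rad(sst.lat))

# mask the weights where sst is NaN, broadcasting them over lon
denom = (weights * sst.notnull()).sum(dim=("lat", "lon"))
manual = (sst * weights).sum(dim=("lat", "lon")) / denom

# agrees with the built-in weighted reduction
builtin = sst.weighted(weights).mean(dim=("lat", "lon"))
```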

In general, weighted reductions on multidimensional arrays are complicated. To make them a bit easier, Xarray provides a mechanism for weighted reductions: it creates a special intermediate DataArrayWeighted object, to which different reduction operations can be applied.

[18]:
sst_weighted = ds.sst.weighted(weights)
sst_weighted
[18]:
DataArrayWeighted with weights along dimensions: lat
[19]:
sst_weighted.mean(dim=("lon", "lat")).plot()
plt.title("Correct Global Mean SST")
/home/travis/miniconda/envs/xarray/lib/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py:256: MatplotlibDeprecationWarning:
The epoch2num function was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
  base = dates.epoch2num(dt.asi8 / 1.0e9)
[19]:
Text(0.5, 1.0, 'Correct Global Mean SST')
../_images/scipy-tutorial_03_computation_with_xarray_35_2.png

Groupby

Xarray copies Pandas’ very useful groupby functionality, enabling the “split / apply / combine” workflow on xarray DataArrays and Datasets.

To provide a physically motivated example, let’s examine a timeseries of SST at a single point.

[20]:
ds.sst.sel(lon=300, lat=50).plot()
/home/travis/miniconda/envs/xarray/lib/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py:256: MatplotlibDeprecationWarning:
The epoch2num function was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
  base = dates.epoch2num(dt.asi8 / 1.0e9)
[20]:
[<matplotlib.lines.Line2D at 0x7fbd5c11c520>]
../_images/scipy-tutorial_03_computation_with_xarray_37_2.png

As we can see from the plot, the timeseries at any one point is totally dominated by the seasonal cycle. We would like to remove this seasonal cycle (called the “climatology”) in order to better see the long-term variations in temperature. We can accomplish this using groupby.
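As a preview of where we are headed, the whole split / apply / combine pattern can be sketched on a synthetic monthly series (the names clim and anom below are our own):

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic monthly series: a seasonal cycle plus a slow trend
time = pd.date_range("2000-01-01", periods=48, freq="MS")
da = xr.DataArray(
    10 + 5 * np.sin(2 * np.pi * np.asarray(time.month) / 12) + 0.01 * np.arange(48),
    dims="time",
    coords={"time": time},
)

# split by calendar month and average within each group: the climatology
clim = da.groupby("time.month").mean()

# subtracting the climatology group by group leaves the anomaly
anom = da.groupby("time.month") - clim
```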

Before moving forward, we note that xarray correctly parsed the time index, resulting in a Pandas datetime index on the time dimension.

[21]:
ds.time
[21]:
<xarray.DataArray 'time' (time: 708)>
array(['1960-01-01T00:00:00.000000000', '1960-02-01T00:00:00.000000000',
       '1960-03-01T00:00:00.000000000', ..., '2018-10-01T00:00:00.000000000',
       '2018-11-01T00:00:00.000000000', '2018-12-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
Attributes:
    _ChunkSizes:      1
    actual_range:     [19723.0, 80505.0]
    avg_period:       0000-01-00 00:00:00
    axis:             T
    delta_t:          0000-01-00 00:00:00
    long_name:        Time
    prev_avg_period:  0000-00-07 00:00:00
    standard_name:    time

The syntax of Xarray’s groupby is almost identical to Pandas.

[22]:
ds.groupby?

The most important argument is group: this defines the unique values we will use to “split” the data for grouped analysis. We can pass either a DataArray or the name of a variable in the dataset. Let’s first use a DataArray. Just like with Pandas, we can use the time index to extract specific components of dates and times. Xarray uses a special syntax for this, .dt, called the DatetimeAccessor.

[23]:
ds.time.dt
[23]:
<xarray.core.accessor_dt.DatetimeAccessor at 0x7fbd4cc614f0>
[24]:
ds.time.dt.month
[24]:
<xarray.DataArray 'month' (time: 708)>
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
       12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,
        5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9,
       10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,
        8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,
        1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
       12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,
...
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,
        8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,
        1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
       12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,
        5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9,
       10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,
        8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,
        1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,
        6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,
        9, 10, 11, 12,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,
        2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
Coordinates:
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01

Similarly, ds.time.dt.year extracts the year.

We can use these arrays in a groupby operation:

[25]:
gb = ds.groupby(ds.time.dt.month)
gb
[25]:
DatasetGroupBy, grouped over 'month'
12 groups with labels 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Xarray also offers a more concise syntax when the variable you’re grouping on is already present in the dataset. This is identical to the previous line:

[26]:
gb = ds.groupby("time.month")
gb
[26]:
DatasetGroupBy, grouped over 'month'
12 groups with labels 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Now that the data are split, we can manually iterate over the groups. For each group, the iterator returns the key (the group name) and the value (the actual dataset corresponding to that group).

[27]:
for group_name, group_ds in gb:
    # stop iterating after the first loop
    break
print(group_name)
group_ds
1
[27]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 59)
Coordinates:
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * time     (time) datetime64[ns] 1960-01-01 1961-01-01 ... 2018-01-01
Data variables:
    sst      (time, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan
Attributes:
    Conventions:                     CF-1.6, ACDD-1.3
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.ncdc.noaa.gov/data-access/ma...
    cdm_data_type:                   Grid
    citation:                        Huang et al, 2017: Extended Reconstructe...
    climatology:                     Climatology is based on 1971-2000 SST, X...
    comment:                         SSTs were observed by conventional therm...
    creator_name:                    Boyin Huang (original)
    creator_url_original:            https://www.ncei.noaa.gov
    data_modified:                   2020-07-07
    dataset_title:                   NOAA Extended Reconstructed SST V5
    date_created:                    2017-06-30T12:18:00Z (original)
    description:                     In situ data: ICOADS2.5 before 2007 and ...
    geospatial_lat_max:              89.0
    geospatial_lat_min:              -89.0
    geospatial_lat_units:            degrees_north
    geospatial_laty_max:             89.0
    geospatial_laty_min:             -89.0
    geospatial_lon_max:              359.0
    geospatial_lon_min:              -1.0
    geospatial_lon_units:            degrees_east
    history:                         created 07/2017 by PSD data using NCEI's...
    institution:                     This version written at NOAA/ESRL PSD: o...
    instrument:                      Conventional thermometers
    keywords:                        Earth Science > Oceans > Ocean Temperatu...
    keywords_vocabulary:             NASA Global Change Master Directory (GCM...
    license:                         No constraints on data access or use
    metadata_link:                   :metadata_link = https://doi.org/10.7289...
    original_publisher_url:          http://www.ncdc.noaa.gov
    platform:                        Ship and Buoy SSTs from ICOADS R3.0 and ...
    processing_level:                NOAA Level 4
    product_version:                 Version 5
    project:                         NOAA Extended Reconstructed Sea Surface ...
    source:                          In situ data: ICOADS R3.0 before 2015, N...
    source_comment:                  SSTs were observed by conventional therm...
    standard_name_vocabulary:        CF Standard Name Table (v40, 25 January ...
    summary:                         ERSST.v5 is developed based on v4 after ...
    title:                           NOAA ERSSTv5 (in situ only)

Now that we have groups defined, it’s time to “apply” a calculation to the group. Like in Pandas, these calculations can either be:

  • aggregation: reduces the size of the group

  • transformation: preserves the group’s full size

At the end of the apply step, Xarray will automatically combine the aggregated / transformed groups back into a single object.
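The distinction between the two can be sketched on a toy monthly series (synthetic data standing in for the SST dataset): an aggregation collapses the time dimension down to one value per group, while a transformation returns an object the same size as the input.

```python
import numpy as np
import pandas as pd
import xarray as xr

# toy monthly series (hypothetical data), two full years
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time")
gb = da.groupby("time.month")

agg = gb.mean()         # aggregation: 24 values reduce to 12 (one per month)
trans = gb - gb.mean()  # transformation: still 24 values, anomalies per month
print(dict(agg.sizes), dict(trans.sizes))  # {'month': 12} {'time': 24}
```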

The most fundamental way to apply is with the .map method.

[28]:
gb.map?

Aggregations

.map accepts as its argument a function. We can pass an existing function:

[29]:
gb.map(np.mean)
[29]:
<xarray.Dataset>
Dimensions:  (month: 12)
Coordinates:
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (month) float32 13.659467 13.768469 13.764702 ... 13.506328 13.5293

Because we specified no extra arguments (like axis), the function was applied over all space and time dimensions. This is not what we wanted. Instead, we can define a custom function that takes a single argument (the group dataset) and returns a new dataset to be combined:

[30]:
def time_mean(a):
    return a.mean(dim="time")


gb.map(time_mean)
/home/travis/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[30]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, month: 12)
Coordinates:
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (month, lat, lon) float32 -1.8000009 -1.8000009 ... nan nan

Like Pandas, xarray’s groupby object has many built-in aggregation operations (e.g. mean, min, max, std, etc):

[31]:
# this does the same thing as the previous cell
ds_mm = gb.mean(dim="time")
ds_mm
[31]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, month: 12)
Coordinates:
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (month, lat, lon) float32 -1.8000009 -1.8000009 ... nan nan

So we did what we wanted to do: calculate the climatology at every point in the dataset. Let’s look at the data a bit.
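The other built-in reductions follow the same pattern. A minimal sketch on a synthetic two-year monthly series (hypothetical data, not the SST dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

# toy monthly series (hypothetical data), two full years
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time")
gb = da.groupby("time.month")

clim_std = gb.std()  # monthly standard deviation across years
clim_max = gb.max()  # monthly maximum across years
print(float(clim_std.sel(month=1)), float(clim_max.sel(month=1)))
```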

Climatology at a specific point in the North Atlantic

[32]:
ds_mm.sst.sel(lon=300, lat=50).plot()
[32]:
[<matplotlib.lines.Line2D at 0x7fbd4cbde820>]
../_images/scipy-tutorial_03_computation_with_xarray_61_1.png

Zonal Mean Climatology

[33]:
ds_mm.sst.mean(dim="lon").plot.contourf(x="month", levels=12, vmin=-2, vmax=30)
/home/travis/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/plot/utils.py:624: MatplotlibDeprecationWarning: The 'extend' parameter to Colorbar has no effect because it is overridden by the mappable; it is deprecated since 3.3 and will be removed two minor releases later.
  cbar = fig.colorbar(primitive, **cbar_kwargs)
[33]:
<matplotlib.contour.QuadContourSet at 0x7fbd4cb3f3a0>
../_images/scipy-tutorial_03_computation_with_xarray_63_2.png

Difference between January and July Climatology

[34]:
(ds_mm.sst.sel(month=1) - ds_mm.sst.sel(month=7)).plot(vmax=10)
[34]:
<matplotlib.collections.QuadMesh at 0x7fbd4ca971f0>
../_images/scipy-tutorial_03_computation_with_xarray_65_1.png

Transformations

Now we want to remove this climatology from the dataset to examine the residual, called the anomaly, which is the interesting part from a climate perspective. Removing the seasonal climatology is a perfect example of a transformation: it operates over a group but doesn't change the size of the dataset. Here is one way to code it:

[35]:
def remove_time_mean(x):
    return x - x.mean(dim="time")


ds_anom = ds.groupby("time.month").map(remove_time_mean)
ds_anom
[35]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
Data variables:
    sst      (time, lat, lon) float32 9.536743e-07 9.536743e-07 ... nan nan

Xarray makes these sorts of transformations easy by supporting groupby arithmetic. This concept is most easily explained with an example:

[36]:
gb = ds.groupby("time.month")
ds_anom = gb - gb.mean(dim="time")
ds_anom
[36]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
    month    (time) int64 1 2 3 4 5 6 7 8 9 10 11 ... 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (time, lat, lon) float32 9.536743e-07 9.536743e-07 ... nan nan

Now we can view the climate signal without the overwhelming influence of the seasonal cycle.

Timeseries at a single point in the North Atlantic

[37]:
ds_anom.sst.sel(lon=300, lat=50).plot()
/home/travis/miniconda/envs/xarray/lib/python3.8/site-packages/pandas/plotting/_matplotlib/converter.py:256: MatplotlibDeprecationWarning:
The epoch2num function was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
  base = dates.epoch2num(dt.asi8 / 1.0e9)
[37]:
[<matplotlib.lines.Line2D at 0x7fbd4ca367c0>]
../_images/scipy-tutorial_03_computation_with_xarray_71_2.png

Difference between Jan. 1 2018 and Jan. 1 1960

[38]:
(ds_anom.sel(time="2018-01-01") - ds_anom.sel(time="1960-01-01")).sst.plot()
[38]:
<matplotlib.collections.QuadMesh at 0x7fbd4c9aecd0>
../_images/scipy-tutorial_03_computation_with_xarray_73_1.png

Groupby-Related: Resample, Rolling, Coarsen

Resample in Xarray is nearly identical to Pandas. It is effectively a groupby operation along a time dimension and uses the same basic syntax; it can be applied only to datetime-indexed dimensions. Here we compute the five-year mean.

[39]:
resample_obj = ds_anom.resample(time="5Y")
resample_obj
[39]:
DatasetResample, grouped over '__resample_dim__'
13 groups with labels 1960-12-31, ..., 2020-12-31.
[40]:
ds_anom_resample = resample_obj.mean(dim="time")
ds_anom_resample
[40]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 13)
Coordinates:
  * time     (time) datetime64[ns] 1960-12-31 1965-12-31 ... 2020-12-31
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
Data variables:
    sst      (time, lat, lon) float32 -0.0006173253 -0.0006062587 ... nan nan
[41]:
ds_anom.sst.sel(lon=300, lat=50).plot()
ds_anom_resample.sst.sel(lon=300, lat=50).plot(marker="o")
[41]:
[<matplotlib.lines.Line2D at 0x7fbd4c894fd0>]
../_images/scipy-tutorial_03_computation_with_xarray_77_2.png

Note: resample only works with proper datetime indexes.
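Any frequency string and reduction can be combined in the same way. A sketch on a synthetic ten-year monthly series (hypothetical data, not the SST dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

# ten years of toy monthly data (hypothetical)
time = pd.date_range("1960-01-01", periods=120, freq="MS")
da = xr.DataArray(np.arange(120.0), coords={"time": time}, dims="time")

# annual bins; .std(), .max(), etc. work the same way as .mean()
annual = da.resample(time="1Y").mean()
print(dict(annual.sizes))  # ten annual averages
```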

Rolling is also similar to Pandas, but it can be applied along any dimension, since it works with logical (positional) coordinates.

[42]:
ds_anom_rolling = ds_anom.rolling(time=12, center=True).mean()
ds_anom_rolling
[42]:
<xarray.Dataset>
Dimensions:  (lat: 89, lon: 180, time: 708)
Coordinates:
  * lon      (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 350.0 352.0 354.0 356.0 358.0
  * lat      (lat) float32 88.0 86.0 84.0 82.0 80.0 ... -82.0 -84.0 -86.0 -88.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
    month    (time) int64 1 2 3 4 5 6 7 8 9 10 11 ... 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
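Note the NaNs at the start and end of the rolling output above: by default every window must be full. The min_periods argument relaxes this, as a small sketch on a synthetic array shows (hypothetical data):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims="x")

# by default a window must be completely full, so the edges become NaN
full = da.rolling(x=3, center=True).mean()
# min_periods=1 keeps partial windows at the edges instead
partial = da.rolling(x=3, center=True, min_periods=1).mean()
print(float(full[1]), float(partial[0]))
```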
[43]:
ds_anom.sst.sel(lon=300, lat=50).plot(label="monthly anom")
ds_anom_resample.sst.sel(lon=300, lat=50).plot(
    marker="o", label="5 year resample"
)
ds_anom_rolling.sst.sel(lon=300, lat=50).plot(label="12 month rolling mean")
plt.legend()
[43]:
<matplotlib.legend.Legend at 0x7fbd4c7fc670>
../_images/scipy-tutorial_03_computation_with_xarray_80_2.png

coarsen does something similar to resample, but without being aware of time. It operates on logical coordinates only but can work on multiple dimensions at a time.

[44]:
ds_anom_coarsen_time = ds_anom.coarsen(time=12).mean()

ds_anom_rolling.sst.sel(lon=300, lat=50).plot(label="12 month rolling mean")
ds_anom_coarsen_time.sst.sel(lon=300, lat=50).plot(
    marker="^", label="12 item coarsen"
)
plt.legend()
[44]:
<matplotlib.legend.Legend at 0x7fbd4c92b0d0>
../_images/scipy-tutorial_03_computation_with_xarray_82_2.png
[45]:
%%expect_exception
ds_anom_coarsen_space = ds_anom.coarsen(lon=4, lat=4).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-f690a99950fa> in <module>
----> 1 ds_anom_coarsen_space = ds_anom.coarsen(lon=4, lat=4).mean()

~/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/rolling.py in wrapped_func(self, **kwargs)
    671             reduced = {}
    672             for key, da in self.obj.data_vars.items():
--> 673                 reduced[key] = da.variable.coarsen(
    674                     self.windows, func, self.boundary, self.side, **kwargs
    675                 )

~/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/variable.py in coarsen(self, windows, func, boundary, side, **kwargs)
   1940             return self.copy()
   1941
-> 1942         reshaped, axes = self._coarsen_reshape(windows, boundary, side)
   1943         if isinstance(func, str):
   1944             name = func

~/miniconda/envs/xarray/lib/python3.8/site-packages/xarray/core/variable.py in _coarsen_reshape(self, windows, boundary, side)
   1973             if boundary[d] == "exact":
   1974                 if n * window != size:
-> 1975                     raise ValueError(
   1976                         "Could not coarsen a dimension of size {} with "
   1977                         "window {}".format(size, window)

ValueError: Could not coarsen a dimension of size 89 with window 4
[46]:
ds_anom_coarsen_space = (
    ds_anom.isel(lat=slice(0, -1)).coarsen(lon=4, lat=4).mean()
)
ds_anom_coarsen_space
[46]:
<xarray.Dataset>
Dimensions:  (lat: 22, lon: 45, time: 708)
Coordinates:
  * lon      (lon) float32 3.0 11.0 19.0 27.0 35.0 ... 331.0 339.0 347.0 355.0
  * lat      (lat) float32 85.0 77.0 69.0 61.0 53.0 ... -59.0 -67.0 -75.0 -83.0
  * time     (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01
    month    (time) int64 1 2 3 4 5 6 7 8 9 10 11 ... 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sst      (time, lat, lon) float32 -0.00022904575 -0.022380486 ... nan nan
[47]:
ds_anom_coarsen_space.sst.isel(time=0).plot()
[47]:
<matplotlib.collections.QuadMesh at 0x7fbd4c6abd90>
../_images/scipy-tutorial_03_computation_with_xarray_85_1.png

Exercise

Load the following “basin mask” dataset, and use it to take a weighted average of SST in each ocean basin. Figure out which ocean basins are the warmest and coldest.

Hint: you will first need to align this dataset with the SST dataset. Use what you learned in the “indexing and alignment” lesson.

[48]:
basin = xr.open_dataset(
    "http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NODC/.WOA09/.Masks/.basin/dods"
)
basin
[48]:
<xarray.Dataset>
Dimensions:  (X: 360, Y: 180, Z: 33)
Coordinates:
  * Y        (Y) float32 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
  * Z        (Z) float32 0.0 10.0 20.0 30.0 50.0 ... 4000.0 4500.0 5000.0 5500.0
  * X        (X) float32 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
Data variables:
    basin    (Z, Y, X) float32 ...
Attributes:
    Conventions:  IRIDL
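One possible first step for the alignment, sketched here with synthetic stand-in grids rather than the real datasets (in the real case you would first select the surface level with basin.isel(Z=0)): rename the mask's X/Y coordinates to lon/lat, then put it on the SST grid, e.g. with nearest-neighbor selection (interp_like is an alternative).

```python
import numpy as np
import xarray as xr

# stand-ins for the basin mask (X/Y grid) and the SST grid (lon/lat); hypothetical data
basin_surf = xr.DataArray(
    np.ones((180, 360)),
    coords={"Y": np.arange(-89.5, 90.0, 1.0), "X": np.arange(0.5, 360.0, 1.0)},
    dims=("Y", "X"),
)
sst_grid = xr.DataArray(
    np.zeros((89, 180)),
    coords={"lat": np.arange(88.0, -90.0, -2.0), "lon": np.arange(0.0, 360.0, 2.0)},
    dims=("lat", "lon"),
)

# rename the mask's coordinates to match the SST names, then pick the
# nearest mask cell for every SST grid point
aligned = basin_surf.rename({"X": "lon", "Y": "lat"}).sel(
    lon=sst_grid.lon, lat=sst_grid.lat, method="nearest"
)
print(dict(aligned.sizes))
```

From here, grouping the SST over the aligned mask values gives per-basin averages.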