
Working with labeled data

Learning goals:

  • Use different forms of indexing to select data based on position and coordinates

  • Select datetime ranges

  • Interpolate data to new coordinates

Named dimensions

As mentioned in the previous session, named dimensions make code easier to read. Compare pure numpy indexing:

[1]:
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)
[2]:
# axis0: x, axis1: y
np_array = np.random.randn(3, 4)
np_array[1, 3]
[2]:
-0.1513572082976979

and slicing:

[3]:
np_array[:2, 1:]
[3]:
array([[ 0.40015721,  0.97873798,  2.2408932 ],
       [-0.97727788,  0.95008842, -0.15135721]])

with indexing by dimension name:

[4]:
arr = xr.DataArray(np_array, dims=("x", "y"))
arr.isel(x=1, y=3)
[4]:
<xarray.DataArray ()>
array(-0.15135721)

This is the same as

[5]:
arr[{"x": 1, "y": 3}]
[5]:
<xarray.DataArray ()>
array(-0.15135721)
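
One practical consequence of indexing by name is that the result does not depend on the order in which the dimensions are stored. The following is a minimal sketch on a small illustrative array; the names `arr_xy` and `arr_yx` are made up for this example and are not part of the notebook:

```python
import numpy as np
import xarray as xr

# Illustrative data: 3 entries along "x", 4 entries along "y".
data = np.arange(12).reshape(3, 4)
arr_xy = xr.DataArray(data, dims=("x", "y"))

# The same data with the axes stored in the opposite order.
arr_yx = arr_xy.transpose("y", "x")

# Positional indexing has to follow the axis order ...
assert data[1, 3] == data.T[3, 1]

# ... while indexing by dimension name is independent of it.
value_xy = arr_xy.isel(x=1, y=3).item()
value_yx = arr_yx.isel(x=1, y=3).item()
print(value_xy, value_yx)
```

Both calls return the same element, even though the underlying numpy arrays have different shapes.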

Python's start:stop slice syntax is only valid inside square brackets, so slices passed as keyword arguments have to be constructed explicitly with slice:

[6]:
ds = xr.Dataset(
    {
        "a": (("x", "y"), np.random.randn(3, 4)),
        "b": (("x", "y"), np.random.randn(3, 4)),
    }
)
ds.isel(x=slice(None, 2), y=slice(1, None))
[6]:
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) float64 0.1217 0.4439 0.3337 -0.2052 0.3131 -0.8541
    b        (x, y) float64 -1.454 0.04576 -0.1872 1.469 0.1549 0.3782
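
To make the correspondence between `slice` objects and the bracket syntax concrete, here is a small sketch on an illustrative dataset (the variable names `small`, `subset` and `expected` are assumptions of this example). `slice(None, 2)` plays the role of `:2`, and `slice(1, None)` the role of `1:`:

```python
import numpy as np
import xarray as xr

# Illustrative dataset with a single variable "a".
small = xr.Dataset({"a": (("x", "y"), np.arange(12).reshape(3, 4))})

# slice(None, 2) corresponds to [:2], slice(1, None) to [1:].
subset = small.isel(x=slice(None, 2), y=slice(1, None))

# The equivalent positional selection on the raw numpy array.
expected = small["a"].values[:2, 1:]

print(subset["a"].values)
```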

We can also use these names to peek at the data if the automatic preview is not enough:

[7]:
ds.head(x=2, y=3)
[7]:
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) float64 0.761 0.1217 0.4439 1.494 -0.2052 0.3131
    b        (x, y) float64 2.27 -1.454 0.04576 1.533 1.469 0.1549

See also tail and thin.
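
As a rough illustration of how the three methods differ, the sketch below applies them to a small made-up dataset (the name `demo` and its contents are assumptions of this example): `head` keeps the first entries, `tail` the last ones, and `thin` every n-th entry along the given dimension.

```python
import numpy as np
import xarray as xr

# Illustrative dataset: 4 entries along "x", 6 along "y".
demo = xr.Dataset({"a": (("x", "y"), np.arange(24).reshape(4, 6))})

first = demo.head(x=2, y=3)    # first 2 entries of x, first 3 of y
last = demo.tail(x=2, y=3)     # last 2 entries of x, last 3 of y
every_other = demo.thin(y=2)   # every 2nd entry along y

print(first["a"].values)
print(last["a"].values)
print(every_other["a"].values)
```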

Coordinate labels and label based indexing

xarray objects become much more interesting when adding coordinate labels:

[8]:
arr = xr.DataArray(
    np.random.randn(4, 6),
    dims=("x", "y"),
    coords={
        "x": [-3.2, 2.1, 5.3, 6.5],
        "y": pd.date_range("2009-01-05", periods=6, freq="M"),
    },
)
arr
[8]:
<xarray.DataArray (x: 4, y: 6)>
array([[ 1.23029068,  1.20237985, -0.38732682, -0.30230275, -1.04855297,
        -1.42001794],
       [-1.70627019,  1.9507754 , -0.50965218, -0.4380743 , -1.25279536,
         0.77749036],
       [-1.61389785, -0.21274028, -0.89546656,  0.3869025 , -0.51080514,
        -1.18063218],
       [-0.02818223,  0.42833187,  0.06651722,  0.3024719 , -0.63432209,
        -0.36274117]])
Coordinates:
  * x        (x) float64 -3.2 2.1 5.3 6.5
  * y        (y) datetime64[ns] 2009-01-31 2009-02-28 ... 2009-05-31 2009-06-30

To select data by coordinate labels instead of integer indices we can use the same syntax, using sel instead of isel:

[9]:
arr.sel(x=5.3, y="2009-04-30")  # or arr.loc[{"x": 5.3, "y": "2009-04-30"}]
[9]:
<xarray.DataArray ()>
array(0.3869025)
Coordinates:
    x        float64 5.3
    y        datetime64[ns] 2009-04-30

This requires us to specify exact values. If we don’t have those, we can use the method parameter (see the Dataset.sel documentation):

[10]:
arr.sel(x=4, y="2009-04-01", method="nearest")
[10]:
<xarray.DataArray ()>
array(-0.89546656)
Coordinates:
    x        float64 5.3
    y        datetime64[ns] 2009-03-31
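
The sketch below contrasts an exact lookup with the inexact lookups enabled by `method`. It uses a small illustrative array (the names `demo`, `nearest`, `previous` and `following` are made up here) with the same `x` coordinate values as above. Besides `"nearest"`, `sel` also accepts `"ffill"` (alias `"pad"`) and `"bfill"` (alias `"backfill"`), which pick the closest label below or above the requested value:

```python
import xarray as xr

# Illustrative 1-D array with the same x labels as in the example above.
demo = xr.DataArray(
    [10.0, 20.0, 30.0, 40.0],
    dims="x",
    coords={"x": [-3.2, 2.1, 5.3, 6.5]},
)

# Without `method`, a label that is not in the index raises a KeyError.
try:
    demo.sel(x=4)
    exact_match_found = True
except KeyError:
    exact_match_found = False

nearest = demo.sel(x=4, method="nearest")   # closest label: 5.3
previous = demo.sel(x=4, method="ffill")    # last label <= 4: 2.1
following = demo.sel(x=4, method="bfill")   # first label >= 4: 5.3

print(exact_match_found, float(nearest.x), float(previous.x), float(following.x))
```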

We can also select multiple values:

[11]:
arr.sel(x=[-3.2, 6.5], y=slice("2009-02-28", "2009-05-31"))
[11]:
<xarray.DataArray (x: 2, y: 4)>
array([[ 1.20237985, -0.38732682, -0.30230275, -1.04855297],
       [ 0.42833187,  0.06651722,  0.3024719 , -0.63432209]])
Coordinates:
  * x        (x) float64 -3.2 6.5
  * y        (y) datetime64[ns] 2009-02-28 2009-03-31 2009-04-30 2009-05-31
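
Note that, unlike positional slices, label-based slices include both endpoints. For datetime coordinates, a partial date string such as a year and month selects the whole matching period. The following sketch illustrates both points on an illustrative daily time series (the names `daily`, `by_label`, `by_position` and `february` are assumptions of this example):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative daily series covering January to March 2009.
times = pd.date_range("2009-01-01", "2009-03-31", freq="D")
daily = xr.DataArray(np.arange(times.size), dims="time", coords={"time": times})

# Label-based slice: both "2009-01-15" and "2009-02-14" are included.
by_label = daily.sel(time=slice("2009-01-15", "2009-02-14"))

# Equivalent positional slice: the stop index (45) is excluded.
by_position = daily.isel(time=slice(14, 45))

# Partial date string: selects every day of February 2009.
february = daily.sel(time="2009-02")

print(by_label.size, february.size)
```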

If instead of selecting data we want to drop it, we can use drop_sel:

[12]:
arr.drop_sel(x=[-3.2, 5.3])
[12]:
<xarray.DataArray (x: 2, y: 6)>
array([[-1.70627019,  1.9507754 , -0.50965218, -0.4380743 , -1.25279536,
         0.77749036],
       [-0.02818223,  0.42833187,  0.06651722,  0.3024719 , -0.63432209,
        -0.36274117]])
Coordinates:
  * x        (x) float64 2.1 6.5
  * y        (y) datetime64[ns] 2009-01-31 2009-02-28 ... 2009-05-31 2009-06-30
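
By default, `drop_sel` raises an error if one of the labels does not exist. The sketch below, on an illustrative array named `demo` (not part of the notebook), shows the default behavior and the `errors="ignore"` option, which silently skips missing labels:

```python
import xarray as xr

# Illustrative 1-D array with the same x labels as above.
demo = xr.DataArray(
    [10.0, 20.0, 30.0, 40.0],
    dims="x",
    coords={"x": [-3.2, 2.1, 5.3, 6.5]},
)

# Drop two existing labels.
remaining = demo.drop_sel(x=[-3.2, 5.3])

# 99.0 is not a label of x; errors="ignore" drops only the labels that exist.
lenient = demo.drop_sel(x=[-3.2, 99.0], errors="ignore")

print(remaining.x.values, lenient.x.values)
```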

Exercises

[13]:
ds = xr.tutorial.open_dataset("air_temperature")
ds
[13]:
<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
  1. Select the first 30 entries of latitude and 20th to 40th entries of longitude

[14]:
# your code here
  2. Select all data at 75 degrees north and between Jan 1, 2013 and Oct 15, 2013

[15]:
# your code here
  3. Remove all entries at 260 and 270 degrees longitude

[16]:
# your code here

Interpolation

If we want to look at values between the current grid cells (interpolation), we can do that with interp (requires scipy):

[17]:
arr.interp(
    x=np.linspace(2, 6, 10),
    y=pd.date_range("2009-04-01", "2009-04-30", freq="D"),
)
[17]:
<xarray.DataArray (x: 10, y: 30)>
array([[-0.50494977, -0.50255538, -0.500161  , -0.49776661, -0.49537223,
        -0.49297784, -0.49058345, -0.48818907, -0.48579468, -0.4834003 ,
        -0.48100591, -0.47861152, -0.47621714, -0.47382275, -0.47142837,
        -0.46903398, -0.46663959, -0.46424521, -0.46185082, -0.45945643,
        -0.45706205, -0.45466766, -0.45227328, -0.44987889, -0.4474845 ,
        -0.44509012, -0.44269573, -0.44030135, -0.43790696, -0.43551257],
       [-0.54445061, -0.53772041, -0.5309902 , -0.52426   , -0.5175298 ,
        -0.51079959, -0.50406939, -0.49733919, -0.49060898, -0.48387878,
        -0.47714858, -0.47041837, -0.46368817, -0.45695797, -0.45022776,
        -0.44349756, -0.43676736, -0.43003715, -0.42330695, -0.41657675,
        -0.40984654, -0.40311634, -0.39638614, -0.38965594, -0.38292573,
        -0.37619553, -0.36946533, -0.36273512, -0.35600492, -0.34927472],
       [-0.59243043, -0.58009471, -0.56775899, -0.55542327, -0.54308755,
        -0.53075184, -0.51841612, -0.5060804 , -0.49374468, -0.48140896,
        -0.46907325, -0.45673753, -0.44440181, -0.43206609, -0.41973037,
        -0.40739466, -0.39505894, -0.38272322, -0.3703875 , -0.35805178,
        -0.34571607, -0.33338035, -0.32104463, -0.30870891, -0.29637319,
        -0.28403748, -0.27170176, -0.25936604, -0.24703032, -0.2346946 ],
       [-0.64041024, -0.62246901, -0.60452778, -0.58658654, -0.56864531,
        -0.55070408, -0.53276285, -0.51482161, -0.49688038, -0.47893915,
...
        -0.08919415, -0.05443638, -0.0196786 ,  0.01507918,  0.04983696,
         0.08459473,  0.11935251,  0.15411029,  0.18886806,  0.22362584],
       [-0.8323295 , -0.79196621, -0.75160292, -0.71123963, -0.67087634,
        -0.63051305, -0.59014975, -0.54978646, -0.50942317, -0.46905988,
        -0.42869659, -0.3883333 , -0.34797001, -0.30760671, -0.26724342,
        -0.22688013, -0.18651684, -0.14615355, -0.10579026, -0.06542696,
        -0.02506367,  0.01529962,  0.05566291,  0.0960262 ,  0.13638949,
         0.17675278,  0.21711608,  0.25747937,  0.29784266,  0.33820595],
       [-0.65528226, -0.61996487, -0.58464749, -0.5493301 , -0.51401272,
        -0.47869533, -0.44337795, -0.40806056, -0.37274318, -0.33742579,
        -0.30210841, -0.26679102, -0.23147364, -0.19615625, -0.16083887,
        -0.12552148, -0.0902041 , -0.05488671, -0.01956933,  0.01574806,
         0.05106544,  0.08638283,  0.12170021,  0.1570176 ,  0.19233498,
         0.22765237,  0.26296975,  0.29828714,  0.33360452,  0.36892191],
       [-0.31191067, -0.28951198, -0.26711329, -0.2447146 , -0.22231591,
        -0.19991722, -0.17751853, -0.15511984, -0.13272115, -0.11032246,
        -0.08792378, -0.06552509, -0.0431264 , -0.02072771,  0.00167098,
         0.02406967,  0.04646836,  0.06886705,  0.09126574,  0.11366442,
         0.13606311,  0.1584618 ,  0.18086049,  0.20325918,  0.22565787,
         0.24805656,  0.27045525,  0.29285394,  0.31525263,  0.33765131]])
Coordinates:
  * x        (x) float64 2.0 2.444 2.889 3.333 3.778 4.222 4.667 5.111 5.556 6.0
  * y        (y) datetime64[ns] 2009-04-01 2009-04-02 ... 2009-04-29 2009-04-30

When trying to extrapolate, the resulting values will be nan.
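
A minimal sketch of this behavior, assuming scipy is installed and using an illustrative 1-D array named `line`: a point inside the coordinate range is linearly interpolated, while a point outside yields nan. For 1-D interpolation, passing `kwargs={"fill_value": "extrapolate"}` forwards that option to the underlying scipy interpolator and enables extrapolation instead:

```python
import numpy as np
import xarray as xr

# Illustrative data lying on the straight line y = 10 * x.
line = xr.DataArray([0.0, 10.0, 20.0], dims="x", coords={"x": [0.0, 1.0, 2.0]})

inside = line.interp(x=0.5)    # within the coordinate range
outside = line.interp(x=3.0)   # beyond the last coordinate: nan

# Opt in to extrapolation (1-D interpolation only).
extrapolated = line.interp(x=3.0, kwargs={"fill_value": "extrapolate"})

print(float(inside), float(outside), float(extrapolated))
```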

If we already have an object with the desired coordinates, we can use interp_like:

[18]:
other = xr.DataArray(
    dims=("x", "y"),
    coords={
        "x": np.linspace(2, 4, 10),
        "y": pd.date_range("2009-04-01", "2009-04-30", freq="D"),
    },
)
arr.interp_like(other)
[18]:
<xarray.DataArray (x: 10, y: 30)>
array([[-0.50494977, -0.50255538, -0.500161  , -0.49776661, -0.49537223,
        -0.49297784, -0.49058345, -0.48818907, -0.48579468, -0.4834003 ,
        -0.48100591, -0.47861152, -0.47621714, -0.47382275, -0.47142837,
        -0.46903398, -0.46663959, -0.46424521, -0.46185082, -0.45945643,
        -0.45706205, -0.45466766, -0.45227328, -0.44987889, -0.4474845 ,
        -0.44509012, -0.44269573, -0.44030135, -0.43790696, -0.43551257],
       [-0.5204607 , -0.51653326, -0.51260581, -0.50867836, -0.50475092,
        -0.50082347, -0.49689603, -0.49296858, -0.48904113, -0.48511369,
        -0.48118624, -0.4772588 , -0.47333135, -0.46940391, -0.46547646,
        -0.46154901, -0.45762157, -0.45369412, -0.44976668, -0.44583923,
        -0.44191178, -0.43798434, -0.43405689, -0.43012945, -0.426202  ,
        -0.42227455, -0.41834711, -0.41441966, -0.41049222, -0.40656477],
       [-0.54445061, -0.53772041, -0.5309902 , -0.52426   , -0.5175298 ,
        -0.51079959, -0.50406939, -0.49733919, -0.49060898, -0.48387878,
        -0.47714858, -0.47041837, -0.46368817, -0.45695797, -0.45022776,
        -0.44349756, -0.43676736, -0.43003715, -0.42330695, -0.41657675,
        -0.40984654, -0.40311634, -0.39638614, -0.38965594, -0.38292573,
        -0.37619553, -0.36946533, -0.36273512, -0.35600492, -0.34927472],
       [-0.56844052, -0.55890756, -0.5493746 , -0.53984164, -0.53030868,
        -0.52077571, -0.51124275, -0.50170979, -0.49217683, -0.48264387,
...
        -0.28158559, -0.26364435, -0.24570312, -0.22776189, -0.20982066,
        -0.19187942, -0.17393819, -0.15599696, -0.13805573, -0.12011449],
       [-0.66440015, -0.64365616, -0.62291217, -0.60216818, -0.58142419,
        -0.5606802 , -0.53993621, -0.51919222, -0.49844823, -0.47770424,
        -0.45696025, -0.43621626, -0.41547227, -0.39472828, -0.37398429,
        -0.3532403 , -0.33249631, -0.31175232, -0.29100833, -0.27026434,
        -0.24952035, -0.22877636, -0.20803237, -0.18728838, -0.16654439,
        -0.1458004 , -0.12505641, -0.10431242, -0.08356843, -0.06282444],
       [-0.68839006, -0.66484331, -0.64129656, -0.61774981, -0.59420307,
        -0.57065632, -0.54710957, -0.52356283, -0.50001608, -0.47646933,
        -0.45292258, -0.42937584, -0.40582909, -0.38228234, -0.35873559,
        -0.33518885, -0.3116421 , -0.28809535, -0.2645486 , -0.24100186,
        -0.21745511, -0.19390836, -0.17036161, -0.14681487, -0.12326812,
        -0.09972137, -0.07617462, -0.05262788, -0.02908113, -0.00553438],
       [-0.71237996, -0.68603046, -0.65968096, -0.63333145, -0.60698195,
        -0.58063244, -0.55428294, -0.52793343, -0.50158393, -0.47523442,
        -0.44888492, -0.42253541, -0.39618591, -0.3698364 , -0.3434869 ,
        -0.31713739, -0.29078789, -0.26443838, -0.23808888, -0.21173937,
        -0.18538987, -0.15904036, -0.13269086, -0.10634136, -0.07999185,
        -0.05364235, -0.02729284, -0.00094334,  0.02540617,  0.05175567]])
Coordinates:
  * x        (x) float64 2.0 2.222 2.444 2.667 2.889 3.111 3.333 3.556 3.778 4.0
  * y        (y) datetime64[ns] 2009-04-01 2009-04-02 ... 2009-04-29 2009-04-30
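
Only the coordinates of the object passed to `interp_like` matter; its data values are ignored. A small sketch with illustrative names (`source`, `template`), again assuming scipy is available:

```python
import numpy as np
import xarray as xr

# Illustrative source data on a coarse grid.
source = xr.DataArray([0.0, 10.0, 20.0], dims="x", coords={"x": [0.0, 1.0, 2.0]})

# Template on a finer grid; its values (zeros) play no role.
template = xr.DataArray(
    np.zeros(5), dims="x", coords={"x": np.linspace(0.0, 2.0, 5)}
)

result = source.interp_like(template)

print(result.x.values)
print(result.values)
```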

Exercises

Decrease the step size along latitude and longitude from 2.5 degrees to 1 degree.

[19]:
# your code here

Broadcasting and automatic alignment

Labels help with combining arrays with different coordinates:

[20]:
a = xr.DataArray(
    np.random.randn(3, 4),
    dims=("x", "y"),
    coords={"x": ["a", "b", "c"], "y": np.arange(4)},
)
b = xr.DataArray(
    np.random.randn(2, 7),
    dims=("x", "y"),
    coords={"x": ["b", "d"], "y": [-2, -1, 0, 1, 2, 3, 4]},
)

a + b
[20]:
<xarray.DataArray (x: 1, y: 4)>
array([[ 0.57976778, -1.08659103, -2.5009955 , -0.11606741]])
Coordinates:
  * x        (x) object 'b'
  * y        (y) int64 0 1 2 3

This automatically selects only the labels common to both arrays (an inner join) and then performs the operation.
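
The join used for arithmetic can be changed with `xr.set_options(arithmetic_join=...)`. The sketch below uses two small illustrative arrays (`left` and `right` are names made up for this example) to compare the default inner join with an outer join, where labels present in only one array produce nan:

```python
import numpy as np
import xarray as xr

left = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": ["a", "b", "c"]})
right = xr.DataArray([10.0, 20.0], dims="x", coords={"x": ["b", "d"]})

# Default: inner join, only the shared label "b" survives.
inner = left + right

# Outer join: the union of labels is kept, missing entries become nan.
with xr.set_options(arithmetic_join="outer"):
    outer = left + right

print(inner.x.values, inner.values)
print(outer.x.values, outer.values)
```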

Broadcasting works similarly:

[21]:
arr1 = xr.DataArray(
    np.random.randn(3), dims="x", coords={"x": ["a", "b", "c"]},
)
arr2 = xr.DataArray(np.random.randn(4), dims="y", coords={"y": np.arange(4)},)

arr1 + arr2
[21]:
<xarray.DataArray (x: 3, y: 4)>
array([[ 2.2332313 ,  0.77560262,  2.40122464,  1.38705455],
       [ 0.87452689, -0.58310178,  1.04252023,  0.02835014],
       [-0.01630089, -1.47392957,  0.15169245, -0.86247764]])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
  * y        (y) int64 0 1 2 3

where both arrays were automatically broadcast against each other:

[22]:
arr1_, arr2_ = xr.broadcast(arr1, arr2)
[23]:
arr1_
[23]:
<xarray.DataArray (x: 3, y: 4)>
array([[ 1.17877957,  1.17877957,  1.17877957,  1.17877957],
       [-0.17992484, -0.17992484, -0.17992484, -0.17992484],
       [-1.07075262, -1.07075262, -1.07075262, -1.07075262]])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
  * y        (y) int64 0 1 2 3
[24]:
arr2_
[24]:
<xarray.DataArray (x: 3, y: 4)>
array([[ 1.05445173, -0.40317695,  1.22244507,  0.20827498],
       [ 1.05445173, -0.40317695,  1.22244507,  0.20827498],
       [ 1.05445173, -0.40317695,  1.22244507,  0.20827498]])
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) <U1 'a' 'b' 'c'

and then the operation (a sum) was executed.
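
For comparison, here is a sketch of what the same operation looks like in plain numpy, where the new axes have to be inserted by hand. The arrays `vec_x` and `vec_y` are illustrative and not part of the notebook. With named dimensions, the order of the operands only affects the order of the resulting dimensions, not the values:

```python
import numpy as np
import xarray as xr

vec_x = xr.DataArray([1.0, 2.0, 3.0], dims="x")
vec_y = xr.DataArray([10.0, 20.0, 30.0, 40.0], dims="y")

# xarray broadcasts by dimension name.
labeled = vec_x + vec_y

# numpy needs explicit new axes to get the same (3, 4) result.
manual = vec_x.values[:, np.newaxis] + vec_y.values[np.newaxis, :]

# Swapping the operands changes the dimension order, not the values.
swapped = vec_y + vec_x

print(labeled.dims, swapped.dims)
```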

We can also call align explicitly with different options.

[25]:
a_al, b_al = xr.align(a, b, join="inner")
b_al
[25]:
<xarray.DataArray (x: 1, y: 4)>
array([[ 0.40234164, -0.68481009, -0.87079715, -0.57884966]])
Coordinates:
  * x        (x) object 'b'
  * y        (y) int64 0 1 2 3
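
Other values of `join` keep different sets of labels. The following sketch, on two small illustrative arrays (`left` and `right` are assumptions of this example), shows `"outer"`, which keeps the union of labels, and `"left"`, which keeps the labels of the first argument. Entries without data are filled with nan:

```python
import numpy as np
import xarray as xr

left = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": ["a", "b", "c"]})
right = xr.DataArray([10.0, 20.0], dims="x", coords={"x": ["b", "d"]})

# Union of all labels from both arrays.
left_outer, right_outer = xr.align(left, right, join="outer")

# Labels of the first argument only.
left_left, right_left = xr.align(left, right, join="left")

print(right_outer.x.values, right_outer.values)
print(right_left.x.values, right_left.values)
```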