Quickstart Guide

Installation

obsarray is installable via pip.

pip install obsarray

Dependencies

obsarray is an extension to xarray to support defining, storing and interfacing with measurement data. It is designed to work well with netCDF files, using the netcdf4 library.

The pip installation will also automatically install any dependencies.
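
As a quick sanity check after installation, the snippet below reports which of the relevant modules cannot be found in the current environment. It is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the module list (obsarray, xarray, netCDF4) is an assumption based on the dependencies named above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for obsarray and the dependencies mentioned above
print(missing_modules(["obsarray", "xarray", "netCDF4"]))
```

An empty list means all three modules can be located; it does not verify that their versions are compatible.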

Example Usage

First, we build an example dataset that represents a time series of temperatures (for more on how to do this, see the xarray documentation).

In [1]: import numpy as np

In [2]: import xarray as xr

In [3]: import obsarray

# build an xarray dataset that represents a time series of temperatures
In [4]: temps = np.array([20.2, 21.1, 20.8])

In [5]: times = np.array([0, 30, 60])

In [6]: ds = xr.Dataset(
   ...:    {"temperature": (["time"], temps, {"units": "degC"})},
   ...:    coords = {"time": (["time"], times, {"units": "s"})}
   ...: )
   ...: 

Uncertainty and error-covariance information for observation variables can be defined using the dataset’s unc accessor, which is provided by obsarray.

# add random component uncertainty
In [7]: ds.unc["temperature"]["u_r_temperature"] = (
   ...:    ["time"],
   ...:    np.array([0.5, 0.5, 0.6]),
   ...:    {"err_corr": [{"dim": "time", "form": "random"}]}
   ...: )
   ...: 

# add systematic component uncertainty
In [8]: ds.unc["temperature"]["u_s_temperature"] = (
   ...:    ["time"],
   ...:    np.array([0.3, 0.3, 0.3]),
   ...:    {"err_corr": [{"dim": "time", "form": "systematic"}]}
   ...: )
   ...: 

Dataset structures can be defined separately using obsarray’s templating functionality. This is helpful for processing chains where you want to write files to a defined format.

The uncertainty information defined above can then be queried, for example:

# get total combined uncertainty of all components
In [9]: ds.unc["temperature"].total_unc()
Out[9]: 
<xarray.DataArray (time: 3)> Size: 24B
array([0.58309519, 0.58309519, 0.67082039])
Coordinates:
  * time     (time) int64 24B 0 30 60
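
The values returned here are consistent with combining the two components in quadrature (root sum of squares). The following is a minimal NumPy sketch of that arithmetic using the component values defined earlier; it is an independent check, not obsarray's implementation, and the variable names are illustrative.

```python
import numpy as np

# Uncertainty components from the example dataset
u_random = np.array([0.5, 0.5, 0.6])
u_systematic = np.array([0.3, 0.3, 0.3])

# Root sum of squares of the components at each time step
u_total = np.sqrt(u_random**2 + u_systematic**2)
print(u_total)
```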

# get total error-covariance matrix for all components
In [10]: ds.unc["temperature"].total_err_cov_matrix()
Out[10]: 
<xarray.DataArray (time: 3)> Size: 72B
array([[0.34, 0.09, 0.09],
       [0.09, 0.34, 0.09],
       [0.09, 0.09, 0.45]])
Dimensions without coordinates: time, time
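
To see where these numbers come from, the matrix can be rebuilt by hand. The sketch below assumes that a "random" error-correlation form corresponds to an identity correlation matrix (errors independent between time steps) and a "systematic" form to a matrix of ones (errors fully correlated along time). It is plain NumPy, not obsarray's own code, and the helper err_cov is hypothetical.

```python
import numpy as np

u_random = np.array([0.5, 0.5, 0.6])
u_systematic = np.array([0.3, 0.3, 0.3])
n = u_random.size

# Assumed correlation structures for the two forms
corr_random = np.eye(n)            # no correlation between time steps
corr_systematic = np.ones((n, n))  # full correlation between time steps


def err_cov(u, corr):
    """Covariance matrix: outer product of uncertainties scaled by correlation."""
    return np.outer(u, u) * corr


# Covariances of independent components add
total_cov = err_cov(u_random, corr_random) + err_cov(u_systematic, corr_systematic)
print(total_cov)
```

Under these assumptions the random component contributes only to the diagonal, while the systematic component contributes 0.3 × 0.3 = 0.09 to every element, which matches the matrix shown above.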

This information is preserved in metadata when written to netCDF files.

Similarly, data flags can be defined using the dataset’s flag accessor, which is also provided by obsarray. These flags are defined following the CF Conventions metadata standard.

A flag variable can be created to store data for a set of flags with defined meanings:

In [11]: ds.flag["quality_flags"] = (
   ....:     ["time"],
   ....:     {"flag_meanings": ["dubious", "invalid", "saturated"]}
   ....: )
   ....: 

In [12]: print(ds.flag)
<FlagAccessor>
Dataset Flags:
* <FlagVariable>
FlagVariable: 'quality_flags'
['dubious', 'invalid', 'saturated']

These flag meanings can be indexed to get and set their values:

In [13]: print(ds.flag["quality_flags"]["dubious"].value)
<xarray.DataArray (time: 3)> Size: 3B
array([False, False, False])
Dimensions without coordinates: time

In [14]: ds.flag["quality_flags"]["dubious"][0] = True

In [15]: print(ds.flag["quality_flags"]["dubious"].value)
<xarray.DataArray (time: 3)> Size: 3B
array([ True, False, False])
Dimensions without coordinates: time
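
The CF Conventions allow several boolean flags to be packed into a single integer variable, with one bit per meaning (described by the flag_masks and flag_meanings attributes). The sketch below illustrates that idea in plain NumPy, mirroring the get and set operations shown above. The bit assignment (1, 2, 4) and the helpers set_flag and get_flag are assumptions for illustration and are not part of the obsarray API.

```python
import numpy as np

flag_meanings = ["dubious", "invalid", "saturated"]

# Assumed bit assignment: one bit per meaning (1, 2, 4)
flag_masks = {name: np.uint8(1 << i) for i, name in enumerate(flag_meanings)}

# One packed flag value per time step, all flags initially unset
quality_flags = np.zeros(3, dtype=np.uint8)


def set_flag(flags, name, index, value=True):
    """Set or clear the bit for `name` at `index` (hypothetical helper)."""
    mask = flag_masks[name]
    if value:
        flags[index] |= mask
    else:
        flags[index] &= ~mask


def get_flag(flags, name):
    """Return a boolean array that is True where the bit for `name` is set."""
    return (flags & flag_masks[name]) != 0


# Mark the first time step as dubious, as in the example above
set_flag(quality_flags, "dubious", 0)
print(get_flag(quality_flags, "dubious"))
```

Packing the flags this way means each flag can be set independently while only a single integer variable is stored.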