Workflow Example#

Here we illustrate a complete workflow example including the following steps:

  • Data loading

  • Converting the data to CF format

  • Preprocessing the data

  • Running a diagnostic

  • Visualizing the results

Imports#

[1]:
from pathlib import Path

import xarray as xr
from datatree import DataTree

import valenspy as vp #The Valenspy package
from valenspy.inputconverter_functions import _non_convertor

Input Convertors#

Input convertors are used to convert the data to CF format. There main component is a function that takes the file and returns the data CF convention. See input_convertors_functions.py for examples.

The Input convertor is a class that does the following:

  • Convert the data

  • Check if the converted data meets the CF convention

[2]:
#Import Converter - This input converter will not do anything to the data.
ic = vp.InputConverter(_non_convertor)

Loading datasets#

Load the data and convert to CF format if necessary.

In this illustration we will load EOBS data, CMIP6 historical and future data

[3]:
#Observational dataset
EOBS_data_dir = Path("/dodrio/scratch/projects/2022_200/project_input/External/observations/EOBS/0.1deg/")

EOBS_obs_files = list(EOBS_data_dir.glob("*tg*mean*.nc")) #Select all the netCDF files in the directory
EOBS_obs_files = ic.convert_input(EOBS_obs_files) #Convert the input to the correct format

EOBS_ds = xr.open_mfdataset(EOBS_obs_files, combine='by_coords', chunks='auto')

#THIS SHOULD ALL BE IN THE INPUT CONVERTER FOR THE DATASET
EOBS_ds = EOBS_ds.rename_vars({"tg": "tas"})
EOBS_ds = EOBS_ds.rename({"latitude": "lat", "longitude": "lon"})
#Convert from Celsius to Kelvin
EOBS_ds["tas"] = EOBS_ds["tas"] + 273.15

#Select a small region covering Belgium for testing
EOBS_ds = EOBS_ds.sel(time=slice("1953-01-01", "1954-12-31"))
EOBS_ds = EOBS_ds.sel(lat=slice(49, 51), lon=slice(3, 5))

EOBS_ds
[3]:
<xarray.Dataset> Size: 1MB
Dimensions:  (lat: 20, lon: 20, time: 730)
Coordinates:
  * lat      (lat) float64 160B 49.05 49.15 49.25 49.35 ... 50.75 50.85 50.95
  * lon      (lon) float64 160B 3.05 3.15 3.25 3.35 3.45 ... 4.65 4.75 4.85 4.95
  * time     (time) datetime64[ns] 6kB 1953-01-01 1953-01-02 ... 1954-12-31
Data variables:
    tas      (time, lat, lon) float32 1MB dask.array<chunksize=(26, 20, 20), meta=np.ndarray>
Attributes:
    E-OBS_version:  29.0e
    Conventions:    CF-1.4
    References:     http://surfobs.climate.copernicus.eu/dataaccess/access_eo...
    history:        Fri Mar 22 09:55:59 2024: ncks --no-abc -d time,0,27027 /...
    NCO:            netCDF Operators version 5.1.4 (Homepage = http://nco.sf....
[4]:
#CMIP6 dataset
#Now make an ensemble member object
EC_Earth3_dir = Path("/dodrio/scratch/projects/2022_200/project_output/RMIB-UGent/vsc46032_kobe/ValEnsPy/tests/data")
EC_Earth3_hist_files = list(EC_Earth3_dir.glob("*historical*.nc")) #Select all the netCDF files in the directory
EC_Earth3_hist_files = ic.convert_input(EC_Earth3_hist_files) #Convert the input to the correct format
EC_Earth3_ds = xr.open_mfdataset(EC_Earth3_hist_files, combine='by_coords', chunks='auto')
EC_Earth3_ds
[4]:
<xarray.Dataset> Size: 13MB
Dimensions:    (time: 24, bnds: 2, lat: 256, lon: 512)
Coordinates:
  * time       (time) datetime64[ns] 192B 1953-01-16T12:00:00 ... 1954-12-16T...
  * lat        (lat) float64 2kB -89.46 -88.77 -88.07 ... 88.07 88.77 89.46
  * lon        (lon) float64 4kB 0.0 0.7031 1.406 2.109 ... 357.9 358.6 359.3
    height     float64 8B 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 384B dask.array<chunksize=(12, 2), meta=np.ndarray>
    lat_bnds   (time, lat, bnds) float64 98kB dask.array<chunksize=(12, 256, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 197kB dask.array<chunksize=(12, 512, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 13MB dask.array<chunksize=(12, 256, 512), meta=np.ndarray>
Attributes: (12/46)
    Conventions:                        CF-1.7 CMIP-6.2
    activity_id:                        CMIP
    branch_method:                      standard
    branch_time_in_child:               0.0
    branch_time_in_parent:              29219.0
    contact:                            cmip6-data@ec-earth.org
    ...                                 ...
    variant_label:                      r1i1p1f1
    license:                            CMIP6 model data produced by EC-Earth...
    cmor_version:                       3.4.0
    tracking_id:                        hdl:21.14100/18af2970-6a17-45fe-b629-...
    history:                            2019-06-06T07:27:13Z ; CMOR rewrote d...
    latest_applied_cmor_fixer_version:  v3.0
[5]:
#Now make an "future" ensemble member object
EC_Earth3_ssp_files = list(EC_Earth3_dir.glob("*ssp245*.nc")) #Select all the netCDF files in the directory
EC_Earth3_ssp_files = ic.convert_input(EC_Earth3_ssp_files) #Convert the input to the correct format
EC_Earth3_ssp_ds = xr.open_mfdataset(EC_Earth3_ssp_files, combine='by_coords', chunks='auto')
EC_Earth3_ssp_ds
[5]:
<xarray.Dataset> Size: 13MB
Dimensions:    (time: 24, bnds: 2, lat: 256, lon: 512)
Coordinates:
  * time       (time) datetime64[ns] 192B 2015-01-16T12:00:00 ... 2016-12-16T...
  * lat        (lat) float64 2kB -89.46 -88.77 -88.07 ... 88.07 88.77 89.46
  * lon        (lon) float64 4kB 0.0 0.7031 1.406 2.109 ... 357.9 358.6 359.3
    height     float64 8B 2.0
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 384B dask.array<chunksize=(12, 2), meta=np.ndarray>
    lat_bnds   (time, lat, bnds) float64 98kB dask.array<chunksize=(12, 256, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 197kB dask.array<chunksize=(12, 512, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 13MB dask.array<chunksize=(12, 256, 512), meta=np.ndarray>
Attributes: (12/45)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            ScenarioMIP
    branch_method:          standard
    branch_time_in_child:   60265.0
    branch_time_in_parent:  60265.0
    contact:                cmip6-data@ec-earth.org
    ...                     ...
    variable_id:            tas
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by EC-Earth-Consortium ...
    cmor_version:           3.4.0
    tracking_id:            hdl:21.14100/697b3a82-4ffc-49ce-b070-2ca86ce8a06f
    history:                2019-06-29T08:25:09Z ; CMOR rewrote data to be co...

DataTree#

The DataTree structures the data. This will be very useful for the preprocessing and the diagnostics.

[6]:
dt = DataTree.from_dict({"obs/EOBS": EOBS_ds, "ensembles/cmip6/EC_Earth3/hist": EC_Earth3_ds, "ensembles/cmip6/EC_Earth3/ssp245": EC_Earth3_ssp_ds})
dt
[6]:
<xarray.DatasetView> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

Preprocessing#

Define and add preprocessing steps to a preprocessor. These can then be applied to the whole or part of the DataTree.

Here we define a regridding step and select a region.

[7]:
#Apply some postprocessing operations on the datatree
pp = vp.Preprocessor()
pp.add_preprocessing_task(vp.preprocessing_tasks.Regrid(dt.obs.EOBS.ds, name="to_obs", description="Regrid the CMIP6 data to the EOBS grid"))
pp.add_preprocessing_task(vp.preprocessing_tasks.Set_Domain(dt.obs.EOBS.ds, name="to_obs", description="Select the common area of the EOBS grid"))

Applying the preprocessing tasks#

The preprocessing tasks can be applied on all models in an ensemble in the datatree. In this example we will apply the regridding and region selection to the CMIP6 data (EC-Earth3 - historical and ssp245).

[8]:
dt.ensembles = pp.apply_preprocessing(dt.ensembles)

dt
[8]:
<xarray.DatasetView> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

Diagnostics#

Finally we will make a diagnostic (validation/evaluation) which takes the processed data and compares them. There are different types of comparisons, here we will use a Model2Ref comparison which compares the model data to the reference dataset.

Diagnostic object#

The diagnostic object consists of the following components:

  • A name

  • A description

  • A function that takes the processed data and returns the results

  • A function that takes the results and returns a plot

Common standard diagnostics are (will be) predefined in package. The users can define there own diagnostic functions and visualizations functions thus creating there own diagnostics.

[9]:
#Apply a diagnostic operation
#First make a diagnostic object
from valenspy.diagnostic_functions import spatial_bias
from valenspy.diagnostic_visualizations import plot_spatial_bias

diag = vp.Model2Ref(spatial_bias, plot_spatial_bias, name="spatial_bias", description="Calculate the time averaged spatial bias between the model and the observations")

#Apply the
#Expects tas
def warming_levels(ds, ref, levels=[1.5, 2.0], rol_years=21):
    #Calculate the warming levels
    ref_temp = ref.tas.mean()

def calc_crossing_time(xr, wl, base, rol_years=21):
    try:
        rol = xr.rolling(time=rol_years*12).mean().mean(dim=['lat','lon'])
        rol = rol - base
        days = int(365.24*(rol_years-1)/2)
        return rol.where(rol>wl, drop=True).idxmin('time').astype('datetime64[ns]').values - np.timedelta64(days,'D')
    except ValueError:
        return False


ds = diag.apply(dt.ensembles.cmip6.EC_Earth3.hist, dt.obs.EOBS)
ds
[9]:
<xarray.DataArray 'tas' (lat: 20, lon: 20)> Size: 2kB
dask.array<sub, shape=(20, 20), dtype=float32, chunksize=(20, 20), chunktype=numpy.ndarray>
Coordinates:
    height   float64 8B 2.0
  * lon      (lon) float64 160B 3.05 3.15 3.25 3.35 3.45 ... 4.65 4.75 4.85 4.95
  * lat      (lat) float64 160B 49.05 49.15 49.25 49.35 ... 50.75 50.85 50.95

As we are using dask the resulting data needs to be computed. Note that when plotting, the data is automatically computed.

[10]:
from dask.diagnostics import ProgressBar
with ProgressBar():
    ds.compute()
[########################################] | 100% Completed | 13.04 ss

Plotting#

Finaly the diagnostic has some built in plotting functionality to visualize the results. The user can tweek the visualization or make his own starting from the resulting ds.

[11]:
f = diag.visualize(dt.ensembles.cmip6.EC_Earth3.hist, dt.obs.EOBS)
../_images/doc_examples_workflow_example_19_0.png
[12]:
f = diag.visualize(dt.ensembles.cmip6.EC_Earth3.ssp245, dt.obs.EOBS)
../_images/doc_examples_workflow_example_20_0.png