input.manager

Defines the InputManager class for loading and managing input data for ValEnsPy.

Classes

InputManager(machine)

class InputManager(machine)

_get_file_paths(dataset_name, variables=['tas'], period=None, freq=None, region=None, path_identifiers=[])

Get the file paths for the specified dataset, variables, period and frequency.

_is_valid_dataset_name(dataset_name)

Check if the dataset name is valid for the machine.

load_data(dataset_name, variables=['tas'], period=None, freq=None, region=None, cf_convert=True, path_identifiers=[], metadata_info={})

Load the data for the specified dataset, variables, period and frequency and transform it into ValEnsPy CF-Compliant format.

To be found and loaded, files must sit in a subdirectory of the dataset path, and their file names must contain the raw_long_name, raw_name or CORDEX variable name, the frequency, and optionally the year and any path_identifiers.

A regex search matches any NetCDF (.nc) file path that starts with the dataset_path from dataset_PATHS.yml and contains:

  1) The raw_long_name of the CORDEX variables, as given in the dataset_name_lookup.yml
  2) Any YYYY string within the period
  3) The frequency of the data (daily, monthly, yearly)
  4) Any additional path_identifiers

The order of these components is irrelevant. The dataset is then loaded using xarray.open_mfdataset and if cf_convert is True, the data is converted to CF-Compliant format using the appropriate input converter. If no period is specified, all files matching the other components are loaded.
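The order-independent matching described above can be illustrated with a minimal sketch. It is not the ValEnsPy implementation: the helper names years_in_period and build_pattern, the raw variable name "2m_temperature" and the example paths are all assumptions made for illustration, and using one lookahead per component is only one possible way to make the order irrelevant.

```python
import re


def years_in_period(period):
    """Expand an int, [year] or [start, end] into a list of year strings."""
    if period is None:
        return []
    if isinstance(period, int):
        period = [period]
    start, end = period[0], period[-1]
    return [str(year) for year in range(start, end + 1)]


def build_pattern(dataset_path, raw_name, period=None, freq=None, path_identifiers=()):
    """Build a regex matching .nc paths that contain every component, in any order."""
    components = [re.escape(raw_name)]
    years = years_in_period(period)
    if years:
        # Any single year within the period is enough for a match.
        components.append("(?:" + "|".join(years) + ")")
    if freq:
        components.append(re.escape(freq))
    components.extend(re.escape(identifier) for identifier in path_identifiers)
    # One lookahead per component, so the components may appear in any order.
    lookaheads = "".join(f"(?=.*{component})" for component in components)
    return re.compile(rf"^{re.escape(str(dataset_path))}/{lookaheads}.*\.nc$")


# Hypothetical file paths, used only to exercise the pattern.
paths = [
    "/data/ERA5/europe/daily/2m_temperature/era5-daily-max-2m_temperature-2000.nc",
    "/data/ERA5/europe/daily/2m_temperature/era5-daily-max-2m_temperature-2002.nc",
    "/data/ERA5/europe/monthly/2m_temperature/era5-monthly-max-2m_temperature-2001.nc",
    "/data/ERA5/europe/daily/2m_temperature/era5-daily-min-2m_temperature-2001.nc",
    "/data/ERA5/europe/daily/max/2m_temperature_2001_daily.nc",
]
pattern = build_pattern(
    "/data/ERA5", "2m_temperature", period=[2000, 2001], freq="daily", path_identifiers=["max"]
)
matches = [path for path in paths if pattern.search(path)]
```

Only the first and last paths match: the others fall outside the period, have a different frequency, or lack the "max" identifier. A plain substring test like this can also match a year string that happens to appear inside a longer number, which is one reason to keep file names unambiguous.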

Parameters:
  • dataset_name (str) – The name of the dataset to load. This should be in the dataset_PATHS.yml file for the specified machine.

  • variables (list, optional) – The variables to load. The default is ["tas"]. These should be CORDEX variables defined in CORDEX_variables.yml.

  • period (list or int, optional) – The period to load. If a list, it holds the start and end years of the period. A single year can be given either as an int or as a one-element list. The default is None.

  • freq (str, optional) – The frequency of the data. The default is None.

  • region (str, optional) – The region to load. The default is None.

  • cf_convert (bool, optional) – Whether to convert the data to CF-Compliant format. The default is True.

  • path_identifiers (list, optional) – Other identifiers to match in the file paths, in addition to the variable long name, year and frequency. The default is [].

  • metadata_info (dict, optional) – Other metadata information to pass to the input converter. The default is {}.

Returns:

ds – The loaded dataset, in CF-Compliant format if cf_convert is True.

Return type:

xarray.Dataset

Raises:
  • FileNotFoundError – If no files are found for the specified dataset, variables, period, frequency and path_identifiers.

  • ValueError – If the dataset name is not valid for the machine, i.e. it is not in the dataset_PATHS.yml file.

Examples

>>> manager = InputManager(machine='hortense')
>>> # Get all ERA5 tas (temperature at 2m) at a daily frequency for the years 2000 and 2001. The paths must include "max".
>>> ds = manager.load_data("ERA5", variables=["tas"], period=[2000, 2001], freq="daily", path_identifiers=["max"])

load_m_data(datasets_dict, variables=['tas'], cf_convert=True, metadata_info={})

Load multiple datasets and variables and return a DataTree object.

Each dataset is passed to the load_data method and the resulting datasets are combined into a DataTree object.
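The per-dataset dispatch can be sketched as follows. This is an illustration under stated assumptions, not the library's code: load_many and fake_load_data are hypothetical names, the stand-in loader only records its arguments instead of reading files, and a plain dict stands in for the DataTree that the real method returns.

```python
def load_many(load_one, datasets_dict, variables=("tas",), cf_convert=True, metadata_info=None):
    """Call load_one once per dataset, forwarding that dataset's own options."""
    loaded = {}
    for dataset_name, options in datasets_dict.items():
        loaded[dataset_name] = load_one(
            dataset_name,
            variables=list(variables),
            cf_convert=cf_convert,
            metadata_info=dict(metadata_info or {}),
            **options,  # period, freq, region and path_identifiers, when given
        )
    return loaded


# Hypothetical stand-in for load_data that only records what it was asked for.
def fake_load_data(dataset_name, variables, period=None, freq=None, region=None,
                   cf_convert=True, path_identifiers=(), metadata_info=None):
    return {
        "variables": variables,
        "period": period,
        "freq": freq,
        "region": region,
        "path_identifiers": list(path_identifiers),
    }


request = {
    "EOBS": {"path_identifiers": ["mean"]},
    "ERA5": {"period": [2000, 2001], "freq": "daily", "region": "europe",
             "path_identifiers": ["min"]},
}
result = load_many(fake_load_data, request, variables=["tas", "pr"])
```

Keys omitted from a dataset's dictionary fall back to the loader's defaults, so in this sketch EOBS is requested without a period, frequency or region.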

Parameters:
  • datasets_dict (dict) – A dictionary of datasets to load. The keys are the dataset names and the values are dictionaries with the optional keys period, freq, region and path_identifiers.

  • variables (list, optional) – The variables to load. The default is ["tas"]. These should be CORDEX variables defined in CORDEX_variables.yml.

  • cf_convert (bool, optional) – Whether to convert the data to CF-Compliant format. The default is True.

  • metadata_info (dict, optional) – Other metadata information to pass to the input converter. The default is {}.

Returns:

A DataTree object containing the loaded datasets.

Return type:

DataTree

Examples

>>> manager = InputManager(machine='hortense')
>>> # Get tas and pr for EOBS from paths that include "mean", and for ERA5 over Europe at a daily frequency for the years 2000 and 2001 from paths that include "min".
>>> data_request_dict = {
...     "EOBS":
...         {"path_identifiers": ["mean"]},
...     "ERA5":
...         {"period": [2000, 2001],
...          "freq": "daily",
...          "region": "europe",
...          "path_identifiers": ["min"]},
... }
>>> dt = manager.load_m_data(data_request_dict, variables=["tas","pr"])