input.manager

Defines the InputManager class for finding, managing, preprocessing and loading input data for ValEnsPy.

Classes

InputManager(machine, datasets_info[, ...])

A class to find, manage, preprocess and load input data for ValEnsPy.

class InputManager(machine: str, datasets_info: dict = None, description=None, input_convertors: dict = {'ALARO_K': <valenspy.input.converter.InputConverter object>, 'CCLM': <valenspy.input.converter.InputConverter object>, 'CLIMATE_GRID': <valenspy.input.converter.InputConverter object>, 'EOBS': <valenspy.input.converter.InputConverter object>, 'ERA5': <valenspy.input.converter.InputConverter object>, 'ERA5-Land': <valenspy.input.converter.InputConverter object>, 'MAR': <valenspy.input.converter.InputConverter object>, 'RADCLIM': <valenspy.input.converter.InputConverter object>}, esmcat_data: dict = {'aggregation_control': {'aggregations': [{'attribute_name': 'time_period_start', 'options': {'dim': 'time'}, 'type': 'join_existing'}, {'attribute_name': 'variable_id', 'type': 'union'}], 'groupby_attrs': ['source_id', 'source_type', 'domain_id', 'experiment_id', 'version', 'resolution', 'frequency', 'driving_source_id', 'institution_id', 'realization', 'post_processing'], 'variable_column_name': 'variable_id'}, 'assets': {'column_name': 'path', 'format': 'netcdf'}, 'attributes': [], 'esmcat_version': '0.1.0', 'id': 'test'}, xarray_open_kwargs: dict = {}, xarray_combine_by_coords_kwargs: dict = {}, intake_esm_kwargs: dict = {'sep': '/'})

A class to find, manage, preprocess and load input data for ValEnsPy.

The InputManager class consists of a ValEnsPy-specific intake-esm catalog (ValenspyEsmDatastore) and a CatalogBuilder. The CatalogBuilder creates the catalog, a df with dataset information per file, from minimal information about the datasets and their path structure. This catalog is then used to create an esm_datastore (ValenspyEsmDatastore), which can be used to search and load the datasets. The InputManager class also provides a preprocessing function, based on the input convertors, to convert the datasets to ValEnsPy-compliant data.

_update_catalog(dataset_name, dataset_info_dict)

Update the catalog (df with dataset information per file) with a new dataset.

The catalog_builder is used to parse the dataset information and update the catalog.

Parameters:

dataset_info_dict (dict) – A dictionary containing dataset information. The keys are dataset names and the values are dictionaries with the following keys:

- root: The root directory of the dataset.
- pattern: The regex pattern for matching files in the dataset.
- meta_data: A dictionary containing metadata for the dataset.

add_input_convertor(dataset_name, input_convertor)

Add an input convertor to the InputManager.

property intake_to_xarray_kwargs

Easy access to the kwargs that can be passed to intake_esm.esm_datastore.to_dataset_dict, intake_esm.esm_datastore.to_dask and/or intake_esm.esm_datastore.to_datatree.

Three types of kwargs are created:

- preprocess: The preprocessor function, which applies the input convertor to the dataset if an input convertor exists (i.e. source_id is in INPUT_CONVERTORS).
- xarray_open_kwargs: The kwargs to be passed to xarray.open_dataset.
- xarray_combine_by_coords_kwargs: The kwargs to be passed to xarray.combine_by_coords.
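As a rough illustration of how these three groups fit together, the sketch below bundles them into a single dict that could be unpacked into a loading call. The helper name `build_intake_to_xarray_kwargs` and the `identity` preprocessor are hypothetical and not part of ValEnsPy; only the three key names come from the description above.

```python
def build_intake_to_xarray_kwargs(preprocess,
                                  xarray_open_kwargs=None,
                                  xarray_combine_by_coords_kwargs=None):
    """Bundle the three kwargs groups into one dict (hypothetical helper)."""
    return {
        "preprocess": preprocess,
        # Copy the dicts so later changes by the caller do not leak in.
        "xarray_open_kwargs": dict(xarray_open_kwargs or {}),
        "xarray_combine_by_coords_kwargs": dict(xarray_combine_by_coords_kwargs or {}),
    }


def identity(ds):
    """Placeholder preprocessor that returns the dataset unchanged."""
    return ds


kwargs = build_intake_to_xarray_kwargs(identity, xarray_open_kwargs={"chunks": {}})
# With a real catalog, the dict could then be unpacked as:
#     catalog.to_dataset_dict(**kwargs)
```

Keeping the three groups in one dict means the same settings can be reused across the different loading methods without repeating them.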

property preprocess

A preprocessor function to convert the input dataset to ValEnsPy-compliant data.

This function applies the input convertor to the dataset if an input convertor exists (i.e. source_id is in this manager's input convertors).
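The dispatch described above can be sketched with plain Python objects. This is only an illustration under stated assumptions: datasets are modelled as dicts rather than xarray objects, `make_preprocess` and `source_id_of` are hypothetical names, and how ValEnsPy actually resolves the source_id of a dataset is not covered in this section.

```python
def make_preprocess(converters, source_id_of):
    """Return a preprocess function that dispatches on source_id.

    converters:   dict mapping source_id -> callable(dataset) -> dataset
    source_id_of: callable(dataset) -> source_id (hypothetical lookup)
    """
    def preprocess(ds):
        convert = converters.get(source_id_of(ds))
        # Datasets without a registered convertor pass through unchanged.
        return convert(ds) if convert is not None else ds
    return preprocess


# Stand-in convertor: marks the dataset as converted.
converters = {"ERA5": lambda ds: {**ds, "converted": True}}
preprocess = make_preprocess(converters, lambda ds: ds.get("source_id"))

known = preprocess({"source_id": "ERA5"})
unknown = preprocess({"source_id": "SOME_OTHER_SOURCE"})
```

Here `known` carries the extra `converted` flag, while `unknown` is returned as-is because no convertor is registered for its source_id.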

property skipped_files

The files that were skipped during the catalog creation.

update_catalog_from_dataset_info(dataset_name, dataset_root_dir, dataset_pattern, metadata={})

Update the catalog with a new dataset.

For the dataset, parse the dataset information, validate it, add it to the catalog and update the esm_datastore.

Parameters:

dataset_name (str) – The name of the dataset.
dataset_root_dir (str) – The root directory of the dataset.
dataset_pattern (str) – The regex pattern for matching files in the dataset. This is the relative path starting from the root, in the following format: <identifier_name>/<identifier_name>/<identifier_name>_fixed_part_<variable_id>/<another_identifier>_<year>.nc
metadata (dict, optional) – Additional metadata to include in the catalog. Default is an empty dictionary.
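To make the pattern format more concrete, the sketch below turns a placeholder pattern into a regular expression with named groups and extracts the identifiers from one relative path. It is not the CatalogBuilder implementation: `pattern_to_regex`, the placeholder names and the example path are hypothetical, and the sketch assumes that identifier values contain neither "/" nor "_" and that each placeholder name appears only once.

```python
import re


def pattern_to_regex(pattern):
    """Compile a '<name>' placeholder pattern into a regex (hypothetical helper)."""
    # With one capture group, re.split alternates literal text (even
    # indices) and placeholder names (odd indices).
    parts = re.split(r"<(\w+)>", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # fixed part of the path
        else:
            regex += f"(?P<{part}>[^/_]+)"    # identifier to extract
    return re.compile(regex + r"\Z")


pattern = "<source_id>/<frequency>/<domain_id>_fixed_part_<variable_id>/<run>_<year>.nc"
relative_path = "ERA5/daily/europe_fixed_part_tas/r1_1990.nc"

match = pattern_to_regex(pattern).match(relative_path)
identifiers = match.groupdict() if match else None
```

A path that does not follow the pattern yields no match, which is one plausible reason for a file to be skipped during catalog creation.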

update_catalog_from_yaml(yaml_path)

Update the catalog from a YAML file.

For each dataset, parse the dataset information, validate it, add it to the catalog and update the esm_datastore.

Parameters:

yaml_path (Path) – The path to the YAML file containing the dataset information: a dictionary of dataset names and their dataset information. Each dataset's information should contain the following keys:

- root: The root directory of the dataset.
- pattern: The regex pattern for matching files in the dataset. This is the relative path starting from the root, in the following format: <identifier_name>/<identifier_name>/<identifier_name>_fixed_part_<variable_id>/<another_identifier>_<year>.nc
- meta_data: A dictionary containing metadata for the dataset.
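The layout described above can be sketched as the Python mapping that such a YAML file would load into, together with a small structural check. The dataset name, paths and metadata values are made up for illustration, `missing_keys` is a hypothetical helper, and treating root and pattern as required while meta_data is optional is an assumption rather than documented behaviour.

```python
# Assumption: root and pattern are required, meta_data may be omitted.
REQUIRED_KEYS = ("root", "pattern")

# Equivalent of a YAML file with one top-level key per dataset.
datasets_info = {
    "ERA5": {
        "root": "/data/era5",  # hypothetical root directory
        "pattern": "<frequency>/<variable_id>/<domain_id>_<year>.nc",
        "meta_data": {"source_type": "reanalysis"},  # hypothetical metadata
    },
}


def missing_keys(datasets_info):
    """Map each dataset name to its missing required keys (hypothetical helper)."""
    problems = {}
    for name, info in datasets_info.items():
        missing = [key for key in REQUIRED_KEYS if key not in info]
        if missing:
            problems[name] = missing
    return problems


problems = missing_keys(datasets_info)
```

Running a check like this before updating the catalog gives a clearer error than a failure partway through parsing.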