input.manager#
Defines the InputManager class for finding, managing, preprocessing and loading input data for ValEnsPy.
Classes

InputManager | A class to find, manage, preprocess and load input data for ValEnsPy.
- class InputManager(machine: str, datasets_info: dict = None, description=None, input_convertors: dict = {'ALARO_K': <valenspy.input.converter.InputConverter object>, 'CCLM': <valenspy.input.converter.InputConverter object>, 'CLIMATE_GRID': <valenspy.input.converter.InputConverter object>, 'EOBS': <valenspy.input.converter.InputConverter object>, 'ERA5': <valenspy.input.converter.InputConverter object>, 'ERA5-Land': <valenspy.input.converter.InputConverter object>, 'MAR': <valenspy.input.converter.InputConverter object>, 'RADCLIM': <valenspy.input.converter.InputConverter object>}, esmcat_data: dict = {'aggregation_control': {'aggregations': [{'attribute_name': 'time_period_start', 'options': {'dim': 'time'}, 'type': 'join_existing'}, {'attribute_name': 'variable_id', 'type': 'union'}], 'groupby_attrs': ['source_id', 'source_type', 'domain_id', 'experiment_id', 'version', 'resolution', 'frequency', 'driving_source_id', 'institution_id', 'realization', 'post_processing'], 'variable_column_name': 'variable_id'}, 'assets': {'column_name': 'path', 'format': 'netcdf'}, 'attributes': [], 'esmcat_version': '0.1.0', 'id': 'test'}, xarray_open_kwargs: dict = {}, xarray_combine_by_coords_kwargs: dict = {}, intake_esm_kwargs: dict = {'sep': '/'})[source]#
A class to find, manage, preprocess and load input data for ValEnsPy.
The InputManager class consists of a ValEnsPy-specific intake-esm catalog (ValenspyEsmDatastore) and a CatalogBuilder. The CatalogBuilder creates the catalog, a DataFrame with dataset information per file, from minimal information about the datasets and their path structure. This catalog is then used to create an esm_datastore (ValenspyEsmDatastore) which can be used to search and load the datasets. The InputManager also provides a preprocessing function, based on the input convertors, to convert the datasets to ValEnsPy compliant data.
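A minimal usage sketch (the import path is inferred from the module name and "hortense" is an illustrative machine name, not guaranteed by this reference):

```python
from valenspy.input.manager import InputManager  # import path inferred from the module name

# Create a manager for a given machine; the datasets known for that machine are
# collected in the ValenspyEsmDatastore described above.
manager = InputManager(machine="hortense")  # "hortense" is an illustrative machine name
```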
- _update_catalog(dataset_name, dataset_info_dict)[source]#
Update the catalog (df with dataset information per file) with a new dataset.
The catalog_builder is used to parse the dataset information and update the catalog.
- Parameters:
dataset_info_dict (dict) – A dictionary containing dataset information. The keys are dataset names and the values are dictionaries with the following keys:
- root: The root directory of the dataset.
- pattern: The regex pattern for matching files in the dataset.
- meta_data: A dictionary containing metadata for the dataset.
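A sketch of such a dictionary, following the key structure described above (the dataset name, paths, identifiers and metadata are illustrative):

```python
dataset_info_dict = {
    "MY_MODEL": {  # hypothetical dataset name
        "root": "/data/my_model",
        "pattern": "<domain_id>/<frequency>_<variable_id>/<realization>_<year>.nc",
        "meta_data": {"institution_id": "my_institute"},
    }
}
```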
- add_input_convertor(dataset_name, input_convertor)[source]#
Add an input convertor to the InputManager.
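For example, assuming my_convertor is an already constructed valenspy.input.converter.InputConverter and "MY_MODEL" is an illustrative dataset name:

```python
# Register a convertor so that "MY_MODEL" data is converted on load by the preprocess function.
manager.add_input_convertor("MY_MODEL", my_convertor)
```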
- property intake_to_xarray_kwargs#
Easy access to kwargs that can be passed to intake_esm.esm_datastore.to_dataset_dict, intake_esm.esm_datastore.to_dask and/or intake_esm.esm_datastore.to_datatree.
Three types of kwargs are created:
- preprocess: The preprocessor function which applies the input convertor to the dataset if an input convertor exists (i.e. source_id is in INPUT_CONVERTORS).
- xarray_open_kwargs: The kwargs to be passed to xarray.open_dataset.
- xarray_combine_by_coords_kwargs: The kwargs to be passed to xarray.combine_by_coords.
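For example, the kwargs can be unpacked into intake-esm's loading methods. The attribute used here to reach the underlying ValenspyEsmDatastore (manager.catalog) and the search query are assumptions for illustration:

```python
# Search the datastore and load the matching files, applying the ValEnsPy
# preprocessing and xarray kwargs managed by the InputManager.
subset = manager.catalog.search(source_id="ERA5", variable_id="tas")
ds_dict = subset.to_dataset_dict(**manager.intake_to_xarray_kwargs)
```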
- property preprocess#
A preprocessor function to convert the input dataset to ValEnsPy compliant data.
This function applies the input convertor to the dataset if an input convertor exists (i.e. source_id is in this manager's input convertors).
- property skipped_files#
The files that were skipped during the catalog creation.
- update_catalog_from_dataset_info(dataset_name, dataset_root_dir, dataset_pattern, metadata={})[source]#
Update the catalog with a new dataset.
For the dataset, parse the dataset information, validate it, add it to the catalog and update the esm_datastore.
- Parameters:
dataset_name (str) – The name of the dataset.
dataset_root_dir (str) – The root directory of the dataset.
dataset_pattern (str) – The regex pattern for matching files in the dataset. This is the relative path starting from the root, in the following format: <identifier_name>/<identifier_name>/<identifier_name>_fixed_part_<variable_id>/<another_identifier>_<year>.nc
metadata (dict, optional) – Additional metadata to include in the catalog. Default is an empty dictionary.
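A sketch of a call; the dataset name, root directory, identifier names in the pattern and metadata are illustrative:

```python
manager.update_catalog_from_dataset_info(
    dataset_name="MY_MODEL",
    dataset_root_dir="/data/my_model",
    # Relative path pattern below the root, following the format documented above.
    dataset_pattern="<domain_id>/<frequency>/<realization>_hist_<variable_id>/<realization>_<year>.nc",
    metadata={"institution_id": "my_institute"},
)
```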
- update_catalog_from_yaml(yaml_path)[source]#
Update the catalog from a YAML file.
For each dataset, parse the dataset information, validate it, add it to the catalog and update the esm_datastore.
- Parameters:
yaml_path (Path) –
The path to the YAML file containing the datasets information: a dictionary of dataset names and their dataset information. The dataset information should contain the following keys:
- root: The root directory of the dataset.
- pattern: The regex pattern for matching files in the dataset. This is the relative path starting from the root, in the following format: <identifier_name>/<identifier_name>/<identifier_name>_fixed_part_<variable_id>/<another_identifier>_<year>.nc
- meta_data: A dictionary containing metadata for the dataset.
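A sketch of a YAML file with this structure, written and then loaded from Python; the dataset name, paths, identifiers and metadata are illustrative:

```python
from pathlib import Path

yaml_content = """
MY_MODEL:  # hypothetical dataset name
  root: /data/my_model
  pattern: <domain_id>/<frequency>/<realization>_hist_<variable_id>/<realization>_<year>.nc
  meta_data:
    institution_id: my_institute
"""

yaml_path = Path("my_datasets.yml")
yaml_path.write_text(yaml_content)

manager.update_catalog_from_yaml(yaml_path)
```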