input.manager
Defines the InputManager class for loading and managing input data for ValEnsPy.
Classes
- class InputManager(machine)
- _get_file_paths(dataset_name, variables=['tas'], period=None, freq=None, region=None, path_identifiers=[])
Get the file paths for the specified dataset, variables, period and frequency.
- load_data(dataset_name, variables=['tas'], period=None, freq=None, region=None, cf_convert=True, path_identifiers=[], metadata_info={})
Load the data for the specified dataset, variables, period and frequency and transform it into ValEnsPy CF-Compliant format.
For files to be found and loaded they should be in a subdirectory of the dataset path and contain the raw_long_name or raw_name or CORDEX variable name, the year (optional), frequency and path_identifiers (optional) in the file name.
A regex search matches every NetCDF (.nc) file path that starts with the dataset path from dataset_PATHS.yml and contains:
1) The raw_long_name of each requested CORDEX variable, as given in dataset_name_lookup.yml
2) Any YYYY string within the period
3) The frequency of the data (daily, monthly, yearly)
4) Any additional path_identifiers
The order of these components is irrelevant. The dataset is then loaded using xarray.open_mfdataset and if cf_convert is True, the data is converted to CF-Compliant format using the appropriate input converter. If no period is specified, all files matching the other components are loaded.
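The matching rule described above can be sketched in a few lines. This is a minimal illustration, not the ValEnsPy implementation: the function name `find_matching_files`, the example paths and the raw variable name `2m_temperature` are all assumptions made up for the example, and the sketch searches only the part of the path after the dataset root.

```python
import re


def find_matching_files(paths, dataset_path, var_name, freq=None, period=None,
                        path_identifiers=()):
    """Hypothetical helper: keep the .nc paths that contain every component."""
    # Components that must each appear somewhere in the path, in any order.
    required = [var_name, *path_identifiers]
    if freq is not None:
        required.append(freq)
    patterns = [re.compile(re.escape(part)) for part in required]
    if period is not None:
        # One alternation that matches any year inside the period.
        start, end = period
        years = "|".join(str(year) for year in range(start, end + 1))
        patterns.append(re.compile(years))

    matches = []
    for path in paths:
        if not (path.startswith(dataset_path) and path.endswith(".nc")):
            continue
        # Assumption: only the part after the dataset root is searched.
        relative = path[len(dataset_path):]
        if all(pattern.search(relative) for pattern in patterns):
            matches.append(path)
    return matches


# Made-up paths for illustration only.
paths = [
    "/data/ERA5/europe/2m_temperature/daily/era5-daily-2m_temperature-max-2000.nc",
    "/data/ERA5/europe/2m_temperature/daily/era5-daily-2m_temperature-min-2000.nc",
    "/data/ERA5/europe/2m_temperature/daily/era5-daily-2m_temperature-max-1999.nc",
    "/data/ERA5/europe/2m_temperature/monthly/era5-monthly-2m_temperature-max-2001.nc",
    "/data/EOBS/tg_mean_2000_daily.nc",
]
found = find_matching_files(
    paths, "/data/ERA5", "2m_temperature",
    freq="daily", period=(2000, 2001), path_identifiers=["max"],
)
all_years = find_matching_files(
    paths, "/data/ERA5", "2m_temperature",
    freq="daily", path_identifiers=["max"],
)
```

Here `found` holds only the first path, while `all_years` also includes the 1999 file because no period restricts the match.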
- Parameters:
dataset_name (str) – The name of the dataset to load. This should be in the dataset_PATHS.yml file for the specified machine.
variables (list, optional) – The variables to load. The default is ["tas"]. These should be CORDEX variables defined in CORDEX_variables.yml.
period (list or int, optional) – The period to load. If a list, the start and end years of the period. For a single year, both an int and a list with one element are valid. The default is None.
freq (str, optional) – The frequency of the data. The default is None.
region (str, optional) – The region to load. The default is None.
cf_convert (bool, optional) – Whether to convert the data to CF-Compliant format. The default is True.
path_identifiers (list, optional) – Other identifiers to match in the file paths. These are matched in addition to the variable long name, year and frequency. The default is [].
metadata_info (dict, optional) – Other metadata information to pass to the input converter. The default is {}.
- Returns:
ds – The loaded dataset in CF-Compliant format.
- Return type:
- Raises:
FileNotFoundError – If no files are found for the specified dataset, variables, period, frequency and path_identifiers.
ValueError – If the dataset name is not valid for the machine, i.e. it is not in the dataset_PATHS.yml file.
Examples
>>> manager = InputManager(machine='hortense')
>>> # Get all ERA5 tas (temperature at 2m) at a daily frequency for the years 2000 and 2001. The paths must include "max".
>>> ds = manager.load_data("ERA5", variables=["tas"], period=[2000, 2001], freq="daily", path_identifiers=["max"])
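The period argument of load_data accepts None, a single int, a one-element list or a [start, end] list. The sketch below shows one way these forms could be reduced to a single (start, end) pair. The helper `normalize_period` is hypothetical and not part of ValEnsPy, and rejecting a start year later than the end year is an assumption of this sketch.

```python
def normalize_period(period):
    """Hypothetical helper: reduce the accepted period forms to (start, end)."""
    if period is None:
        # No period: the caller loads every file matching the other components.
        return None
    if isinstance(period, int):
        return (period, period)
    years = list(period)
    if len(years) == 1:
        return (years[0], years[0])
    if len(years) == 2:
        start, end = years
        if start > end:
            raise ValueError(f"start year {start} is after end year {end}")
        return (start, end)
    raise ValueError("period must be an int, [year] or [start, end]")


single = normalize_period(2000)
single_list = normalize_period([2000])
span = normalize_period([2000, 2001])
```

With this normalization, `period=2000` and `period=[2000]` select the same files.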
- load_m_data(datasets_dict, variables=['tas'], cf_convert=True, metadata_info={})
Load multiple datasets and variables and return a DataTree object.
Each dataset is passed to the load_data method and the resulting datasets are combined into a DataTree object.
- Parameters:
datasets_dict (dict) – A dictionary of datasets to load. The keys are the dataset names and the values are dictionaries of load_data arguments for that dataset: period, freq, region and path_identifiers.
variables (list, optional) – The variables to load. The default is ["tas"]. These should be CORDEX variables defined in CORDEX_variables.yml.
cf_convert (bool, optional) – Whether to convert the data to CF-Compliant format. The default is True.
metadata_info (dict, optional) – Other metadata information to pass to the input converter. The default is {}.
- Returns:
A DataTree object containing the loaded datasets.
- Return type:
DataTree
Examples
>>> manager = InputManager(machine='hortense')
>>> # Get tas and pr for EOBS (paths must include "mean") and for ERA5 at a daily frequency over Europe for the years 2000 and 2001 (paths must include "min").
>>> data_request_dict = {
...     "EOBS": {"path_identifiers": ["mean"]},
...     "ERA5": {"period": [2000, 2001], "freq": "daily", "region": "europe", "path_identifiers": ["min"]},
... }
>>> dt = manager.load_m_data(data_request_dict, variables=["tas", "pr"])
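The sketch below illustrates how each entry of datasets_dict can be expanded into the keyword arguments of one load_data call. It is not the ValEnsPy implementation: `load_many` and `fake_load_data` are hypothetical stand-ins, and a plain dict replaces the DataTree so the example runs without xarray.

```python
def load_many(load_one, datasets_dict, variables=("tas",), cf_convert=True):
    """Hypothetical helper: call load_one once per dataset and collect results."""
    loaded = {}
    for dataset_name, request in datasets_dict.items():
        # Each per-dataset dict becomes keyword arguments of the single-dataset loader.
        loaded[dataset_name] = load_one(
            dataset_name,
            variables=list(variables),
            cf_convert=cf_convert,
            **request,
        )
    # Assumption: a plain dict stands in for the DataTree returned by load_m_data.
    return loaded


calls = []


def fake_load_data(dataset_name, variables, **kwargs):
    """Stand-in for InputManager.load_data that only records its arguments."""
    calls.append((dataset_name, variables, kwargs))
    return f"<{dataset_name}: {', '.join(variables)}>"


data_request_dict = {
    "EOBS": {"path_identifiers": ["mean"]},
    "ERA5": {"period": [2000, 2001], "freq": "daily", "region": "europe",
             "path_identifiers": ["min"]},
}
result = load_many(fake_load_data, data_request_dict, variables=["tas", "pr"])
```

Keys omitted from a per-dataset dictionary, such as period for EOBS here, fall back to the defaults of the single-dataset loader.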