diagnostic.functions
Functions
| annual_cycle | Calculate the annual cycle of the data. |
| bias | Calculate the bias of the data compared to a reference. |
| calc_metrics_da | Calculate statistical performance metrics for model data against observed data for a single variable. |
| calc_metrics_ds | Calculate statistical performance metrics for model data against observed data for a dataset. |
| calc_metrics_dt | Calculate statistical performance metrics for model data against observed data. |
| diurnal_cycle | Calculate the diurnal cycle of the data. |
| diurnal_cycle_bias | Calculate the diurnal cycle bias of the data compared to the reference. |
| get_userdefined_binwidth | Get the user-defined, hard-coded bin widths for the Perkins Skill Score calculation. |
| mean_absolute_error | Calculate the Mean Absolute Error (MAE) between model forecasts and reference data. |
| mean_bias | Calculate the bias of the means of modeled and reference data. |
| optimal_bin_width | Calculate the optimal bin width for both forecast (da_mod) and observed (da_ref) data. |
| perkins_skill_score | Calculate the Perkins Skill Score (PSS). |
| root_mean_square_error | Calculate the Root Mean Square Error (RMSE) between model data and reference data. |
| spatial_bias | Calculate the spatial bias of the data compared to the reference. |
| spatial_time_mean | Calculate the spatial mean of the data. |
| spearman_correlation | Calculate Spearman's rank correlation coefficient between model data and reference data. |
| temporal_bias | Calculate the temporal bias of the data compared to the reference. |
| time_series_spatial_mean | Calculate the time series of the data. |
| time_series_trend | Calculate the trend of the time series data. |
| urban_heat_island | Calculate the urban heat island effect as the difference in temperature between an urban and rural area. |
| urban_heat_island_diurnal_cycle | Calculate the diurnal cycle of the urban heat island effect as the difference in temperature between an urban and rural area. |
- _add_ranks_metrics(df: DataFrame)
Ranks the performance of different models across various metrics based on predefined ranking criteria.
This function applies custom ranking rules to evaluate the performance of models across different metrics. The ranking is based on the following criteria:
  - ‘Mean Bias’ is ranked by its absolute value, with smaller values (closer to zero) ranked higher.
  - ‘Spearman Correlation’ and ‘Perkins Skill Score’ are ranked in descending order, meaning higher values (closer to 1) are better.
  - All other metrics are ranked in ascending order, where lower values are better.
The input DataFrame df is expected to have the following structure:
  - The first column contains the metric names.
  - Each subsequent column contains the performance values of a different model for each metric.
- Parameters:
df (pandas.DataFrame) – A DataFrame where each row corresponds to a metric, the first column is the metric name, and the subsequent columns contain performance values for different models.
- Returns:
A DataFrame where each value is replaced by its rank based on the ranking criteria for the corresponding metric. The rows are indexed by the metric names.
- Return type:
pandas.DataFrame
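The ranking criteria above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name `rank_metrics`, the example values, and the convention that rank 1 denotes the best model are all assumptions.

```python
import pandas as pd

# Hypothetical metric table: first column holds metric names, the rest hold model values.
df = pd.DataFrame({
    "metric": ["Mean Bias", "Root Mean Square Error", "Spearman Correlation"],
    "model_a": [-0.5, 1.2, 0.90],
    "model_b": [0.2, 1.5, 0.95],
})

def rank_metrics(df):
    """Rank models per metric row; rank 1 is assumed to be the best model."""
    values = df.set_index(df.columns[0])
    ranks = {}
    for metric, row in values.iterrows():
        if metric == "Mean Bias":
            # Closest to zero is best, so rank the absolute values ascending.
            ranks[metric] = row.abs().rank(ascending=True)
        elif metric in ("Spearman Correlation", "Perkins Skill Score"):
            # Higher is better.
            ranks[metric] = row.rank(ascending=False)
        else:
            # Lower is better.
            ranks[metric] = row.rank(ascending=True)
    # Transpose so that rows are metrics and columns are models.
    return pd.DataFrame(ranks).T

ranked = rank_metrics(df)
print(ranked)
```

In this example model_b wins on Mean Bias (0.2 is closer to zero than -0.5) and on Spearman Correlation, while model_a wins on RMSE.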
- _average_over_dims(ds: Dataset, dims)
Calculate the average over the specified dimensions if they are present in the data. Otherwise, return the data as is.
- Parameters:
ds (xr.Dataset) – The data to average.
dims (list or str) – The dimension(s) to average over.
- Returns:
The data with the specified dimensions averaged over.
- Return type:
xr.Dataset
- annual_cycle(ds: Dataset)
Calculate the annual cycle of the data. If lat and lon are present, the data is averaged over the spatial dimensions lat and lon.
- Parameters:
ds (xr.Dataset) – The data to calculate the annual cycle of.
- Returns:
The annual cycle of the data.
- Return type:
xr.Dataset
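One common reading of "annual cycle" is the mean value per calendar month. The sketch below shows that idea with pandas on a synthetic daily series; whether annual_cycle groups by month or by another calendar unit is not stated here, so the monthly grouping and all variable names are assumptions.

```python
import pandas as pd

# Two years of synthetic daily values: each day's value equals its month number.
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
series = pd.Series(time.month.to_numpy(dtype=float), index=time)

# Group by calendar month and average to get one value per month.
annual = series.groupby(series.index.month).mean()
print(annual)
```

The result has twelve entries, indexed 1 to 12, regardless of how many years the input covers.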
- bias(da: Dataset, ref: Dataset, calc_relative=False)
Calculate the bias of the data compared to a reference.
- Parameters:
da (xr.DataArray or xr.Dataset) – The data to calculate the bias of.
ref (xr.DataArray or xr.Dataset) – The reference to compare the data to.
calc_relative (bool, optional) – If True, calculate the relative bias; if False, calculate the absolute bias. Defaults to False.
- Returns:
The bias of the data compared to the reference.
- Return type:
xr.Dataset
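The difference between absolute and relative bias can be illustrated with NumPy. This is a sketch under stated assumptions: the helper `bias_sketch` is hypothetical, and the relative bias is assumed to be the difference divided by the reference, which may differ from the library's definition.

```python
import numpy as np

def bias_sketch(data, ref, calc_relative=False):
    """Absolute bias (data - ref) or, assumed here, relative bias (data - ref) / ref."""
    data = np.asarray(data, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = data - ref
    if calc_relative:
        return diff / ref
    return diff

model = np.array([11.0, 18.0, 33.0])
reference = np.array([10.0, 20.0, 30.0])

abs_bias = bias_sketch(model, reference)                      # differences in data units
rel_bias = bias_sketch(model, reference, calc_relative=True)  # unitless fractions
```

Note that a relative bias defined this way is undefined wherever the reference is zero.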
- calc_metrics_da(da_mod: DataArray, da_obs: DataArray, metrics=None, pss_binwidth=None)
Calculate statistical performance metrics for model data against observed data for a single variable.
- calc_metrics_ds(ds_mod: Dataset, ds_obs: Dataset, metrics=None, pss_binwidth=None)
Calculate statistical performance metrics for model data against observed data for a dataset.
- calc_metrics_dt(dt_mod: DataTree, da_obs: Dataset, metrics=None, pss_binwidth=None)
Calculate statistical performance metrics for model data against observed data.
This function computes various metrics between the model data stored in the DataTree object dt_mod and the observed data da_obs. Default metrics include Mean Bias, Mean Absolute Error (MAE) at different percentiles, Root Mean Square Error (RMSE), Spearman Correlation, and Perkins Skill Score (PSS).
- Parameters:
dt_mod (DataTree) – A DataTree containing the model data for different members. The function loops through each member to calculate the metrics.
da_obs (xr.Dataset) – The observed data to compare against the model data.
metrics (dict, optional) – A dictionary mapping the names of the metrics to calculate to the corresponding functions. Defaults to the metrics listed below.
pss_binwidth (float, optional) – The bin width to use for the Perkins Skill Score (PSS) calculation. If not provided, the optimal bin width is calculated.
- Returns:
A DataFrame containing the calculated metrics and the corresponding rank per metric for each member and variable in the data tree.
- Return type:
pd.DataFrame
- Metrics:
Mean Bias
Mean Absolute Error
MAE at 90th Percentile
MAE at 99th Percentile
MAE at 10th Percentile
MAE at 1st Percentile
Root Mean Square Error
Spearman Correlation
Perkins Skill Score
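The idea of a metrics dictionary that maps names to functions, applied to every member, can be sketched as follows. This is not the library's code: the dictionary contents, the member names, the long-format output, and the `(model, obs) -> float` call convention are assumptions, and the ranking step is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical metrics dictionary: display name -> function(model, obs) returning a float.
metrics = {
    "Mean Bias": lambda mod, obs: float(np.mean(mod) - np.mean(obs)),
    "Mean Absolute Error": lambda mod, obs: float(np.mean(np.abs(mod - obs))),
    "Root Mean Square Error": lambda mod, obs: float(np.sqrt(np.mean((mod - obs) ** 2))),
}

obs = np.array([1.0, 2.0, 3.0, 4.0])
# Stand-in for the members of a data tree.
members = {
    "member_1": np.array([1.5, 2.5, 3.5, 4.5]),
    "member_2": np.array([1.0, 2.0, 3.0, 8.0]),
}

# Loop over members and metrics, collecting one row per combination.
rows = [
    {"member": name, "metric": metric, "value": func(mod, obs)}
    for name, mod in members.items()
    for metric, func in metrics.items()
]
df_metric = pd.DataFrame(rows)
print(df_metric)
```

Keeping the metrics in a dictionary makes it easy to pass a reduced or extended set without changing the loop.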
- diurnal_cycle(ds: Dataset)
Calculate the diurnal cycle of the data. If lat and lon are present, the data is averaged over the spatial dimensions lat and lon.
- Parameters:
ds (xr.Dataset) – The data to calculate the diurnal cycle of.
- Returns:
The diurnal cycle of the data.
- Return type:
xr.Dataset
- diurnal_cycle_bias(ds: Dataset, ref: Dataset, calc_relative=False)
Calculate the diurnal cycle bias of the data compared to the reference. If lat and lon are present, ds and ref are averaged over the spatial dimensions lat and lon.
- Parameters:
ds (xr.Dataset) – The data to calculate the diurnal cycle bias of.
ref (xr.Dataset) – The reference data to compare the data to.
calc_relative (bool, optional) – If True, return the relative bias; if False, return the absolute bias. Defaults to False.
- Returns:
The diurnal cycle bias of the data compared to the reference.
- Return type:
xr.Dataset
- get_userdefined_binwidth(variable)
Get the user-defined, hard-coded bin widths for the Perkins Skill Score calculation.
- mean_absolute_error(da_mod: DataArray, da_ref: DataArray, percentile: float = None) -> float
Calculate the Mean Absolute Error (MAE) between model forecasts and reference data. Optionally, calculate the MAE based on a specified percentile.
- Parameters:
da_mod (xr.DataArray) – The model forecast data to compare.
da_ref (xr.DataArray) – The reference data to compare against.
percentile (float, optional) – The percentile (0 to 1) to calculate the MAE for, using the quantile values of the data arrays. If None, calculates the MAE for the entire data without considering percentiles.
- Returns:
The Mean Absolute Error (MAE) between the model and reference data, or at the specified percentile.
- Return type:
float
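A NumPy sketch of the two modes may help. The overall MAE is standard; the percentile mode is only one possible reading of the description above (the absolute difference between the model and reference quantiles), so treat that branch, and the helper name `mae_sketch`, as assumptions.

```python
import numpy as np

def mae_sketch(mod, ref, percentile=None):
    """Mean absolute error, or an assumed quantile-based variant if percentile is given."""
    mod = np.asarray(mod, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if percentile is None:
        # Standard MAE over all paired values.
        return float(np.mean(np.abs(mod - ref)))
    # Assumed interpretation: compare the two distributions at the given quantile (0 to 1).
    return float(abs(np.quantile(mod, percentile) - np.quantile(ref, percentile)))

mod = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

mae_all = mae_sketch(mod, ref)       # mean of |[-1, 0, 1, 2, 3]|
mae_p50 = mae_sketch(mod, ref, 0.5)  # |median(mod) - median(ref)|
```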
- mean_bias(da_mod: Dataset, da_ref: Dataset)
Calculate the bias of the means of modeled and reference data.
- Parameters:
da_mod (xr.DataArray or xr.Dataset) – The modeled data to calculate the bias of.
da_ref (xr.DataArray or xr.Dataset) – The reference to compare the data to.
- Returns:
The bias of the mean of the modeled data compared to the mean of the reference.
- Return type:
xr.Dataset
- optimal_bin_width(da_mod: DataArray, da_ref: DataArray) -> float
Calculate the optimal bin width for both forecast (da_mod) and observed (da_ref) data.
- Parameters:
da_mod (xr.DataArray) – Forecasted temperatures (continuous).
da_ref (xr.DataArray) – Observed temperatures (continuous).
- Returns:
Optimal bin width for both datasets.
- Return type:
float
- perkins_skill_score(da: DataArray, ref: DataArray, binwidth: float = None)
Calculate the Perkins Skill Score (PSS).
- Parameters:
da (xr.DataArray) – The model data to compare.
ref (xr.DataArray) – The reference data to compare against.
binwidth (float, optional) – The width of each bin for the histogram. If not provided, the optimal bin width is calculated.
- Returns:
The Perkins Skill Score (PSS).
- Return type:
float
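The Perkins Skill Score is commonly defined as the sum, over shared histogram bins, of the smaller of the two relative frequencies, so identical distributions score 1 and non-overlapping ones score 0. The NumPy sketch below follows that definition; the helper name `pss_sketch` and the way the bin edges are built are assumptions, and the library may bin differently.

```python
import numpy as np

def pss_sketch(mod, ref, binwidth):
    """Overlap of two empirical distributions on shared bins of the given width."""
    mod = np.asarray(mod, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    lo = min(mod.min(), ref.min())
    hi = max(mod.max(), ref.max())
    # Shared bin edges; the extra bin guarantees the last edge lies beyond the maximum.
    n_bins = int(np.floor((hi - lo) / binwidth)) + 1
    edges = lo + binwidth * np.arange(n_bins + 1)
    freq_mod = np.histogram(mod, bins=edges)[0] / mod.size
    freq_ref = np.histogram(ref, bins=edges)[0] / ref.size
    # Sum the per-bin minimum of the two relative frequencies.
    return float(np.sum(np.minimum(freq_mod, freq_ref)))

mod = np.array([0.5, 1.5, 1.5, 2.5])
ref = np.array([0.5, 0.5, 1.5, 2.5])

pss = pss_sketch(mod, ref, binwidth=1.0)
pss_same = pss_sketch(ref, ref, binwidth=1.0)
```

Because the score depends on the bin width, comparing scores across variables or studies only makes sense when the binning is the same.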
- root_mean_square_error(da_mod: DataArray, da_ref: DataArray) -> float
Calculate the Root Mean Square Error (RMSE) between model data and reference data.
- Parameters:
da_mod (xr.DataArray) – The model data to compare (should match the shape of da_ref).
da_ref (xr.DataArray) – The reference data to compare against (should match the shape of da_mod).
- Returns:
The Root Mean Square Error (RMSE) between the model and reference data.
- Return type:
float
- spatial_bias(ds: Dataset, ref: Dataset, calc_relative=False)
Calculate the spatial bias of the data compared to the reference. If a time dimension is present, the data is averaged over it.
- Parameters:
ds (xr.Dataset) – The data to calculate the spatial bias of.
ref (xr.Dataset or xr.DataArray) – The reference data to compare the data to.
calc_relative (bool, optional) – If True, return the relative bias; if False, return the absolute bias. Defaults to False.
- Returns:
The spatial bias of the data compared to the reference.
- Return type:
xr.Dataset or xr.DataArray
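As a rough illustration of a spatial bias, the NumPy sketch below averages two synthetic (time, lat, lon) fields over time and then differences them per grid box. The order of operations (average first, then subtract) and the relative-bias formula are assumptions, not confirmed details of spatial_bias.

```python
import numpy as np

# Synthetic fields with shape (time, lat, lon); the model is 1.0 warmer everywhere.
reference = np.arange(12, dtype=float).reshape(3, 2, 2)
model = reference + 1.0

# Average over the time axis, leaving one value per grid box.
model_mean = model.mean(axis=0)
ref_mean = reference.mean(axis=0)

spatial_abs = model_mean - ref_mean               # absolute bias map
spatial_rel = (model_mean - ref_mean) / ref_mean  # assumed relative bias map
```

The result keeps the spatial dimensions, so it can be plotted as a map.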
- spatial_time_mean(ds: Dataset)
Calculate the spatial mean of the data. If the time dimension is present, the data is averaged over the time dimension.
- Parameters:
ds (xr.Dataset) – The data to calculate the spatial mean of.
- Returns:
The spatial mean of the data.
- Return type:
xr.Dataset
- spearman_correlation(da_mod: DataArray, da_ref: DataArray) -> float
Calculate Spearman’s rank correlation coefficient between model data and reference data.
- Parameters:
da_mod (xr.DataArray) – The model data to compare (2D array where rows are observations and columns are variables).
da_ref (xr.DataArray) – The reference data to compare (2D array where rows are observations and columns are variables).
- Returns:
Spearman’s rank correlation coefficient between the flattened model and reference data.
- Return type:
float
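Spearman's coefficient is the Pearson correlation of the ranked values, which the sketch below computes on flattened arrays. It is an illustration only: the helper name `spearman_sketch` is hypothetical, missing values are not handled, and the library may rely on a different routine.

```python
import numpy as np
import pandas as pd

def spearman_sketch(mod, ref):
    """Spearman correlation of two arrays after flattening (no NaN handling)."""
    # Rank the flattened values; ties receive their average rank.
    rank_mod = pd.Series(np.ravel(mod)).rank()
    rank_ref = pd.Series(np.ravel(ref)).rank()
    # Pearson correlation of the ranks.
    return float(np.corrcoef(rank_mod, rank_ref)[0, 1])

mod = np.array([[1.0, 4.0], [9.0, 16.0]])
ref = np.array([[2.0, 3.0], [5.0, 7.0]])

rho_up = spearman_sketch(mod, ref)     # both increase together
rho_down = spearman_sketch(mod, -ref)  # one increases while the other decreases
```

Because only ranks matter, any monotonic relationship yields a coefficient of 1 or -1, even when it is strongly nonlinear.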
- temporal_bias(ds: Dataset, ref: Dataset, calc_relative=False)
Calculate the temporal bias of the data compared to the reference. If lat and lon are present, ds and ref are averaged over the spatial dimensions lat and lon.
- Parameters:
ds (xr.Dataset) – The data to calculate the temporal bias of.
ref (xr.Dataset) – The reference data to compare the data to.
calc_relative (bool, optional) – If True, return the relative bias; if False, return the absolute bias. Defaults to False.
- Returns:
The temporal bias of the data compared to the reference.
- Return type:
xr.Dataset
- time_series_spatial_mean(ds: Dataset)
Calculate the time series of the data. If lat and lon are present, the data is averaged over the spatial dimensions lat and lon.
- Parameters:
ds (xr.Dataset) – The data to calculate the time series of the spatial mean of.
- Returns:
The time series of the spatial mean of the data.
- Return type:
xr.Dataset
- time_series_trend(ds: Dataset, window_size, min_periods: int = None, center: bool = True, **window_kwargs)
Calculate the trend of the time series data. If lat and lon are present, the data is averaged over the spatial dimensions lat and lon.
- Parameters:
ds (xr.Dataset) – The data to calculate the trend of.
window_size (int) – The size of the rolling-average window, in number of time steps.
min_periods (int, optional) – The minimum number of periods required for a value to be considered valid, by default None
center (bool, optional) – If True, the value is placed in the center of the window, by default True
- Returns:
The trend of the data.
- Return type:
xr.Dataset
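The effect of the window size, min_periods, and center can be seen with a pandas rolling mean on a short series. This is a sketch of the rolling-average idea only; it uses pandas rather than the function itself, and it assumes the trend is a plain rolling mean.

```python
import pandas as pd

series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Centered window of 3 steps; with min_periods left unset, a full window is required,
# so the first and last positions are NaN.
centered = series.rolling(window=3, center=True).mean()

# Allowing partial windows fills the edges with averages over fewer points.
partial = series.rolling(window=3, min_periods=1, center=True).mean()

print(centered.tolist())
print(partial.tolist())
```

A larger window gives a smoother trend but loses more points at the edges unless min_periods is lowered.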
- urban_heat_island(ds: Dataset, urban_coord: tuple, rural_coord: tuple, projection=None)
Calculate the urban heat island effect as the difference in temperature between an urban and rural area. The grid-boxes closest to the urban and rural coordinates are selected and a difference is calculated between the two.
- Parameters:
ds (xr.Dataset) – The data to calculate the urban heat island effect of.
urban_coord (tuple) – The coordinates of the urban area in the format (lat, lon).
rural_coord (tuple) – The coordinates of the rural area in the format (lat, lon).
projection (str, optional) – The projection used to convert the urban and rural coordinates to the dataset’s projection.
- Returns:
The urban heat island effect as the difference in temperature between the urban and rural area.
- Return type:
xr.Dataset
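The nearest-grid-box selection and differencing described above can be sketched with NumPy on a small regular lat/lon grid. The helper names, the synthetic temperature field, and the assumption of a regular grid without any projection handling are illustrative choices, not the library's implementation.

```python
import numpy as np

lat = np.array([50.0, 50.5, 51.0])
lon = np.array([4.0, 4.5, 5.0])

# Synthetic temperature with shape (time, lat, lon): 15 degrees everywhere,
# except a grid box at (50.5, 4.5) that is 2 degrees warmer.
temp = np.full((2, lat.size, lon.size), 15.0)
temp[:, 1, 1] += 2.0

def nearest_index(coords, value):
    """Index of the coordinate closest to value."""
    return int(np.abs(coords - value).argmin())

def uhi_sketch(temp, lat, lon, urban_coord, rural_coord):
    """Urban minus rural temperature at the nearest grid boxes; coords are (lat, lon)."""
    urban = temp[:, nearest_index(lat, urban_coord[0]), nearest_index(lon, urban_coord[1])]
    rural = temp[:, nearest_index(lat, rural_coord[0]), nearest_index(lon, rural_coord[1])]
    return urban - rural

uhi = uhi_sketch(temp, lat, lon, urban_coord=(50.45, 4.55), rural_coord=(50.9, 4.1))
```

The result is a time series with one urban-minus-rural difference per time step.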
- urban_heat_island_diurnal_cycle(ds: Dataset, urban_coord: tuple, rural_coord: tuple, projection=None)
Calculate the diurnal cycle of the urban heat island effect as the difference in temperature between an urban and rural area. The grid-boxes closest to the urban and rural coordinates are selected and a difference is calculated between the two.
- Parameters:
ds (xr.Dataset) – The data to calculate the urban heat island effect of.
urban_coord (tuple) – The coordinates of the urban area in the format (lat, lon).
rural_coord (tuple) – The coordinates of the rural area in the format (lat, lon).
projection (str, optional) – The projection used to convert the urban and rural coordinates to the dataset’s projection.
- Returns:
The diurnal cycle of the urban heat island effect as the difference in temperature between the urban and rural area.
- Return type:
xr.Dataset