
Parameter Validation (Detailed)

Catalog-specific parameter validators for climate data queries.

Overview

The climakitae.new_core.param_validation module provides catalog-specific validators that ensure climate data query parameters are valid before execution. Each validator:

  • Validates required parameters for its catalog
  • Checks parameter compatibility
  • Suggests corrections for invalid parameters
  • Handles edge cases (e.g., models that don't reach warming levels)
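A query, as used throughout this module, is a plain dictionary that mixes dataset-selection keys with processing parameters under a 'processes' key. A purely illustrative example (key names follow the catalog keys and processor names documented on this page; the values are hypothetical):

```python
# Hypothetical query dict; key names follow the catalog keys used by
# DataValidator and the registered processor names, values are illustrative.
query = {
    "variable_id": "tasmax",
    "experiment_id": "ssp370",
    "table_id": "day",
    # Processing parameters live under the 'processes' key
    "processes": {"time_slice": ("2030-01-01", "2059-12-31")},
}
print(sorted(query))
```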

Base Validator

All validators inherit from the abstract base class:

Parameter validation module for climakitae.

This module provides a comprehensive framework for validating query parameters used throughout the climakitae package. It includes:

  • Abstract base class for parameter validation (ParameterValidator)
  • Registry system for catalog and processor validators
  • Validation logic for dataset queries and processing parameters
  • Helper functions for finding closest matching options when validation fails

The validation system operates on two levels:

  1. Catalog validation: Ensures query parameters match available datasets
  2. Processor validation: Validates processing parameters for data transformations

Classes:

ParameterValidator
    Abstract base class defining the parameter validation interface. Subclasses must implement the is_valid_query() method.

Functions:

register_catalog_validator
    Decorator for registering catalog validator classes.
register_processor_validator
    Decorator for registering processor validator classes.

Module Variables

_CATALOG_VALIDATOR_REGISTRY : dict
    Registry mapping validator names to catalog validator classes.
_PROCESSOR_VALIDATOR_REGISTRY : dict
    Registry mapping validator names to processor validator classes.

Examples:

>>> @register_catalog_validator("my_catalog")
... class MyCatalogValidator(ParameterValidator):
...     def is_valid_query(self, query):
...         # Implementation here
...         pass

ParameterValidator()

Bases: ABC

Abstract base class for parameter validation in climakitae.

This class provides a framework for validating user queries containing dataset selection parameters and processing parameters. It handles:

  • Catalog parameter validation (dataset selection)
  • Processor parameter validation (data transformations)
  • Error handling and user-friendly suggestions
  • Parameter conversion and normalization

The validation process includes:

  1. Converting user input to catalog keys
  2. Searching for matching datasets in the catalog
  3. Providing suggestions for invalid parameters
  4. Validating processing parameters

Attributes:

catalog_path : str
    Path to the catalog CSV file.
catalog : object
    Data catalog instance for dataset searching.
all_catalog_keys : dict
    Dictionary of catalog keys populated from user query.
catalog_df : DataFrame
    DataFrame containing catalog information.

Methods:

is_valid_query
    Abstract method to validate query parameters. Must be implemented by subclasses.
populate_catalog_keys
    Populate catalog keys from user query.
load_catalog_df
    Load the catalog DataFrame.

Notes

Subclasses must implement the is_valid_query method to define specific validation logic for their use case.

Examples:

>>> class MyValidator(ParameterValidator):
...     def is_valid_query(self, query):
...         # Custom validation logic
...         return self._is_valid_query(query)

Initialize the ParameterValidator.

Sets up the validator with default catalog path and initializes catalog-related attributes. Loads the catalog DataFrame upon instantiation.

Attributes initialized:

  • catalog_path: Path to the catalog CSV file
  • catalog: Set to UNSET initially, populated by subclasses
  • all_catalog_keys: Set to UNSET initially, populated during validation
  • catalog_df: Loaded from DataCatalog

Source code in climakitae/new_core/param_validation/abc_param_validation.py
def __init__(self):
    """Initialize the ParameterValidator.

    Sets up the validator with default catalog path and initializes
    catalog-related attributes. Loads the catalog DataFrame upon instantiation.

    Attributes initialized:
    - catalog_path: Path to the catalog CSV file
    - catalog: Set to UNSET initially, populated by subclasses
    - all_catalog_keys: Set to UNSET initially, populated during validation
    - catalog_df: Loaded from DataCatalog

    """
    self.catalog_path = "climakitae/data/catalogs.csv"
    self.catalog = UNSET
    self.all_catalog_keys = UNSET
    self.load_catalog_df()

get_default_processors(query)

Get default processors for this catalog.

This method returns a dictionary of default processor configurations that should be applied if not explicitly set by the user.

Parameters:

query : Dict[str, Any], required
    The current query containing user parameters

Returns:

Dict[str, Any]
    Dictionary mapping processor names to their default configurations

Notes

Subclasses should override this method to provide catalog-specific defaults. The default implementation returns only the universal defaults that apply to all catalogs.

Source code in climakitae/new_core/param_validation/abc_param_validation.py
def get_default_processors(self, query: Dict[str, Any]) -> Dict[str, Any]:
    """Get default processors for this catalog.

    This method returns a dictionary of default processor configurations
    that should be applied if not explicitly set by the user.

    Parameters
    ----------
    query : Dict[str, Any]
        The current query containing user parameters

    Returns
    -------
    Dict[str, Any]
        Dictionary mapping processor names to their default configurations

    Notes
    -----
    Subclasses should override this method to provide catalog-specific defaults.
    The default implementation returns only the universal defaults that apply
    to all catalogs.
    """
    # Universal default for all catalogs
    return {
        "update_attributes": UNSET,
    }
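Subclasses layer their own defaults on top of these universal ones by calling super(). A self-contained sketch of that pattern (the class names and the "bias_correct" key are illustrative, not climakitae APIs; UNSET is a stand-in sentinel):

```python
# Sketch of layering catalog-specific defaults over the universal ones.
from typing import Any, Dict

UNSET = object()  # stand-in for climakitae's UNSET sentinel

class BaseValidator:
    def get_default_processors(self, query: Dict[str, Any]) -> Dict[str, Any]:
        # Universal default applied to every catalog
        return {"update_attributes": UNSET}

class StationValidator(BaseValidator):
    def get_default_processors(self, query: Dict[str, Any]) -> Dict[str, Any]:
        defaults = super().get_default_processors(query)
        defaults["bias_correct"] = "yes"  # hypothetical catalog-specific default
        return defaults

print(sorted(StationValidator().get_default_processors({})))
# ['bias_correct', 'update_attributes']
```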

is_valid_query(query) abstractmethod

Validate the query parameters (abstract method).

This method must be implemented by subclasses to define specific validation logic for their use case. It should validate both catalog parameters (for dataset selection) and processing parameters.

Parameters:

query : Dict[str, Any], required
    Query parameters to validate. Expected to contain:
      • Dataset selection parameters (e.g., variable, experiment_id, etc.)
      • Processing parameters under the 'processes' key
      • Any other relevant validation parameters

Returns:

Dict[str, Any] | None
    Validated and processed query parameters if valid, None if invalid. When returning a dictionary, it should contain the cleaned and validated parameters ready for dataset retrieval.

Notes

Implementations typically call _is_valid_query() to leverage the common validation logic provided by the base class.

Examples:

>>> def is_valid_query(self, query):
...     # Custom pre-processing
...     processed_query = self.preprocess_query(query)
...     # Use base class validation
...     return self._is_valid_query(processed_query)
Source code in climakitae/new_core/param_validation/abc_param_validation.py
@abstractmethod
def is_valid_query(self, query: Dict[str, Any]) -> Dict[str, Any] | None:
    """Validate the query parameters (abstract method).

    This method must be implemented by subclasses to define specific
    validation logic for their use case. It should validate both
    catalog parameters (for dataset selection) and processing parameters.

    Parameters
    ----------
    query : Dict[str, Any]
        Query parameters to validate. Expected to contain:
        - Dataset selection parameters (e.g., variable, experiment_id, etc.)
        - Processing parameters under the 'processes' key
        - Any other relevant validation parameters

    Returns
    -------
    Dict[str, Any] | None
        Validated and processed query parameters if valid, None if invalid.
        When returning a dictionary, it should contain the cleaned and
        validated parameters ready for dataset retrieval.

    Notes
    -----
    Implementations typically call `_is_valid_query()` to leverage the
    common validation logic provided by the base class.

    Examples
    --------
    >>> def is_valid_query(self, query):
    ...     # Custom pre-processing
    ...     processed_query = self.preprocess_query(query)
    ...     # Use base class validation
    ...     return self._is_valid_query(processed_query)

    """

populate_catalog_keys(query)

Populate catalog keys from user query, filtering out unset values.

This method extracts relevant catalog parameters from the user query and stores them in self.all_catalog_keys. Only parameters that are actually set (not UNSET) are retained.

Parameters:

query : Dict[str, Any], required
    User query containing potential catalog parameters.

Returns:

None
    Updates self.all_catalog_keys in-place.

Notes

This method assumes self.all_catalog_keys already contains the expected catalog parameter names (typically initialized by subclasses). The method:

  1. Maps query values to catalog keys
  2. Removes any UNSET values
  3. Stores the result in self.all_catalog_keys

Side Effects

Modifies self.all_catalog_keys attribute.

Source code in climakitae/new_core/param_validation/abc_param_validation.py
def populate_catalog_keys(self, query: Dict[str, Any]) -> None:
    """Populate catalog keys from user query, filtering out unset values.

    This method extracts relevant catalog parameters from the user query
    and stores them in `self.all_catalog_keys`. Only parameters that are
    actually set (not UNSET) are retained.

    Parameters
    ----------
    query : Dict[str, Any]
        User query containing potential catalog parameters.

    Returns
    -------
    None
        Updates `self.all_catalog_keys` in-place.

    Notes
    -----
    This method assumes `self.all_catalog_keys` already contains the expected
    catalog parameter names (typically initialized by subclasses). The method:
    1. Maps query values to catalog keys
    2. Removes any UNSET values
    3. Stores the result in `self.all_catalog_keys`

    Side Effects
    ------------
    Modifies `self.all_catalog_keys` attribute.

    """
    # populate catalog keys with the values from the query
    for key in self.all_catalog_keys.keys():
        self.all_catalog_keys[key] = query.get(key, UNSET)

    # remove any unset values
    self.all_catalog_keys = {
        k: v for k, v in self.all_catalog_keys.items() if v is not UNSET
    }
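The filtering pattern above can be sketched standalone (UNSET here is a stand-in sentinel object, not the climakitae one):

```python
# Minimal sketch of the UNSET-filtering pattern used by populate_catalog_keys.
UNSET = object()  # stand-in sentinel

def populate_keys(all_keys, query):
    """Map query values onto the expected keys, then drop unset entries."""
    populated = {k: query.get(k, UNSET) for k in all_keys}
    return {k: v for k, v in populated.items() if v is not UNSET}

keys = {"variable_id": UNSET, "experiment_id": UNSET, "grid_label": UNSET}
query = {"variable_id": "tasmax", "experiment_id": "ssp370"}
print(populate_keys(keys, query))
# {'variable_id': 'tasmax', 'experiment_id': 'ssp370'}
```

Note that identity comparison (`is not UNSET`) is what makes the sentinel safe: falsy but legitimate values such as `0` or `""` survive the filter.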

load_catalog_df()

Load the data catalog DataFrame and assign to instance attribute.

Creates a DataCatalog instance and extracts its catalog DataFrame for use in parameter validation. The DataFrame contains metadata about available datasets.

Returns:

None
    Sets self.catalog_df attribute.

Notes

This method is called during initialization and provides access to the catalog data needed for parameter validation and suggestion generation.

Side Effects

Sets self.catalog_df attribute with the loaded catalog DataFrame.

Source code in climakitae/new_core/param_validation/abc_param_validation.py
def load_catalog_df(self) -> None:
    """Load the data catalog DataFrame and assign to instance attribute.

    Creates a DataCatalog instance and extracts its catalog DataFrame
    for use in parameter validation. The DataFrame contains metadata
    about available datasets.

    Returns
    -------
    None
        Sets `self.catalog_df` attribute.

    Notes
    -----
    This method is called during initialization and provides access to
    the catalog data needed for parameter validation and suggestion generation.

    Side Effects
    ------------
    Sets `self.catalog_df` attribute with the loaded catalog DataFrame.

    """
    self.catalog_df = DataCatalog().catalog_df

register_catalog_validator(name)

Decorator to register a catalog validator class in the global registry.

This decorator allows validator classes to be registered for use with specific catalog types. Registered validators can be retrieved and instantiated by name from the global registry.

Parameters:

name : str, required
    Unique name to register the validator under. This name will be used to look up the validator class in the registry.

Returns:

function
    Decorator function that registers the class and returns it unchanged.

Examples:

>>> @register_catalog_validator("my_catalog")
... class MyCatalogValidator(ParameterValidator):
...     def is_valid_query(self, query):
...         return self._is_valid_query(query)
>>> # Later retrieval:
>>> validator_class = _CATALOG_VALIDATOR_REGISTRY["my_catalog"]
>>> validator = validator_class()
Notes

The registered class is stored in the module-level _CATALOG_VALIDATOR_REGISTRY dictionary.

Source code in climakitae/new_core/param_validation/abc_param_validation.py
def register_catalog_validator(
    name: str,
) -> Callable[["Type[ParameterValidator]"], "Type[ParameterValidator]"]:
    """Decorator to register a catalog validator class in the global registry.

    This decorator allows validator classes to be registered for use with
    specific catalog types. Registered validators can be retrieved and
    instantiated by name from the global registry.

    Parameters
    ----------
    name : str
        Unique name to register the validator under. This name will be used
        to look up the validator class in the registry.

    Returns
    -------
    function
        Decorator function that registers the class and returns it unchanged.

    Examples
    --------
    >>> @register_catalog_validator("my_catalog")
    ... class MyCatalogValidator(ParameterValidator):
    ...     def is_valid_query(self, query):
    ...         return self._is_valid_query(query)

    >>> # Later retrieval:
    >>> validator_class = _CATALOG_VALIDATOR_REGISTRY["my_catalog"]
    >>> validator = validator_class()

    Notes
    -----
    The registered class is stored in the module-level
    `_CATALOG_VALIDATOR_REGISTRY` dictionary.

    """

    def decorator(cls: Type[ParameterValidator]) -> Type[ParameterValidator]:
        _CATALOG_VALIDATOR_REGISTRY[name] = cls
        return cls

    return decorator

register_processor_validator(name)

Decorator to register a processor validator function in the global registry.

This decorator allows processor validation functions to be registered for use with specific processing parameters. Registered validators can be retrieved and called by name from the global registry.

Parameters:

name : str, required
    Unique name to register the processor validator under. This should match the processor parameter name that the validator handles.

Returns:

function
    Decorator function that registers the validator function and returns it unchanged.

Examples:

>>> @register_processor_validator("spatial_subset")
... def validate_spatial_subset(value, query=None):
...     # Validation logic for spatial_subset processor
...     return isinstance(value, dict) and 'bounds' in value
>>> # Later retrieval and use:
>>> validator_func = _PROCESSOR_VALIDATOR_REGISTRY["spatial_subset"]
>>> is_valid = validator_func(subset_params, query=user_query)
Notes
  • The registered function is stored in the module-level _PROCESSOR_VALIDATOR_REGISTRY dictionary
  • Processor validators should accept value and optional query parameters
  • Validators may modify the query in-place for parameter normalization
Source code in climakitae/new_core/param_validation/abc_param_validation.py
def register_processor_validator(
    name: str,
) -> Callable[["Type[ParameterValidator]"], "Type[ParameterValidator]"]:
    """Decorator to register a processor validator function in the global registry.

    This decorator allows processor validation functions to be registered for
    use with specific processing parameters. Registered validators can be
    retrieved and called by name from the global registry.

    Parameters
    ----------
    name : str
        Unique name to register the processor validator under. This should
        match the processor parameter name that the validator handles.

    Returns
    -------
    function
        Decorator function that registers the validator function and returns it unchanged.

    Examples
    --------
    >>> @register_processor_validator("spatial_subset")
    ... def validate_spatial_subset(value, query=None):
    ...     # Validation logic for spatial_subset processor
    ...     return isinstance(value, dict) and 'bounds' in value

    >>> # Later retrieval and use:
    >>> validator_func = _PROCESSOR_VALIDATOR_REGISTRY["spatial_subset"]
    >>> is_valid = validator_func(subset_params, query=user_query)

    Notes
    -----
    - The registered function is stored in the module-level
      `_PROCESSOR_VALIDATOR_REGISTRY` dictionary
    - Processor validators should accept `value` and optional `query` parameters
    - Validators may modify the query in-place for parameter normalization

    """

    def decorator(cls: Type[ParameterValidator]) -> Type[ParameterValidator]:
        _PROCESSOR_VALIDATOR_REGISTRY[name] = cls
        return cls

    return decorator

Catalog Validators

Validator for data catalog parameters.

DataValidator(catalog)

Bases: ParameterValidator

Validator for data catalog parameters.

Parameters:

catalog : DataCatalog, required
    The DataCatalog object to validate against

Initialize with a catalog of datasets.

Parameters:

catalog : DataCatalog, required
    Catalog of datasets
Source code in climakitae/new_core/param_validation/cadcat_param_validator.py
def __init__(self, catalog: DataCatalog):
    """Initialize with  catalog of renewable energy datasets.

    Parameters
    ----------
    catalog : DataCatalog
        Catalog of datasets

    """
    super().__init__()
    self.all_catalog_keys = {
        "activity_id": UNSET,
        "institution_id": UNSET,
        "source_id": UNSET,
        "experiment_id": UNSET,
        "table_id": UNSET,
        "grid_label": UNSET,
        "variable_id": UNSET,
    }
    self.catalog = catalog.data
    self.invalid_processors = []
    logger.debug(
        "DataValidator initialized for catalog with keys: %s",
        list(self.catalog.keys()) if hasattr(self.catalog, "keys") else "unknown",
    )

get_default_processors(query)

Get default processors for CADCAT catalog.

Climate model data gets filter_unadjusted_models and smart concatenation based on experiment_id.

Parameters:

query : Dict[str, Any], required
    The current query containing user parameters

Returns:

Dict[str, Any]
    Dictionary mapping processor names to their default configurations

Source code in climakitae/new_core/param_validation/cadcat_param_validator.py
def get_default_processors(self, query: Dict[str, Any]) -> Dict[str, Any]:
    """Get default processors for CADCAT catalog.

    Climate model data gets filter_unadjusted_models and smart concatenation
    based on experiment_id.

    Parameters
    ----------
    query : Dict[str, Any]
        The current query containing user parameters

    Returns
    -------
    Dict[str, Any]
        Dictionary mapping processor names to their default configurations
    """
    defaults = super().get_default_processors(query)

    # Add default filtering for climate model data
    defaults["filter_unadjusted_models"] = "yes"

    # Drop leap days by default
    defaults["drop_leap_days"] = "yes"

    # Set default concatenation
    concat_dim = "time"

    # if experiment_id is a string, check if it contains "historical"
    experiment_id = query.get("experiment_id", UNSET)
    match experiment_id:
        case str():
            if (
                "historical" in experiment_id.lower()
                or "reanalysis" in experiment_id.lower()
            ):
                # if it does, we can use "sim" as the default concat dimension
                concat_dim = "sim"
        case list() | tuple():
            # if experiment_id is a list or tuple, check each element
            # if there are no elements with "ssp" in them then we use the sim approach
            if not any("ssp" in str(item).lower() for item in experiment_id):
                concat_dim = "sim"

    defaults["concat"] = concat_dim
    return defaults

is_valid_query(query)

Catalog specific validation for the query.

Parameters:

query : Dict[str, Any], required
    The query to validate.

Returns:

Dict[str, Any] | None
    The validated query if valid, None otherwise.

Notes

A list of checks that are performed on the query:

  1. Check if the query contains the localize processor. Localize is not supported for LOCA2 datasets.
Source code in climakitae/new_core/param_validation/cadcat_param_validator.py
def is_valid_query(self, query: Dict[str, Any]) -> Dict[str, Any] | None:
    """Catalog specific validation for the query.

    Parameters
    ----------
    query : Dict[str, Any]
        The query to validate.

    Returns
    -------
    Dict[str, Any] | None
        The validated query if valid, None otherwise.

    Notes
    -----
    A list of checks that are performed on the query:

    1. Check if the query contains the localize processor.
        Localize is not supported for LOCA2 datasets.

    """
    logger.debug("Validating query: %s", query)
    initial_checks = [
        self._check_query_for_wrf_and_localize(query),
        self._check_query_for_required_keys(query),
        self._check_wrf_requires_institution_id(query),
    ]
    if not all(initial_checks):
        logger.warning("Initial validation checks failed: %s", initial_checks)
        return None
    result = super()._is_valid_query(query)
    logger.info("Query validation result: %s", bool(result))
    return result

Validator for parameters provided to Time Slice Processor.

validate_time_slice_param(value, **kwargs)

Validate the parameters provided to the time slice Processor.

Parameters:

value : tuple(date-like, date-like), required
    The value to subset the data by. This should be a tuple of two date-like values.

Returns:

bool
    True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/time_slice_param_validator.py
@register_processor_validator("time_slice")
def validate_time_slice_param(value: tuple[Any, Any], **kwargs) -> bool:
    """
    Validate the parameters provided to the time slice Processor.

    Parameters
    ----------
    value : tuple(date-like, date-like)
        The value to subset the data by. This should be a tuple of two
        date-like values.

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise
    """
    logger.debug(
        "validate_time_slice_param called with value=%s kwargs=%s", value, kwargs
    )
    if isinstance(value, dict):
        time_slice = value.get("dates", None)
        season_filter = value.get("seasons", UNSET)
    else:
        time_slice = value
        season_filter = UNSET

    if not isinstance(time_slice, tuple) or len(time_slice) != 2:
        msg = "Time Slice Processor expects a tuple of two date-like values. Please check the configuration."
        logger.warning(msg)
        return False

    if season_filter is not UNSET:
        msg = (
            "\nIf provided, 'seasons' parameter must be a list of season names or a single season name. "
            "(e.g., ['DJF', 'MAM', 'JJA', 'SON']). Please check the configuration."
        )
        if isinstance(season_filter, (list, tuple)):
            if not all(
                isinstance(season, str) and season in ["DJF", "MAM", "JJA", "SON"]
                for season in season_filter
            ):
                logger.warning(msg)
                return False
        if isinstance(season_filter, str):
            if season_filter not in ["DJF", "MAM", "JJA", "SON"]:
                logger.warning(msg)
                return False

    try:
        value = _coerce_to_dates(time_slice)
    except ValueError as e:
        msg = f"Invalid date-like values provided: {e}. Expected a tuple of two date-like values."
        logger.warning(msg)
        return False
    return True  # All parameters are valid
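The season check above reduces to a membership test against the four meteorological season codes; a standalone sketch:

```python
# Standalone sketch of the optional 'seasons' check in the time_slice validator.
VALID_SEASONS = ("DJF", "MAM", "JJA", "SON")

def seasons_ok(season_filter):
    """Accept a single season code or a list/tuple of them."""
    if isinstance(season_filter, (list, tuple)):
        return all(
            isinstance(season, str) and season in VALID_SEASONS
            for season in season_filter
        )
    if isinstance(season_filter, str):
        return season_filter in VALID_SEASONS
    return False

print(seasons_ok(["DJF", "JJA"]))  # True
print(seasons_ok("winter"))        # False
```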

Validator for parameters provided to Warming Level Processor.

validate_warming_level_param(value, **kwargs)

Validate the parameters provided to the Warming Level Processor.

This function checks the value provided to the Warming Level Processor and ensures that it meets the expected criteria. Will raise a user warning and return false if the value is not valid.

Parameters:

value : Union[str, list, dict[str, Any]], required
    The configuration dictionary to validate. Expected keys:
      • warming_levels : list[float]
          List of global warming levels in degrees C (e.g., [1.5, 2.0])
      • warming_level_months : list[int], optional
          List of months to include (1-12). Default: all months
      • warming_level_window : int, optional
          Number of years before and after the central year. Default: 15
      • add_dummy_time : bool, optional
          Default: False. If True, replace the [hours/days/months]_from_center or time_delta dimension in a DataArray returned from WarmingLevels with a dummy time index for calculations with tools that require a time dimension.

Returns:

bool
    True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/warming_param_validator.py
@register_processor_validator("warming_level")
def validate_warming_level_param(
    value: dict[str, Any],
    **kwargs: Any,
) -> bool:
    """
    Validate the parameters provided to the Warming Level Processor.

    This function checks the value provided to the Warming Level Processor and ensures that it
    meets the expected criteria. Will raise a user warning and return false if the value
    is not valid.

    Parameters
    ----------
    value : Union[str, list, dict[str, Any]]
        The configuration dictionary to validate. Expected keys:
        - warming_levels : list[float]
            List of global warming levels in degrees C (e.g., [1.5, 2.0])
        - warming_level_months : list[int], optional
            List of months to include (1-12). Default: all months
        - warming_level_window : int, optional
            Number of years before and after the central year. Default: 15
        - add_dummy_time: bool, optional
            Default: False
            If True, replace the [hours/days/months]_from_center or time_delta dimension
                in a DataArray returned from WarmingLevels with a dummy time index for
                calculations with tools that require a time dimension.

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise
    """
    logger.debug(
        "validate_warming_level_param called with value=%s kwargs=%s", value, kwargs
    )

    if not _check_input_types(value):
        return False

    # now we have to check some more serious stuff
    query = kwargs.get("query", UNSET)
    if query is UNSET:
        msg = "Warming Level Processor requires a 'query' parameter. Please check the configuration."
        logger.warning(msg)
        return False

    # check that catalog is "cadcat"
    if not _check_catalog(query):
        return False

    # validate query
    if not _check_query(query):
        return False

    if not _check_wl_values(value, query):
        return False

    return True
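A minimal shape check for the configuration dictionary described above can look like this. This is only a sketch: the real validator also vets the query and catalog via _check_input_types, _check_catalog, _check_query, and _check_wl_values, none of which are reproduced here.

```python
# Hedged sketch of a shape check for the warming-level config dict;
# key names come from the docstring above, the logic is illustrative only.
def looks_like_wl_config(value):
    if not isinstance(value, dict) or "warming_levels" not in value:
        return False
    levels = value["warming_levels"]
    if not isinstance(levels, (list, tuple)):
        return False
    # Every warming level should be numeric, e.g. [1.5, 2.0]
    return all(isinstance(level, (int, float)) for level in levels)

print(looks_like_wl_config({"warming_levels": [1.5, 2.0], "warming_level_window": 15}))
# True
print(looks_like_wl_config({"warming_levels": "2.0"}))
# False
```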

Validator for parameters provided to Clip Processor.

validate_clip_param(value, **kwargs)

Validate parameter passed to Clip Processor.

This function validates and normalizes parameters for the Clip processor, addressing common input validation issues:

  1. Input Validation: Rejects None values and mixed types
  2. Empty/Whitespace Handling: Filters out empty/whitespace strings
  3. Case Sensitivity: Provides consistent handling and warnings for case mismatches
  4. Duplicate Handling: Deduplicates boundary lists while preserving order
  5. Coordinate Validation: Validates lat/lon coordinate bounds

Parameters:

value : Any, required
    Parameter value to validate. This can be a string, list of strings, or a tuple of coordinate bounds.

Returns:

bool
    True if the parameter is valid, otherwise raises an exception.

Raises:

ValueError
    If the input is invalid and cannot be corrected
TypeError
    If the input type is not supported

Source code in climakitae/new_core/param_validation/clip_param_validator.py
@register_processor_validator("clip")
def validate_clip_param(
    value: Any,
    **kwargs: Any,
) -> bool:
    """
    Validate parameter passed to Clip Processor.

    This function validates and normalizes parameters for the Clip processor,
    addressing common input validation issues:

    1. **Input Validation**: Rejects None values and mixed types
    2. **Empty/Whitespace Handling**: Filters out empty/whitespace strings
    3. **Case Sensitivity**: Provides consistent handling and warnings for case mismatches
    4. **Duplicate Handling**: Deduplicates boundary lists while preserving order
    5. **Coordinate Validation**: Validates lat/lon coordinate bounds

    Parameters
    ----------
    value : Any
        Parameter value to validate. This can be a string, list of strings,
        or a tuple of coordinate bounds.

    Returns
    -------
    bool:
        True if the parameter is valid, otherwise raises an exception.

    Raises
    ------
    ValueError
        If the input is invalid and cannot be corrected
    TypeError
        If the input type is not supported
    """

    logger.debug("validate_clip_param called with value: %s", value)

    # Handle None values early
    if value is None or value is UNSET:
        msg = "Clip parameter cannot be None. Please provide a valid boundary key, list of keys, file path, or coordinate bounds."
        logger.warning(msg)
        return False

    match value:
        case str():
            return _validate_string_param(value)
        case list():
            return _validate_list_param(value)
        case tuple():
            return _validate_tuple_param(value)
        case dict():
            return _validate_dict_param(value)
        case _:
            logger.warning(
                f"\n\nInvalid parameter type for Clip processor. "
                f"\nExpected str, list, tuple, or dict, but got {type(value).__name__}. "
                f"\nValid examples: 'CA', ['CA', 'OR'], ((32.0, 42.0), (-125.0, -114.0)), "
                f"or {{'boundaries': ['CA', 'OR'], 'separated': True}}"
            )

    return False

Validator for parameters provided to Export Processor.

validate_export_param(value, **_kwargs)

Validate parameters passed to Export Processor.

This function validates and normalizes parameters for the Export processor, addressing common input validation issues:

  1. File Path Validation: Checks if output files already exist
  2. Parameter Type Validation: Ensures correct types for all parameters
  3. Format/Mode Compatibility: Validates S3 requires Zarr format
  4. Template Validation: Validates filename template placeholders
  5. File Conflict Detection: Warns about existing files with similar names

Parameters:

value : Any, required
    Export configuration parameters (expected to be a dictionary)

Returns:

bool
    True if parameters are valid, False otherwise

Raises:

ValueError
    If parameters are invalid and cannot be corrected
TypeError
    If parameter types are incorrect

Source code in climakitae/new_core/param_validation/export_param_validator.py
@register_processor_validator("export")
def validate_export_param(
    value: Any,
    **_kwargs: Any,
) -> bool:
    """
    Validate parameters passed to Export Processor.

    This function validates and normalizes parameters for the Export processor,
    addressing common input validation issues:

    1. **File Path Validation**: Checks if output files already exist
    2. **Parameter Type Validation**: Ensures correct types for all parameters
    3. **Format/Mode Compatibility**: Validates S3 requires Zarr format
    4. **Template Validation**: Validates filename template placeholders
    5. **File Conflict Detection**: Warns about existing files with similar names

    Parameters
    ----------
    value : Any
        Export configuration parameters (expected to be a dictionary)

    Returns
    -------
    bool
        True if parameters are valid, False otherwise

    Raises
    ------
    ValueError
        If parameters are invalid and cannot be corrected
    TypeError
        If parameter types are incorrect
    """

    if value is None or value is UNSET:
        logger.warning(
            "Export parameters cannot be None. Using default export configuration."
        )
        return True  # Will use defaults

    if not isinstance(value, dict):
        logger.warning(
            "Export parameters must be a dictionary, got %s. "
            "Using default export configuration.",
            type(value).__name__,
        )
        return False

    # Validate individual parameters
    try:
        _validate_filename_param(value)
        _validate_file_format_param(value)
        _validate_mode_param(value)
        _validate_export_method_param(value)
        _validate_boolean_params(value)
        _validate_filename_template_param(value)
        _validate_format_mode_compatibility(value)

    except (ValueError, TypeError) as e:
        logger.warning("Export parameter validation failed: %s", str(e))
        return False

    return True
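The validate-then-catch control flow above can be illustrated with a self-contained sketch. The three inline checks below stand in for the private `_validate_*` helpers (which are not shown here), and the rules they enforce — filename type, supported formats, and the S3-requires-Zarr constraint — are taken from the description above:

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def check_export_params(value: Any) -> bool:
    """Sketch of validate_export_param's control flow (helper logic simplified)."""
    if value is None:
        logger.warning("No export parameters given; defaults will be used.")
        return True  # defaults apply
    if not isinstance(value, dict):
        logger.warning("Export parameters must be a dict, got %s", type(value).__name__)
        return False
    try:
        # each stand-in check raises ValueError/TypeError on bad input,
        # mirroring the _validate_* helpers
        if not isinstance(value.get("filename", "out"), str):
            raise TypeError("filename must be a string")
        if value.get("file_format", "NetCDF").lower() not in {"netcdf", "zarr", "csv"}:
            raise ValueError("unsupported file_format")
        if value.get("mode") == "s3" and value.get("file_format", "NetCDF").lower() != "zarr":
            raise ValueError("S3 export requires Zarr format")
    except (ValueError, TypeError) as err:
        logger.warning("Export parameter validation failed: %s", err)
        return False
    return True
```

Note the asymmetry inherited from the real function: `None` returns True (defaults apply), while a non-dict value returns False.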

validate_export_output_path(filename, file_format='NetCDF', check_permissions=True)

Comprehensive validation of export output path.

Parameters:

- filename (str, required): Filename for export
- file_format (str, default 'NetCDF'): File format for extension mapping
- check_permissions (bool, default True): Whether to check write permissions

Returns:

- Dict[str, Any]: Validation results with warnings and recommendations

Source code in climakitae/new_core/param_validation/export_param_validator.py
def validate_export_output_path(
    filename: str, file_format: str = "NetCDF", check_permissions: bool = True
) -> Dict[str, Any]:
    """
    Comprehensive validation of export output path.

    Parameters
    ----------
    filename : str
        Filename for export
    file_format : str
        File format for extension mapping
    check_permissions : bool
        Whether to check write permissions

    Returns
    -------
    Dict[str, Any]
        Validation results with warnings and recommendations
    """
    results = {"is_valid": True, "warnings": [], "errors": [], "full_path": None}

    try:
        # Determine full path
        extension_map = {"zarr": ".zarr", "netcdf": ".nc", "csv": ".csv.gz"}
        extension = extension_map.get(file_format.lower(), ".nc")
        full_path = f"{filename}{extension}"
        results["full_path"] = full_path

        # Check if file exists
        if os.path.exists(full_path):
            results["warnings"].append(f"File '{full_path}' will be overwritten")

        # Check write permissions for directory
        if check_permissions:
            directory = os.path.dirname(full_path) or "."
            if not os.access(directory, os.W_OK):
                results["errors"].append(
                    f"No write permission for directory '{directory}'"
                )
                results["is_valid"] = False

        # Check for path safety
        if not _is_path_safe(filename):
            results["errors"].append(
                f"Filename '{filename}' contains unsafe characters"
            )
            results["is_valid"] = False

    except (OSError, PermissionError, ValueError) as e:
        results["errors"].append(f"Path validation error: {str(e)}")
        results["is_valid"] = False

    return results
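The shape of the returned results dictionary can be exercised with a trimmed-down version of the function. This sketch keeps only the extension mapping and the existence/permission checks; the `_is_path_safe` step is omitted because that helper is not shown here:

```python
import os


def check_output_path(filename: str, file_format: str = "NetCDF") -> dict:
    """Minimal sketch of validate_export_output_path's results contract."""
    results = {"is_valid": True, "warnings": [], "errors": [], "full_path": None}
    # unknown formats fall back to NetCDF's ".nc", as in the original
    extension = {"zarr": ".zarr", "netcdf": ".nc", "csv": ".csv.gz"}.get(
        file_format.lower(), ".nc"
    )
    full_path = filename + extension
    results["full_path"] = full_path
    if os.path.exists(full_path):
        results["warnings"].append(f"File '{full_path}' will be overwritten")
    directory = os.path.dirname(full_path) or "."
    if not os.access(directory, os.W_OK):
        results["errors"].append(f"No write permission for directory '{directory}'")
        results["is_valid"] = False
    return results
```

Callers should inspect `results["is_valid"]` first and then surface `warnings` and `errors` to the user, rather than assuming a boolean return.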

suggest_export_alternatives(params)

Suggest alternative export configurations when file conflicts are detected.

Parameters:

- params (Dict[str, Any], required): Original export parameters

Returns:

- Dict[str, str]: Dictionary of suggested alternatives

Source code in climakitae/new_core/param_validation/export_param_validator.py
def suggest_export_alternatives(params: Dict[str, Any]) -> Dict[str, str]:
    """
    Suggest alternative export configurations when file conflicts are detected.

    Parameters
    ----------
    params : Dict[str, Any]
        Original export parameters

    Returns
    -------
    Dict[str, str]
        Dictionary of suggested alternatives
    """
    suggestions = {}

    filename = params.get("filename", "dataexport")
    file_format = params.get("file_format", "NetCDF").lower()
    extension_map = {"zarr": ".zarr", "netcdf": ".nc", "csv": ".csv.gz"}
    extension = extension_map.get(file_format, ".nc")

    # Suggest alternative filename
    alt_filename = _suggest_alternative_filename(filename, extension)
    suggestions["alternative_filename"] = alt_filename.replace(extension, "")

    # Suggest using skip_existing
    suggestions["skip_existing_method"] = (
        "Use export_method='skip_existing' to skip if files exist"
    )

    # Suggest different format
    other_formats = [
        fmt for fmt in ["NetCDF", "Zarr", "CSV"] if fmt.lower() != file_format
    ]
    if other_formats:
        suggestions["alternative_format"] = (
            f"Try a different format like {other_formats[0]}"
        )

    return suggestions

Processor Validators

Validator for parameters provided to Concat Processor.

validate_concat_param(value, **kwargs)

Validate the parameters provided to the Concat Processor.

Parameters:

- value (str, required): The dimension name along which to concatenate datasets. Default: "sim"

Returns:

- bool: True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/concat_param_validator.py
@register_processor_validator("concat")
def validate_concat_param(value: str, **kwargs: Any) -> bool:  # noqa: ARG001
    """Validate the parameters provided to the Concat Processor.

    Parameters
    ----------
    value : str
        The dimension name along which to concatenate datasets.
        Default: "sim"

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise

    """
    logger.debug("validate_concat_param called with value: %s", value)

    if not isinstance(value, str):
        msg = "Concat Processor expects a string value for dimension name. Please check the configuration."
        logger.warning(msg)
        return False

    if not value.strip():
        msg = "Concat Processor dimension name cannot be empty. Please provide a valid dimension name."
        logger.warning(msg)
        return False

    return True  # All parameters are valid

Validator for parameters provided to MetricCalc Processor.

validate_metric_calc_param(value, **kwargs)

Validate the parameters provided to the MetricCalc Processor.

Parameters:

- value (dict[str, Any], required): Configuration dictionary with the following supported keys:

  Basic Metrics:
  - metric (str, optional): Metric to calculate. Supported values: "min", "max", "mean", "median". Default: "mean"
  - percentiles (list, optional): List of percentiles to calculate (0-100). Default: None
  - percentiles_only (bool, optional): If True and percentiles are specified, only calculate percentiles (skip metric). Default: False
  - dim (str or list, optional): Dimension(s) along which to calculate the metric/percentiles. Default: "time"
  - skipna (bool, optional): Whether to skip NaN values in calculations. Default: True

Returns:

- bool: True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/metric_calc_param_validator.py
@register_processor_validator("metric_calc")
def validate_metric_calc_param(
    value: dict[str, Any], **kwargs: Any
) -> bool:  # noqa: ARG001
    """
    Validate the parameters provided to the MetricCalc Processor.

    Parameters
    ----------
    value : dict[str, Any]
        Configuration dictionary with the following supported keys:

        Basic Metrics:
        - metric (str, optional): Metric to calculate. Supported values:
          "min", "max", "mean", "median". Default: "mean"
        - percentiles (list, optional): List of percentiles to calculate (0-100).
          Default: None
        - percentiles_only (bool, optional): If True and percentiles are specified,
          only calculate percentiles (skip metric). Default: False
        - dim (str or list, optional): Dimension(s) along which to calculate the metric/percentiles.
          Default: "time"
        - skipna (bool, optional): Whether to skip NaN values in calculations. Default: True

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise
    """
    if not isinstance(value, dict):
        logger.warning(
            "\n\nMetricCalc Processor expects a dictionary configuration. "
            "\nPlease check the configuration."
        )
        return False

    # Extract parameters for validation
    metric = value.get("metric", "mean")
    percentiles = value.get("percentiles", UNSET)
    percentiles_only = value.get("percentiles_only", False)
    dim = value.get("dim", "time")
    skipna = value.get("skipna", True)

    if not _validate_basic_metric_parameters(
        metric, percentiles, percentiles_only, dim, skipna
    ):
        return False

    one_in_x_config = value.get("one_in_x", UNSET)
    thresholds_config = value.get("thresholds", UNSET)

    if one_in_x_config is not UNSET and thresholds_config is not UNSET:
        logger.warning("\n\nCannot set both 'thresholds' and 'one_in_x'. Choose one.")
        return False

    if thresholds_config is not UNSET and "metric" in value:
        logger.warning("\n\nCannot set both 'thresholds' and 'metric'. Choose one.")
        return False

    if thresholds_config is not UNSET and percentiles is not UNSET:
        logger.warning(
            "\n\nCannot set both 'thresholds' and 'percentiles'. Choose one."
        )
        return False

    if one_in_x_config is not UNSET:
        if not _validate_one_in_x_parameters(one_in_x_config):
            return False

    if thresholds_config is not UNSET:
        if not _validate_threshold_parameters(thresholds_config):
            return False

    return True  # All parameters are valid
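The mutual-exclusivity rules enforced above can be isolated into a small self-contained check. `_UNSET` below is a local stand-in for climakitae's `UNSET` sentinel (a plain `None` cannot be used, since `None` is a meaningful value for some keys):

```python
_UNSET = object()  # stand-in for climakitae's UNSET sentinel


def check_metric_config(value: dict) -> bool:
    """Sketch of the mutual-exclusivity rules in validate_metric_calc_param."""
    one_in_x = value.get("one_in_x", _UNSET)
    thresholds = value.get("thresholds", _UNSET)
    percentiles = value.get("percentiles", _UNSET)
    if one_in_x is not _UNSET and thresholds is not _UNSET:
        return False  # 'thresholds' and 'one_in_x' are exclusive
    if thresholds is not _UNSET and "metric" in value:
        return False  # 'thresholds' and 'metric' are exclusive
    if thresholds is not _UNSET and percentiles is not _UNSET:
        return False  # 'thresholds' and 'percentiles' are exclusive
    return True
```

In short, `thresholds` is exclusive with `one_in_x`, `metric`, and `percentiles`, so a thresholds-based configuration must carry no other metric selection.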

Validator for parameters provided to Convert Units Processor.

validate_convert_units_param(value, **kwargs)

Validate the parameters provided to the Convert Units Processor.

This function checks the value provided to the Convert Units Processor and ensures that it meets the expected criteria. It raises a user warning and returns False if the value is not valid.

Parameters:

- value (str | Iterable[str], required): The unit(s) to convert to. Can be a single unit string or an iterable of unit strings. Valid units include temperature units (K, degC, degF), pressure units (Pa, hPa, mb, inHg), wind units (m/s, m s-1, mph, knots), precipitation units (mm, mm/d, mm/h, inches, inches/d, inches/h), moisture units (kg/kg, kg kg-1, g/kg, g kg-1), flux units (kg m-2 s-1), and relative humidity units ([0 to 100], fraction).

Returns:

- bool: True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/convert_units_param_validator.py
@register_processor_validator("convert_units")
def validate_convert_units_param(
    value: str | Iterable[str],
    **kwargs: Any,
) -> bool:
    """
    Validate the parameters provided to the Convert Units Processor.

    This function checks the value provided to the Convert Units Processor and ensures that it
    meets the expected criteria. Will raise a user warning and return false if the value
    is not valid.

    Parameters
    ----------
    value : str | Iterable[str]
        The unit(s) to convert to. Can be a single unit string or an iterable of unit strings.
        Valid units include temperature units (K, degC, degF), pressure units (Pa, hPa, mb, inHg),
        wind units (m/s, m s-1, mph, knots), precipitation units (mm, mm/d, mm/h, inches, inches/d, inches/h),
        moisture units (kg/kg, kg kg-1, g/kg, g kg-1), flux units (kg m-2 s-1), and relative humidity
        units ([0 to 100], fraction).

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise
    """
    # kwargs unused but required for signature compatibility
    del kwargs

    if value is UNSET:
        # UNSET is valid - processor will not perform any conversion
        return True

    if not _check_input_types(value):
        return False

    if not _check_unit_validity(value):
        return False

    return True
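The string-or-iterable acceptance pattern used by `_check_input_types` and `_check_unit_validity` can be sketched as follows. The whitelist below is a hypothetical subset of the units named in the docstring, not the library's actual unit table:

```python
from collections.abc import Iterable

# Assumed subset of the valid units listed in the docstring above
_VALID_UNITS = {
    "K", "degC", "degF",                  # temperature
    "Pa", "hPa", "mb", "inHg",            # pressure
    "m/s", "m s-1", "mph", "knots",       # wind
    "mm", "mm/d", "mm/h", "inches",       # precipitation
}


def check_units(value) -> bool:
    """Sketch of the string-or-iterable unit validation."""
    if isinstance(value, str):
        return value in _VALID_UNITS
    if isinstance(value, Iterable):
        units = list(value)
        return bool(units) and all(
            isinstance(u, str) and u in _VALID_UNITS for u in units
        )
    return False
```

Checking `str` before `Iterable` matters: strings are themselves iterable, so reversing the order would validate a unit string character by character.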

Parameter validator for StationBiasCorrection processor.

This module provides validation for parameters used with the StationBiasCorrection processor, which applies Quantile Delta Mapping (QDM) bias correction to gridded climate data using historical weather station observations.

The validator ensures:

- Valid station selection from available HadISD stations
- Proper time slice specification for bias correction
- Valid QDM parameters (window, nquantiles, group, kind)
- Station metadata availability
- Compatibility between selected stations and data variables

Functions:

- validate_bias_correction_station_data_param: Main validation function for station bias correction parameters.

Examples:

>>> # Valid station bias correction parameters
>>> params = {
...     "stations": ["Sacramento (KSAC)", "San Francisco (KSFO)"],
...     "time_slice": (2030, 2060),
...     "window": 90,
...     "nquantiles": 20
... }
>>> validate_bias_correction_station_data_param(params)
True
>>> # Invalid station name
>>> params = {"stations": ["InvalidStation"], "time_slice": (2030, 2060)}
>>> validate_bias_correction_station_data_param(params)
False
Notes
  • Station observational data is available through 2014-08-31
  • Bias correction requires historical period (1980-2014) in input data
  • Currently only supports temperature (tas/tasmax/tasmin) bias correction

validate_bias_correction_station_data_param(value, query=None, **kwargs)

Validate parameters for StationBiasCorrection processor.

This function validates all parameters required for station bias correction:

- Station selection (must exist in HadISD dataset)
- Historical slice (optional, must be valid years if provided)
- QDM parameters (window, nquantiles, group, kind)
- Variable compatibility (currently only temperature variables supported)

Parameters:

- value (Any, required): Dictionary containing station bias correction parameters. Expected keys:
  - stations: list[str] - Station names or codes (REQUIRED)
  - historical_slice: tuple[int, int], optional - Historical training period (default: (1980, 2014))
  - window: int, optional - Seasonal grouping window (default: 90)
  - nquantiles: int, optional - Number of quantiles (default: 20)
  - group: str, optional - Temporal grouping (default: "time.dayofyear")
  - kind: str, optional - Adjustment kind (default: "+")
- query (Dict[str, Any], default None): Full query dictionary for cross-validation with other parameters. Used to check variable compatibility.
- **kwargs (Any): Additional keyword arguments (unused, for interface compatibility).

Returns:

- bool: True if parameters are valid, False otherwise.

Raises:

- ValueError: If parameters are invalid with specific error messages.
- TypeError: If parameter types are incorrect.

Examples:

>>> params = {
...     "stations": ["Sacramento (KSAC)"],
...     "window": 90
... }
>>> validate_bias_correction_station_data_param(params)
True
Source code in climakitae/new_core/param_validation/bias_adjust_model_to_station_param_validator.py
@register_processor_validator("bias_adjust_model_to_station")
def validate_bias_correction_station_data_param(
    value: Any,
    query: Dict[str, Any] | None = None,
    **kwargs: Any,  # noqa: ARG001
) -> bool:
    """Validate parameters for StationBiasCorrection processor.

    This function validates all parameters required for station bias correction:
    - Station selection (must exist in HadISD dataset)
    - Historical slice (optional, must be valid years if provided)
    - QDM parameters (window, nquantiles, group, kind)
    - Variable compatibility (currently only temperature variables supported)

    Parameters
    ----------
    value : Any
        Dictionary containing station bias correction parameters. Expected keys:
        - stations: list[str] - Station names or codes (REQUIRED)
        - historical_slice: tuple[int, int], optional - Historical training period (default: (1980, 2014))
        - window: int, optional - Seasonal grouping window (default: 90)
        - nquantiles: int, optional - Number of quantiles (default: 20)
        - group: str, optional - Temporal grouping (default: "time.dayofyear")
        - kind: str, optional - Adjustment kind (default: "+")
    query : Dict[str, Any], optional
        Full query dictionary for cross-validation with other parameters.
        Used to check variable compatibility.
    **kwargs : Any
        Additional keyword arguments (unused, for interface compatibility).

    Returns
    -------
    bool
        True if parameters are valid, False otherwise.

    Raises
    ------
    ValueError
        If parameters are invalid with specific error messages.
    TypeError
        If parameter types are incorrect.

    Examples
    --------
    >>> params = {
    ...     "stations": ["Sacramento (KSAC)"],
    ...     "window": 90
    ... }
    >>> validate_bias_correction_station_data_param(params)
    True
    """
    logger.debug("validate_station_bias_correction_param called with value: %s", value)

    # Handle None or UNSET values
    if value is None or value is UNSET:
        msg = (
            "Station bias correction parameters cannot be None. "
            "Please provide a dictionary with 'stations' key."
        )
        logger.warning(msg)
        return False

    # Validate it's a dictionary
    if not isinstance(value, dict):
        msg = (
            f"Station bias correction parameters must be a dictionary, "
            f"got {type(value).__name__}. "
            f"Example: {{'stations': ['Sacramento (KSAC)']}}"
        )
        logger.warning(msg)
        return False

    # Check if query is provided early - needed for cross-validation
    # This MUST come before station validation to avoid accessing DataCatalog
    # when we know the query is invalid
    if query is None:
        logger.debug("No query provided for cross-validation")
        return False

    # Validate required keys
    required_keys = ["stations"]
    missing_keys = [key for key in required_keys if key not in value]
    if missing_keys:
        msg = (
            f"Missing required parameter(s): {', '.join(missing_keys)}. "
            f"Station bias correction requires 'stations' (list of station names)."
        )
        logger.warning(msg)
        return False

    # Validate stations parameter (only after confirming query is not None)
    if not _validate_stations(value["stations"]):
        return False

    # Validate historical_slice parameter if provided
    if "historical_slice" in value and not _validate_historical_slice(
        value["historical_slice"]
    ):
        return False

    # Validate optional QDM parameters if provided
    if "window" in value and not _validate_window(value["window"]):
        return False

    if "nquantiles" in value and not _validate_nquantiles(value["nquantiles"]):
        return False

    if "group" in value and not _validate_group(value["group"]):
        return False

    if "kind" in value and not _validate_kind(value["kind"]):
        return False

    # Cross-validate with query (already checked for None earlier)
    if not _validate_catalog_requirement(query):
        return False
    if not _validate_variable_compatibility(query):
        return False
    if not _validate_timescale_requirement(query):
        return False
    if not _validate_downscaling_method_requirement(query):
        return False
    if not _validate_resolution_requirement(query):
        return False
    if not _validate_scenario_resolution_compatibility(query):
        return False
    if not _validate_institution_id_requirement(query):
        return False

    logger.info(
        "Station bias correction parameters validated successfully for %d station(s)",
        len(value["stations"]),
    )
    return True

Validator for parameters provided to FilterUnadjustedModels Processor.

validate_filter_unadjusted_models_param(value, **kwargs)

Validate the parameters provided to the FilterUnadjustedModels Processor.

Parameters:

- value (str, required): The value to control filtering behavior. Supported values:
  - "yes" (default): Filter out unadjusted models
  - "no": Include unadjusted models

Returns:

- bool: True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/filter_unadjusted_models_param_validator.py
@register_processor_validator("filter_unadjusted_models")
def validate_filter_unadjusted_models_param(
    value: str, **kwargs: Any
) -> bool:  # noqa: ARG001
    """Validate the parameters provided to the FilterUnadjustedModels Processor.

    Parameters
    ----------
    value : str
        The value to control filtering behavior. Supported values:
        "yes" (default): Filter out unadjusted models
        "no": Include unadjusted models

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise

    """
    # Module logger
    logger = logging.getLogger(__name__)

    if not isinstance(value, str):
        msg = (
        "\n\nFilterUnadjustedModels Processor expects a string value. "
            "\nPlease check the configuration."
        )
        logger.warning(msg)
        return False

    valid_values = ["yes", "no"]

    if value not in valid_values:
        msg = (
            f"\n\nInvalid value '{value}' for FilterUnadjustedModels Processor. "
            f"\nSupported values are: {valid_values}"
        )
        logger.warning(msg)
        return False

    return True  # All parameters are valid

Validator for parameters provided to DropLeapDays Processor.

validate_drop_leap_days_param(value, **kwargs)

Validate the parameters provided to the DropLeapDays Processor.

Parameters:

- value (str, required): The value to control leap day dropping behavior. Supported values:
  - "yes" (default): Drop leap days (February 29)
  - "no": Keep leap days

Returns:

- bool: True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/drop_leap_days_param_validator.py
@register_processor_validator("drop_leap_days")
def validate_drop_leap_days_param(value: str, **kwargs: Any) -> bool:  # noqa: ARG001
    """Validate the parameters provided to the DropLeapDays Processor.

    Parameters
    ----------
    value : str
        The value to control leap day dropping behavior. Supported values:
        "yes" (default): Drop leap days (February 29)
        "no": Keep leap days

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise

    """
    # Module logger
    logger = logging.getLogger(__name__)

    if not isinstance(value, str):
        msg = (
            "\n\nDropLeapDays Processor expects a string value. "
            "\nPlease check the configuration."
        )
        logger.warning(msg)
        return False

    valid_values = ["yes", "no"]

    if value.lower() not in valid_values:
        msg = (
            f"\n\nInvalid value '{value}' for DropLeapDays Processor. "
            f"\nSupported values are: {valid_values}"
        )
        logger.warning(msg)
        return False

    return True  # All parameters are valid

Validator for parameters provided to Convert to Local Time processor.

validate_convert_to_local_time_param(value, **kwargs)

Validate the parameters provided to the ConvertToLocalTime Processor.

Parameters:

- value (Dict[str, Any], required): The configuration dictionary to validate. Expected keys:
  - convert (str): Controls conversion to local time. Supported values: "yes": Convert time to local time; "no" (default): Keep original timezone
  - reindex_time_axis (str): Set to "yes" to fix time stamps affected by daylight savings time. Default value is "no".

Returns:

- bool: True if all parameters are valid, False otherwise

Source code in climakitae/new_core/param_validation/convert_to_local_time_param_validator.py
@register_processor_validator("convert_to_local_time")
def validate_convert_to_local_time_param(
    value: Dict[str, Any], **kwargs: Any
) -> bool:  # noqa: ARG001
    """Validate the parameters provided to the ConvertToLocalTime Processor.

    Parameters
    ----------
    value : Dict[str, Any]
        The configuration dictionary to validate. Expected keys:
        - convert : str
            Controls conversion to local time. Supported values:
            "yes": Convert time to local time
            "no" (default): Keep original timezone
        - reindex_time_axis : str
            Set to "yes" to fix time stamps affected by daylight savings time.
            Default value is "no".

    Returns
    -------
    bool
        True if all parameters are valid, False otherwise

    """
    valid_values = ["yes", "no"]

    for setting in value:
        if not isinstance(value[setting], str):
            msg = (
                "\nConvertToLocalTime Processor expects string values. "
                "\nPlease check the configuration."
            )
            logger.warning(msg)
            return False

        if value[setting] not in valid_values:
            msg = (
                f"\n\nInvalid value '{value[setting]}' for ConvertToLocalTime Processor '{setting}' setting. "
                f"\nSupported values are: {valid_values}"
            )
            logger.warning(msg)
            return False

    return True  # All parameters are valid
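The loop above applies the same yes/no rule to every key in the configuration. A self-contained sketch of that pattern:

```python
def check_local_time_config(value: dict) -> bool:
    """Sketch of the per-setting yes/no validation in validate_convert_to_local_time_param."""
    valid_values = ("yes", "no")
    for setting, choice in value.items():
        # every setting must be a string, and one of the supported values
        if not isinstance(choice, str) or choice not in valid_values:
            return False
    return True
```

Note that, unlike the DropLeapDays validator, the comparison here is case-sensitive ("Yes" would be rejected), matching the source shown above.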

Alternative Catalog Validators

Validator for renewable energy dataset parameters.

RenewablesValidator(catalog)

Bases: ParameterValidator

Validator for renewable energy dataset parameters.

Parameters:

- catalog (str, required): path to the renewables dataset catalog

Initialize with catalog of renewable energy datasets.

Parameters:

- catalog (DataCatalog, required): Catalog of datasets
Source code in climakitae/new_core/param_validation/renewables_param_validator.py
def __init__(self, catalog: DataCatalog):
    """Initialize with  catalog of renewable energy datasets.

    Parameters
    ----------
    catalog : DataCatalog
        Catalog of datasets

    """
    super().__init__()
    self.all_catalog_keys = {
        "installation": UNSET,
        "activity_id": UNSET,
        "institution_id": UNSET,
        "source_id": UNSET,
        "experiment_id": UNSET,
        "table_id": UNSET,
        "grid_label": UNSET,
        "variable_id": UNSET,
    }
    self.catalog = catalog.renewables
    self.invalid_processors = []

    logger.debug("RenewablesValidator initialized for renewables catalog")

get_default_processors(query)

Get default processors for renewables catalog.

Renewables data uses the same defaults as CADCAT since it's also climate model data.

Parameters:

- query (Dict[str, Any], required): The current query containing user parameters

Returns:

- Dict[str, Any]: Dictionary mapping processor names to their default configurations

Source code in climakitae/new_core/param_validation/renewables_param_validator.py
def get_default_processors(self, query: Dict[str, Any]) -> Dict[str, Any]:
    """Get default processors for renewables catalog.

    Renewables data uses the same defaults as CADCAT since it's also
    climate model data.

    Parameters
    ----------
    query : Dict[str, Any]
        The current query containing user parameters

    Returns
    -------
    Dict[str, Any]
        Dictionary mapping processor names to their default configurations
    """
    defaults = super().get_default_processors(query)

    # Add default filtering for climate model data
    defaults["filter_unadjusted_models"] = "yes"

    # Drop leap days by default
    defaults["drop_leap_days"] = "yes"

    # Set default concatenation
    concat_dim = "time"

    # if experiment_id is a string, check if it contains "historical"
    experiment_id = query.get("experiment_id", UNSET)
    match experiment_id:
        case str():
            if (
                "historical" in experiment_id.lower()
                or "reanalysis" in experiment_id.lower()
            ):
                # if it does, we can use "sim" as the default concat dimension
                concat_dim = "sim"
        case list() | tuple():
            # if experiment_id is a list or tuple, check each element
            # if there are no elements with "ssp" in them then we use the sim approach
            if not any("ssp" in str(item).lower() for item in experiment_id):
                concat_dim = "sim"

    defaults["concat"] = concat_dim
    return defaults

Validator for historical data platform parameters.

HDPValidator(catalog)

Bases: ParameterValidator

Validator for historical data platform parameters.

This validator enforces that queries must specify a single network_id to prevent mixing data from different weather station networks, which may have different time periods and data characteristics.

Parameters:

- catalog (DataCatalog, required): Catalog of datasets

Attributes:

- catalog (esm_datastore): The HDP catalog for validation
- all_catalog_keys (dict): Required query parameters with default values

Initialize with catalog of historical data platform datasets.

Parameters:

- catalog (DataCatalog, required): Catalog of datasets
required
Source code in climakitae/new_core/param_validation/hdp_param_validator.py
def __init__(self, catalog: DataCatalog):
    """Initialize with catalog of historical data platform datasets.

    Parameters
    ----------
    catalog : DataCatalog
        Catalog of datasets

    """
    super().__init__()
    self.all_catalog_keys = {
        "network_id": UNSET,
        "station_id": UNSET,
    }
    self.catalog = catalog.hdp
    self.invalid_processors = [
        "localize",
        "clip",
        "convert_units",
        "bias_adjust_model_to_station",
        "filter_unadjusted_models",
        "metric_calc",
        "warming_level",
    ]
    logger.debug("HDPValidator initialized for hdp catalog")

get_default_processors(query)

Get default processors for HDP catalog.

HDP data uses station_id for concatenation since it's station-based data.

Parameters:

- query (Dict[str, Any], required): The current query containing user parameters

Returns:

- Dict[str, Any]: Dictionary mapping processor names to their default configurations

Source code in climakitae/new_core/param_validation/hdp_param_validator.py
def get_default_processors(self, query: Dict[str, Any]) -> Dict[str, Any]:
    """Get default processors for HDP catalog.

    HDP data uses station_id for concatenation since it's station-based data.

    Parameters
    ----------
    query : Dict[str, Any]
        The current query containing user parameters

    Returns
    -------
    Dict[str, Any]
        Dictionary mapping processor names to their default configurations
    """
    defaults = super().get_default_processors(query)  # Get universal defaults
    defaults.update(
        {
            "concat": "station_id",
        }
    )
    return defaults
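The `update` call above layers HDP-specific defaults over the universal ones returned by the base class, so the subclass value wins on any shared key. A minimal sketch of that pattern, using hypothetical universal defaults:

```python
# Sketch of the override pattern in get_default_processors.
# The universal defaults here are illustrative stand-ins, not library values.
universal_defaults = {"concat": "time", "convert_units": "auto"}

defaults = dict(universal_defaults)        # start from the base-class defaults
defaults.update({"concat": "station_id"})  # HDP-specific override wins

print(defaults)  # {'concat': 'station_id', 'convert_units': 'auto'}
```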

is_valid_query(query)

Validate query parameters for HDP data.

Requires network_id and ensures it is a single value; station_id is optional.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query | dict | Query parameters to validate | required |

Returns:

| Type | Description |
| --- | --- |
| dict or None | Validated query if valid, None otherwise |

Notes

Validation checks performed:

1. network_id is required and must be a single value (accepts a string or single-item list; rejects multi-item lists)
2. station_id is optional and can be used to filter within the network
3. If station_id is provided, all requested station IDs must exist in the catalog

Multiple network_ids are not allowed to prevent mixing data from different networks with potentially different time periods and data characteristics.

Source code in climakitae/new_core/param_validation/hdp_param_validator.py
def is_valid_query(self, query: Dict[str, Any]) -> Dict[str, Any] | None:
    """Validate query parameters for HDP data.

    Requires network_id and ensures it's a single value. Station_id is optional.

    Parameters
    ----------
    query : dict
        Query parameters to validate

    Returns
    -------
    dict or None
        Validated query if valid, None otherwise

    Notes
    -----
    Validation checks performed:

    1. network_id is required and must be a single value
       (accepts string or single-item list, rejects multi-item lists)
    2. station_id is optional and can be used to filter within the network
    3. If station_id is provided, all requested station IDs must exist in the catalog

    Multiple network_ids are not allowed to prevent mixing data from
    different networks with potentially different time periods and
    data characteristics.

    """
    logger.debug("Validating HDP query: %s", query)

    # Check that network_id is provided and is a single value
    initial_checks = [
        self._check_network_id_required(query),
        self._check_station_ids_exist(query),
        self._check_query_invalid_processors(query),
    ]
    if not all(initial_checks):
        logger.warning("Initial validation checks failed")
        return None

    result = super()._is_valid_query(query)
    logger.info("HDP query validation result: %s", bool(result))
    return result
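The checks described in the docstring above can be sketched end to end as a standalone function. This is an illustrative version only, not the library implementation: the function name and the set of known station IDs are assumptions, and the real validator delegates to private helpers and the base class.

```python
from typing import Any, Dict, Optional, Set


def validate_hdp_query(
    query: Dict[str, Any], known_station_ids: Set[str]
) -> Optional[Dict[str, Any]]:
    """Illustrative sketch of the HDP checks; returns the query or None."""
    network_id = query.get("network_id")
    # network_id must be a single value: a bare string or a one-item list.
    if isinstance(network_id, (list, tuple)):
        if len(network_id) != 1:
            return None  # reject multi-network queries
        network_id = network_id[0]
    if not isinstance(network_id, str):
        return None  # missing or malformed network_id

    # station_id is optional, but every requested station must be known.
    station_id = query.get("station_id")
    if station_id is not None:
        stations = [station_id] if isinstance(station_id, str) else list(station_id)
        if not all(s in known_station_ids for s in stations):
            return None

    return query
```

Mixing networks fails fast here, mirroring the rationale above: different networks may cover different time periods and have different data characteristics.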

Utilities

Tools for validating user input