`ClimateData` class

The `ClimateData` class is the main entry point to the climakitae data-access interface. It exposes the fluent (builder-style) API used to assemble climate-data queries.

API reference for climakitae.new_core.user_interface.

ClimateData

A fluent interface for accessing climate data.

This class provides a chainable interface for setting query parameters and retrieving climate data. It uses a factory pattern to create datasets and validators based on the specified parameters, so a full query can be written as a single expression.

The interface supports various climate data sources and allows flexible querying with different combinations of parameters. Every setter method returns the instance itself to enable method chaining; `get()` returns the retrieved data, and the `show_*` methods return None.
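The chaining behaviour described above can be illustrated with a small stand-alone sketch. `MiniQuery` is a hypothetical toy class written for this example, not the climakitae implementation; it only shows why returning `self` from each setter lets calls be strung together, and why the terminal call returns something else.

```python
from typing import Any, Dict


class MiniQuery:
    """Toy stand-in for a fluent query builder (hypothetical, not climakitae)."""

    def __init__(self) -> None:
        self._query: Dict[str, Any] = {}

    def catalog(self, name: str) -> "MiniQuery":
        self._query["catalog"] = name
        return self  # returning self is what makes chaining possible

    def variable(self, name: str) -> "MiniQuery":
        self._query["variable_id"] = name
        return self

    def get(self) -> Dict[str, Any]:
        # Terminal call: returns a result, not the builder.
        return dict(self._query)


result = MiniQuery().catalog("cadcat").variable("pr").get()
print(result)
```

Because `get()` does not return the builder, it has to be the last call in a chain.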

Other Parameters:

Name	Type	Description
`catalog`	`str`	The data catalog to use (e.g., `"renewable energy generation"`, `"cadcat"`).
`installation`	`str`	The installation type (e.g., `"pv_utility"`, `"wind_offshore"`).
`activity_id`	`str`	The activity identifier (e.g., `"WRF"`, `"LOCA2"`).
`institution_id`	`str`	The institution identifier (e.g., `"CNRM"`, `"DWD"`).
`source_id`	`str`	The source identifier (e.g., `"GCM"`, `"RCM"`, `"Station"`).
`experiment_id`	`str or list of str`	The experiment identifier (e.g., `"historical"`, `"ssp245"`).
`table_id`	`str`	The temporal resolution (e.g., `"1hr"`, `"day"`, `"mon"`).
`grid_label`	`str`	The spatial resolution (e.g., `"d01"`, `"d02"`, `"d03"`).
`variable_id`	`str`	The climate variable (e.g., `"tasmax"`, `"pr"`, `"cf"`).
`processes`	`dict`	Dictionary of data processing operations to apply.

Methods:

Name	Description
`verbosity`	Set the logging verbosity level.
`catalog`	Set the data catalog to use.
`installation`	Set the installation type.
`activity_id`	Set the activity identifier.
`institution_id`	Set the institution identifier.
`source_id`	Set the source identifier.
`experiment_id`	Set the experiment identifier(s).
`table_id`	Set the temporal resolution.
`grid_label`	Set the spatial resolution.
`variable`	Set the climate variable to retrieve.
`derived_variable`	Register and query a user-defined derived variable.
`station_id`	Set the station identifier(s).
`network_id`	Set the network identifier(s).
`processes`	Set processing operations to apply to the data.
`get`	Execute the query and retrieve the climate data.
`show_query`	Display the current query configuration.
`show_catalog_options`	Display available catalog options.
`show_installation_options`	Display available installation options.
`show_activity_id_options`	Display available activity ID options.
`show_institution_id_options`	Display available institution ID options.
`show_source_id_options`	Display available source ID options.
`show_experiment_id_options`	Display available experiment ID options.
`show_table_id_options`	Display available table ID (temporal resolution) options.
`show_grid_label_options`	Display available grid label (spatial resolution) options.
`show_variable_options`	Display available climate variable options.
`show_station_id_options`	Display available station ID options.
`show_network_id_options`	Display available network ID options.
`show_derived_variables`	Display registered derived variables.
`show_processors`	Display registered data processors.
`show_station_options`	Display available weather station options.
`show_boundary_options`	Display available boundary options; pass a boundary type to list sub-options.
`show_all_options`	Display all available options for exploration.

Returns:

Type	Description
`DataArray or None`	The retrieved climate data as a lazy-loaded xarray DataArray, or None if the query fails or required parameters are missing.

Raises:

Type	Description
`ValueError`	If required parameters are missing or invalid during validation.
`Exception`	If there is an error during data retrieval or processing.

Examples:

Basic usage with method chaining:

>>> cd = ClimateData()
>>> data = (cd
...     .catalog("cadcat")
...     .activity_id("WRF")
...     .experiment_id("historical")
...     .table_id("1hr")
...     .grid_label("d02")
...     .variable("prec")
...     .get()
...    )

Exploring available options:

>>> cd = ClimateData()
>>> cd.show_catalog_options()
>>> cd.catalog("cadcat").show_variable_options()

Using with processing:

>>> processes = {"spatial_avg": "region", "temporal_avg": "monthly"}
>>> data = (ClimateData()
...         .catalog("climate")
...         .variable("pr")
...         .processes(processes)
...         .get())

`__init__(log_file=None, verbosity=0)`

Initialize the ClimateData interface.

Sets up the factory for dataset creation and initializes query parameters to their default (UNSET) state. Optionally configures logging to file or stdout.

Parameters:

Name	Type	Description	Default
`log_file`	`str`	Path to the log file. If None, logs go to stdout.	`None`
`verbosity`	`int`	Logging verbosity level: `<= -2` is effectively silent (no logs); `-1` is WARNING; `0` is INFO; `> 0` is DEBUG.	`0`

Source code in climakitae/new_core/user_interface.py

def __init__(self, log_file: Optional[str] = None, verbosity: int = 0):
    """Initialize the ClimateData interface.

    Sets up the factory for dataset creation and initializes
    query parameters to their default (UNSET) state. Optionally
    configures logging to file or stdout.

    Parameters
    ----------
    log_file : str, optional
        Path to log file. If None, logs to stdout. Default is None.
    verbosity : int, optional
        Logging verbosity level:
        - <= -2: Effectively silent (no logs)
        - -1: WARNING level
        - 0: INFO level (default)
        - > 0: DEBUG level
        Default is 0.

    """
    # Configure logging
    self._log_file = log_file
    self._verbosity = verbosity
    self._configure_logging()

    try:
        logger.info("Initializing ClimateData interface")
        self._factory = DatasetFactory()
        self._reset_query()
        self.var_desc = read_csv_file(VARIABLE_DESCRIPTIONS_CSV_PATH)
        logger.info("ClimateData initialization successful")
        logger.info("✅ Ready to query!")
    except Exception as e:
        logger.error("❌ Setup failed: %s", str(e), exc_info=True)
        return
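The constructor delegates logging setup to `_configure_logging()`, whose body is not shown here. As a rough sketch of the "file or stdout" choice described above, and assuming the standard `logging` handlers are used, the selection could look like the following. `make_handler` is a hypothetical helper name introduced for this example.

```python
import logging
import sys
from typing import Optional


def make_handler(log_file: Optional[str] = None) -> logging.Handler:
    """Return a file handler when a path is given, otherwise a stdout handler.

    Hypothetical helper: the real _configure_logging() may differ.
    """
    if log_file is None:
        return logging.StreamHandler(sys.stdout)
    return logging.FileHandler(log_file)


handler = make_handler()  # no path given, so this writes to stdout
```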

`verbosity(level)`

Set the logging verbosity level.

This method allows dynamic adjustment of logging verbosity and supports method chaining.

Parameters:

Name	Type	Description	Default
`level`	`int`	Logging verbosity level: `<= -2` is effectively silent (no logs); `-1` is WARNING; `0` is INFO (default); `> 0` is DEBUG, so debug output must be requested explicitly.	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Examples:

>>> cd = ClimateData()
>>> cd.verbosity(-1)  # warnings only
>>> cd.verbosity(0)   # info (default)
>>> cd.verbosity(1)   # debug

Source code in climakitae/new_core/user_interface.py

def verbosity(self, level: int) -> "ClimateData":
    """Set the logging verbosity level.

    This method allows dynamic adjustment of logging verbosity
    and supports method chaining.

    Parameters
    ----------
    level : int
        Logging verbosity mapping:
        - <= -2: effectively silent (no logs)
        - -1: WARNING level
        - 0: INFO level (default)
        - >0: DEBUG level (user must specify >0 to get debug)

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    Examples
    --------
    >>> cd = ClimateData()
    >>> cd.verbosity(-1)  # warnings only
    >>> cd.verbosity(0)   # info (default)
    >>> cd.verbosity(1)   # debug

    """
    if not isinstance(level, int):
        raise ValueError("Verbosity level must be an integer")

    logger.debug("Setting verbosity level to %d", level)
    self._verbosity = level
    self._configure_logging()
    logger.info("Verbosity level set to %d", level)
    return self
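The verbosity mapping documented above can be written as a small function. This is a sketch under assumptions: `verbosity_to_level` is a hypothetical name, and using a level above `logging.CRITICAL` is only one possible way to get the "effectively silent" behaviour; the library's `_configure_logging()` is not shown and may do it differently.

```python
import logging


def verbosity_to_level(verbosity: int) -> int:
    """Map the documented verbosity integers to standard logging levels."""
    if verbosity <= -2:
        # Above every standard level, so no standard record passes (assumption).
        return logging.CRITICAL + 1
    if verbosity == -1:
        return logging.WARNING
    if verbosity == 0:
        return logging.INFO
    return logging.DEBUG  # any value > 0
```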

`catalog(catalog)`

Set the data catalog to use for the query.

Parameters:

Name	Type	Description	Default
`catalog`	`str`	The name of the catalog (e.g., "renewables", "climate").	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def catalog(self, catalog: str) -> "ClimateData":
    """Set the data catalog to use for the query.

    Parameters
    ----------
    catalog : str
        The name of the catalog (e.g., "renewables", "climate").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting catalog to: %s", catalog)
    if not isinstance(catalog, str) or not catalog.strip():
        logger.error("Invalid catalog parameter: must be non-empty string")
        raise ValueError("Catalog must be a non-empty string")
    self._query["catalog"] = catalog.strip()
    logger.info("Catalog set to: %s", catalog.strip())
    return self
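`catalog()` and the other single-string setters on this page share the same check: the value must be a string, must not be blank, and is stored with surrounding whitespace removed. A minimal stand-alone sketch of that rule follows; `require_non_empty_str` is a hypothetical helper used only for illustration, since the source above inlines the check in each method.

```python
def require_non_empty_str(value: object, label: str) -> str:
    """Return the stripped string, or raise ValueError for blank or non-string input."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"{label} must be a non-empty string")
    return value.strip()


print(require_non_empty_str("  cadcat ", "Catalog"))
```

Callers that build queries from user input may want to catch `ValueError` around the setter calls, since the setters raise rather than return None.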

`installation(installation)`

Set the installation type for the query.

Parameters:

Name	Type	Description	Default
`installation`	`str`	The installation type (e.g., "pv_utility", "wind_offshore").	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def installation(self, installation: str) -> "ClimateData":
    """Set the installation type for the query.

    Parameters
    ----------
    installation : str
        The installation type (e.g., "pv_utility", "wind_offshore").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting installation to: %s", installation)
    if not isinstance(installation, str) or not installation.strip():
        logger.error("Invalid installation parameter: must be non-empty string")
        raise ValueError("Installation must be a non-empty string")
    self._query["installation"] = installation.strip()
    logger.info("Installation set to: %s", installation.strip())
    return self

`activity_id(activity_id)`

Set the activity identifier for the query.

Parameters:

Name	Type	Description	Default
`activity_id`	`str`	The activity ID (e.g., "CMIP6", "CORDEX").	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def activity_id(self, activity_id: str) -> "ClimateData":
    """Set the activity identifier for the query.

    Parameters
    ----------
    activity_id : str
        The activity ID (e.g., "CMIP6", "CORDEX").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting activity_id to: %s", activity_id)
    if not isinstance(activity_id, str) or not activity_id.strip():
        logger.error("Invalid activity_id parameter: must be non-empty string")
        raise ValueError("Activity ID must be a non-empty string")
    self._query["activity_id"] = activity_id.strip()
    logger.info("Activity ID set to: %s", activity_id.strip())
    return self

`institution_id(institution_id)`

Set the institution identifier for the query.

Parameters:

Name	Type	Description	Default
`institution_id`	`str`	The institution ID (e.g., "CNRM", "DWD").	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def institution_id(self, institution_id: str) -> "ClimateData":
    """Set the institution identifier for the query.

    Parameters
    ----------
    institution_id : str
        The institution ID (e.g., "CNRM", "DWD").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting institution_id to: %s", institution_id)
    if not isinstance(institution_id, str) or not institution_id.strip():
        logger.error("Invalid institution_id parameter: must be non-empty string")
        raise ValueError("Institution ID must be a non-empty string")
    self._query["institution_id"] = institution_id.strip()
    logger.info("Institution ID set to: %s", institution_id.strip())
    return self

`source_id(source_id)`

Set the source identifier for the query.

Parameters:

Name	Type	Description	Default
`source_id`	`str`	The source ID (e.g., "GCM", "RCM", "Station").	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def source_id(self, source_id: str) -> "ClimateData":
    """Set the source identifier for the query.

    Parameters
    ----------
    source_id : str
        The source ID (e.g., "GCM", "RCM", "Station").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting source_id to: %s", source_id)
    if not isinstance(source_id, str) or not source_id.strip():
        logger.error("Invalid source_id parameter: must be non-empty string")
        raise ValueError("Source ID must be a non-empty string")
    self._query["source_id"] = source_id.strip()
    logger.info("Source ID set to: %s", source_id.strip())
    return self

`experiment_id(experiment_id)`

Set the experiment identifier for the query.

Parameters:

Name	Type	Description	Default
`experiment_id`	`str or list of str`	The experiment ID (e.g., "historical", "ssp245") or a list of experiment IDs to query multiple scenarios at once.	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Examples:

>>> cd.experiment_id("ssp245")  # Single experiment
>>> cd.experiment_id(["historical", "ssp245", "ssp370"])  # Multiple

Source code in climakitae/new_core/user_interface.py

def experiment_id(self, experiment_id: str | list[str]) -> "ClimateData":
    """Set the experiment identifier for the query.

    Parameters
    ----------
    experiment_id : str or list of str
        The experiment ID (e.g., "historical", "ssp245") or a list of
        experiment IDs to query multiple scenarios at once.

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    Examples
    --------
    >>> cd.experiment_id("ssp245")  # Single experiment
    >>> cd.experiment_id(["historical", "ssp245", "ssp370"])  # Multiple

    """
    logger.debug("Setting experiment_id to: %s", experiment_id)
    exp = []
    if not isinstance(experiment_id, (str, list)):
        logger.error(
            "Invalid experiment_id parameter: must be string or list of strings"
        )
        raise ValueError(
            "Experiment ID must be a non-empty string or list of strings"
        )
    if isinstance(experiment_id, str):
        if not experiment_id.strip():
            logger.error("Invalid experiment_id parameter: empty string")
            raise ValueError("Experiment ID must be a non-empty string")
        exp.append(experiment_id.strip())
    else:
        for exp_id in experiment_id:
            if not isinstance(exp_id, str) or not exp_id.strip():
                logger.error(
                    "Invalid experiment_id in list: must be non-empty strings"
                )
                raise ValueError("Each experiment ID must be a non-empty string")
            exp.append(exp_id.strip())
    self._query["experiment_id"] = exp
    logger.info("Experiment ID(s) set to: %s", exp)
    return self
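The normalization performed by `experiment_id()` can be restated as a stand-alone function: a single string becomes a one-element list, each list item is validated and stripped, and the result is always a list. `normalize_ids` is a hypothetical name for this sketch; the logic follows the source above, including the fact that an empty list passes through unchanged.

```python
from typing import List, Union


def normalize_ids(value: Union[str, List[str]], label: str = "Experiment ID") -> List[str]:
    """Validate a string or list of strings and return a list of stripped strings."""
    if isinstance(value, str):
        candidates = [value]
    elif isinstance(value, list):
        candidates = value
    else:
        raise ValueError(f"{label} must be a non-empty string or list of strings")

    cleaned: List[str] = []
    for item in candidates:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"Each {label} must be a non-empty string")
        cleaned.append(item.strip())
    return cleaned


print(normalize_ids(["historical", " ssp245 "]))
```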

`table_id(table_id)`

Set the temporal resolution identifier for the query.

Parameters:

Name	Type	Description	Default
`table_id`	`str`	The temporal resolution (e.g., "1hr", "day", "mon").	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def table_id(self, table_id: str) -> "ClimateData":
    """Set the temporal resolution identifier for the query.

    Parameters
    ----------
    table_id : str
        The temporal resolution (e.g., "1hr", "day", "mon").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting table_id to: %s", table_id)
    if not isinstance(table_id, str) or not table_id.strip():
        logger.error("Invalid table_id parameter: must be non-empty string")
        raise ValueError("Table ID must be a non-empty string")
    self._query["table_id"] = table_id.strip()
    logger.info("Table ID set to: %s", table_id.strip())
    return self

`grid_label(grid_label)`

Set the spatial resolution identifier for the query.

Parameters:

Name	Type	Description	Default
`grid_label`	`str`	The spatial resolution (e.g., "d01", "d02", "d03").	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def grid_label(self, grid_label: str) -> "ClimateData":
    """Set the spatial resolution identifier for the query.

    Parameters
    ----------
    grid_label : str
        The spatial resolution (e.g., "d01", "d02", "d03").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting grid_label to: %s", grid_label)
    if not isinstance(grid_label, str) or not grid_label.strip():
        logger.error("Invalid grid_label parameter: must be non-empty string")
        raise ValueError("Grid label must be a non-empty string")
    self._query["grid_label"] = grid_label.strip()
    logger.info("Grid label set to: %s", grid_label.strip())
    return self

`variable(variable)`

Set the climate variable to retrieve.

Parameters:

Name	Type	Description	Default
`variable`	`str`	The variable identifier (e.g., "tasmax", "pr", "cf"). Can also be a registered derived variable name.	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def variable(self, variable: str) -> "ClimateData":
    """Set the climate variable to retrieve.

    Parameters
    ----------
    variable : str
        The variable identifier (e.g., "tasmax", "pr", "cf").
        Can also be a registered derived variable name.

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting variable to: %s", variable)
    if not isinstance(variable, str) or not variable.strip():
        logger.error("Invalid variable parameter: must be non-empty string")
        raise ValueError("Variable must be a non-empty string")
    self._query["variable_id"] = variable.strip()
    logger.info("Variable set to: %s", variable.strip())
    return self

`derived_variable(name, depends_on, func, description='', units='', **query_extras)`

Register and query a user-defined derived variable.

This method registers a custom function that computes a new variable from existing source variables, then sets that variable as the query target. The computation happens during data loading (not as a post-processor).

Parameters:

Name	Type	Description	Default
`name`	`str`	Name for the new derived variable. This becomes queryable like any other variable in the catalog.	required
`depends_on`	`list of str`	List of source variable IDs required for the computation (e.g., ['tasmax', 'tasmin'] or ['t2', 'rh']).	required
`func`	`callable`	Function that takes an xarray.Dataset and returns a modified Dataset with the new variable added. The function signature should be: `func(ds: xr.Dataset) -> xr.Dataset`	required
`description`	`str`	Human-readable description of what this variable represents.	`''`
`units`	`str`	Expected units of the derived variable.	`''`
`**query_extras`		Additional query constraints (e.g., table_id='day').	`{}`

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Examples:

Define and query a custom temperature range variable:

>>> def calc_temp_range(ds):
...     ds['temp_range'] = ds.tasmax - ds.tasmin
...     ds['temp_range'].attrs = {'units': 'K', 'long_name': 'Daily Range'}
...     return ds
...
>>> data = (cd
...     .catalog("cadcat")
...     .activity_id("LOCA2")
...     .table_id("day")
...     .grid_label("d03")
...     .derived_variable(
...         name='temp_range',
...         depends_on=['tasmax', 'tasmin'],
...         func=calc_temp_range,
...         description='Daily temperature range',
...         units='K'
...     )
...     .get())

Notes

- Registration persists for the rest of the Python session.
- The function must add the variable to the dataset and return the dataset.
- Set appropriate attributes (units, long_name) on the new variable.
- For complex post-load transformations, use processors instead.

`station_id(station_id)`

Set the station identifier for the query.

Parameters:

Name	Type	Description	Default
`station_id`	`str or list of str`	The station ID (e.g., "ASOSAWOS_72019300117") or a list of station IDs (e.g., ["ASOSAWOS_72019300117", "ASOSAWOS_72020200118"]).	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def station_id(self, station_id: str | list[str]) -> "ClimateData":
    """Set the station identifier for the query.

    Parameters
    ----------
    station_id : str or list of str
        The station ID (e.g., "ASOSAWOS_72019300117") or a list of
        station IDs (e.g., ["ASOSAWOS_72019300117", "ASOSAWOS_72020200118"]).

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting station_id to: %s", station_id)
    stn = []
    if not isinstance(station_id, (str, list)):
        logger.error(
            "Invalid station_id parameter: must be string or list of strings"
        )
        raise ValueError("Station ID must be a non-empty string or list of strings")
    if isinstance(station_id, str):
        if not station_id.strip():
            logger.error("Invalid station_id parameter: empty string")
            raise ValueError("Station ID must be a non-empty string")
        stn.append(station_id.strip())
    else:
        for id in station_id:
            if not isinstance(id, str) or not id.strip():
                logger.error(
                    "Invalid station_id in list: must be non-empty strings"
                )
                raise ValueError("Each station ID must be a non-empty string")
            stn.append(id.strip())
    self._query["station_id"] = stn
    logger.info("Station ID(s) set to: %s", stn)
    return self

`network_id(network_id)`

Set the network identifier for the query.

Parameters:

Name	Type	Description	Default
`network_id`	`str or list of str`	The network ID (e.g., "ASOSAWOS", "CWOP") or a list of network IDs.	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def network_id(self, network_id: str | list[str]) -> "ClimateData":
    """Set the network identifier for the query.

    Parameters
    ----------
    network_id : str | list[str]
        The network ID (e.g., "ASOSAWOS", "CWOP").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting network_id to: %s", network_id)
    if not isinstance(network_id, (str, list)):
        logger.error(
            "Invalid network_id parameter: must be string or list of strings"
        )
        raise ValueError("Network ID must be a non-empty string or list of strings")
    if isinstance(network_id, str):
        if not network_id.strip():
            logger.error("Invalid network_id parameter: empty string")
            raise ValueError("Network ID must be a non-empty string")
        self._query["network_id"] = network_id.strip()
        logger.info("Network ID set to: %s", network_id.strip())
    else:
        net = []
        for id in network_id:
            if not isinstance(id, str) or not id.strip():
                logger.error(
                    "Invalid network_id in list: must be non-empty strings"
                )
                raise ValueError("Each network ID must be a non-empty string")
            net.append(id.strip())
        self._query["network_id"] = net
        logger.info("Network ID(s) set to: %s", net)
    return self
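In the source listings on this page, `experiment_id()` and `station_id()` always store a list, while `network_id()` stores a plain string when given a string and a list when given a list. Code that reads the stored value therefore has to handle both shapes. A small sketch of one way to do that follows; `as_list` is a hypothetical helper, not part of the library.

```python
from typing import List, Union


def as_list(value: Union[str, List[str]]) -> List[str]:
    """Return a list whether the stored value is a single string or a list."""
    return [value] if isinstance(value, str) else list(value)


print(as_list("ASOSAWOS"))
print(as_list(["ASOSAWOS", "CWOP"]))
```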

`processes(processes)`

Set processing operations to apply to the retrieved data.

Parameters:

Name	Type	Description	Default
`processes`	`Dict[str, Union[str, Iterable]]`	A dictionary of processing operations and their parameters.	required

Returns:

Type	Description
`ClimateData`	The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py

def processes(self, processes: Dict[str, Union[str, Iterable]]) -> "ClimateData":
    """Set processing operations to apply to the retrieved data.

    Parameters
    ----------
    processes : Dict[str, Union[str, Iterable]]
        A dictionary of processing operations and their parameters.

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting processes to: %s", processes)
    if not isinstance(processes, dict):
        logger.error("Invalid processes parameter: must be a dictionary")
        raise ValueError("Processes must be a dictionary")
    self._query["processes"] = processes.copy()
    logger.info("Processes set: %d operations configured", len(processes))
    return self
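Note that `processes.copy()` in the source above is a shallow copy: the stored dictionary is a new object, but any list or dict values inside it are still shared with the caller's dictionary. The sketch below shows the difference using only the standard library. The `"years"` key is a made-up processor name used purely for illustration.

```python
import copy

# "years" is a hypothetical processor key; "spatial_avg" is taken from the examples.
processes = {"spatial_avg": "region", "years": [2000, 2010]}

stored = processes.copy()          # what a shallow copy keeps
processes["years"].append(2020)    # mutating the nested list is visible in `stored`
processes["extra"] = "ignored"     # adding a top-level key is not

isolated = copy.deepcopy(processes)
processes["years"].append(2030)    # does not affect the deep copy

print(stored["years"], isolated["years"])
```

If the same dictionary is reused and edited between queries, passing a `copy.deepcopy` of it avoids this kind of sharing.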

`get()`

Execute the configured query and retrieve climate data.

Validates required parameters, creates the appropriate dataset using the factory pattern, executes the query, and resets the query state for the next use.

Thread Safety

This method takes a snapshot of the query at the start of execution, making it safe to call from multiple threads on the same ClimateData instance. However, for maximum clarity and safety, it is recommended to use separate ClimateData instances in multi-threaded scenarios.

Returns:

Type	Description
`Optional[DataArray]`	The retrieved climate data as a lazy-loaded xarray DataArray, or None if the query fails or validation errors occur.

Raises:

Type	Description
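`ValueError`	If required parameters are missing during validation. In the source below, a failed required-parameter check is logged and `get()` returns None rather than raising.
`Exception`	If there are errors during dataset creation or execution. In the source below, `ValueError`, `KeyError`, and `TypeError` from dataset creation and `ValueError`, `KeyError`, `IOError`, and `RuntimeError` from execution are caught and logged, and the method returns None; other exception types propagate.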

Source code in climakitae/new_core/user_interface.py

def get(self) -> Optional[Any]:
    """Execute the configured query and retrieve climate data.

    Validates required parameters, creates the appropriate dataset using
    the factory pattern, executes the query, and resets the query state
    for the next use.

    Thread Safety
    -------------
    This method takes a snapshot of the query at the start of execution,
    making it safe to call from multiple threads on the same ClimateData
    instance. However, for maximum clarity and safety, it is recommended
    to use separate ClimateData instances in multi-threaded scenarios.

    Returns
    -------
    Optional[xr.DataArray]
        The retrieved climate data as a lazy-loaded xarray DataArray,
        or None if the query fails or validation errors occur.

    Raises
    ------
    ValueError
        If required parameters are missing during validation.
    Exception
        If there are errors during dataset creation or execution.

    """
    logger.info("Starting data retrieval with query: %s", self._query)
    data = None

    # Take a snapshot of the query for thread-safety
    # This allows concurrent calls to get() without corrupting each other
    query_snapshot = copy.deepcopy(self._query)

    # Validate required parameters using the snapshot for thread-safety
    logger.debug("Validating required parameters")
    if not self._validate_required_parameters(query_snapshot):
        logger.warning("Required parameter validation failed")
        self._reset_query()
        return None

    try:
        # Create dataset using factory with the snapshot
        logger.debug("Creating dataset using factory")
        dataset = self._factory.create_dataset(query_snapshot)
        logger.info("Dataset created successfully")
    except (ValueError, KeyError, TypeError) as e:
        logger.error("Error during dataset creation: %s", str(e))
        logger.debug("Traceback:", exc_info=True)
        self._reset_query()
        return None

    try:
        # Execute the query with the snapshot
        logger.debug("Executing query")
        data = dataset.execute(query_snapshot)
        # Check whether the retrieved data is empty or null
        if (
            data is None
            or (hasattr(data, "nbytes") and data.nbytes == 0)
            or (isinstance(data, dict) and not data)
        ):
            logger.warning("⚠️ Warning: Retrieved dataset is empty.")

        else:
            logger.info("✅ Data retrieval successful!")

    except (ValueError, KeyError, IOError, RuntimeError) as e:
        logger.error("❌ Data retrieval failed: %s", str(e))
        logger.debug("Traceback:", exc_info=True)

    # Always reset query after execution
    self._reset_query()
    return data
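Two behaviours of `get()` matter to callers: it works on a deep copy of the query, and it always resets the query afterwards, so a second `get()` without new setters has nothing to run. The toy class below sketches both. `SnapshotQuery` and its `UNSET` sentinel are hypothetical; the real `UNSET` definition is not shown on this page. An `Enum` member is used here because `copy.deepcopy` returns enum members unchanged, which keeps the `is UNSET` check valid on the snapshot.

```python
import copy
from enum import Enum
from typing import Any, Dict, Optional


class _Sentinel(Enum):
    UNSET = "UNSET"


UNSET = _Sentinel.UNSET  # hypothetical sentinel for "parameter not set"


class SnapshotQuery:
    """Toy builder that snapshots and resets its query on get()."""

    def __init__(self) -> None:
        self._query: Dict[str, Any] = {}
        self._reset_query()

    def _reset_query(self) -> None:
        self._query = {"catalog": UNSET, "variable_id": UNSET}

    def catalog(self, name: str) -> "SnapshotQuery":
        self._query["catalog"] = name
        return self

    def variable(self, name: str) -> "SnapshotQuery":
        self._query["variable_id"] = name
        return self

    def get(self) -> Optional[Dict[str, Any]]:
        snapshot = copy.deepcopy(self._query)  # work on a private copy
        self._reset_query()                    # builder is clean for the next query
        if any(value is UNSET for value in snapshot.values()):
            return None                        # missing parameters yield None
        return snapshot


q = SnapshotQuery()
first = q.catalog("cadcat").variable("pr").get()
second = q.get()  # nothing has been set since the reset
print(first, second)
```

In practice this means a None result should be checked for before the data is used, and every query has to set its parameters again.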

`show_query()`

Display the current query configuration.

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_query(self) -> None:
    """Display the current query configuration."""
    msg = "Current Query:"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))
    for key, value in self._query.items():
        display_value = value if value is not UNSET else "UNSET"
        logger.info("%s: %s", key, display_value)
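`show_query()` and the other `show_*` methods emit their output through `logger.info`, and each is wrapped in `_with_info_verbosity`, whose body is not shown here. Judging by its name only, it presumably makes sure INFO records are emitted even when the instance verbosity is lower. The sketch below is one way such a decorator could be written; `with_info_verbosity` and the `show_demo` logger are assumptions for illustration, not the library's code.

```python
import functools
import logging

logger = logging.getLogger("show_demo")  # hypothetical logger name


def with_info_verbosity(method):
    """Temporarily allow INFO records while the wrapped callable runs."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        previous = logger.level
        if logger.getEffectiveLevel() > logging.INFO:
            logger.setLevel(logging.INFO)
        try:
            return method(*args, **kwargs)
        finally:
            logger.setLevel(previous)  # restore whatever was configured before

    return wrapper


seen = []


@with_info_verbosity
def show_something() -> None:
    seen.append(logger.isEnabledFor(logging.INFO))
    logger.info("Current Query:")


logger.setLevel(logging.WARNING)  # INFO would normally be suppressed
show_something()
```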

`show_catalog_options(show_n=None)`

Display available catalog options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_catalog_options(self, show_n: Optional[int] = None) -> None:
    """Display available catalog options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "catalog",
        "catalog options (Cloud data collections)",
        limit_per_group=show_n,
    )
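The `show_n` argument is forwarded to `_show_options` as `limit_per_group`, and that helper is not shown on this page. As a rough sketch of what limiting a display to `show_n` entries can look like, the function below truncates a flat list and notes how many entries were left out. `limit_options` and the "... and N more" line are hypothetical, and the real helper may also group options.

```python
from typing import List, Optional


def limit_options(options: List[str], show_n: Optional[int] = None) -> List[str]:
    """Return at most show_n options, plus a summary line when some are hidden."""
    if show_n is None:
        return list(options)  # None means show everything
    shown = options[: max(show_n, 0)]
    remaining = len(options) - len(shown)
    if remaining > 0:
        shown = shown + [f"... and {remaining} more"]
    return shown


print(limit_options(["d01", "d02", "d03"], show_n=2))
```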

`show_installation_options(show_n=None)`

Display available installation options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_installation_options(self, show_n: Optional[int] = None) -> None:
    """Display available installation options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "installation",
        "installation options (Renewable energy generation types)",
        limit_per_group=show_n,
    )

`show_activity_id_options(show_n=None)`

Display available activity ID options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_activity_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available activity ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "activity_id",
        "activity_id options (Downscaling methods)",
        limit_per_group=show_n,
    )

`show_institution_id_options(show_n=None)`

Display available institution ID options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_institution_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available institution ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "institution_id",
        "institution_id options (Data producers)",
        limit_per_group=show_n,
    )

`show_source_id_options(show_n=None)`

Display available source ID options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_source_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available source ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "source_id",
        "source_id options (Climate model simulations)",
        limit_per_group=show_n,
    )

`show_experiment_id_options(show_n=None)`

Display available experiment ID options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_experiment_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available experiment ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "experiment_id",
        "experiment_id options (Simulation runs)",
        limit_per_group=show_n,
    )

`show_station_id_options(show_n=None)`

Display available station ID options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of stations to display. If None (default), shows all stations.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_station_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available station ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of stations to display. If None (default), shows all stations.
    """
    self._show_options(
        "station_id",
        "station_id options (Weather station names)",
        limit_per_group=show_n,
    )

`show_network_id_options(show_n=None)`

Display available network ID options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_network_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available network ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "network_id",
        "network_id options (Weather network names)",
        limit_per_group=show_n,
    )

`show_table_id_options(show_n=None)`

Display available table ID options (Temporal resolutions).

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_table_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available table ID options (Temporal resolutions).

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "table_id",
        "table_id options (Temporal resolutions)",
        limit_per_group=show_n,
    )

`show_grid_label_options(show_n=None)`

Display available grid label options (Spatial resolutions).

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_grid_label_options(self, show_n: Optional[int] = None) -> None:
    """Display available grid label options (Spatial resolutions).

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "grid_label",
        "grid_label options (Spatial resolutions)",
        limit_per_group=show_n,
    )

`show_variable_options(show_n=None)`

Display available variable options.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of options to display. If None (default), shows all options.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_variable_options(self, show_n: Optional[int] = None) -> None:
    """Display available variable options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    current_query = {k: v for k, v in self._query.items() if v is not UNSET}
    msg = ""
    if current_query:
        msg = "Variables (constrained by current query):"
    else:
        msg = "Variables"

    self._show_options("variable_id", msg, limit_per_group=show_n)

`show_derived_variables()`

Display all registered derived variables.

Shows both built-in and user-registered derived variables with their dependencies and descriptions.

Examples:

>>> cd = ClimateData()
>>> cd.show_derived_variables()
Derived Variables (computed from source variables during loading):
------------------------------------------------------------------
wind_speed_10m      depends on: u10, v10
heat_index          depends on: t2, rh
...

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_derived_variables(self) -> None:
    """Display all registered derived variables.

    Shows both builtin and user-registered derived variables with their
    dependencies and descriptions.

    Examples
    --------
    >>> cd = ClimateData()
    >>> cd.show_derived_variables()
    Derived Variables (computed from source variables during loading):
    ------------------------------------------------------------------
    wind_speed_10m      depends on: u10, v10
    heat_index          depends on: t2, rh
    ...

    """
    from climakitae.new_core.derived_variables import list_derived_variables

    msg = "Derived Variables (computed from source variables during loading):"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))
    try:
        print(msg)
        print("%s" % ("-" * len(msg)))
    except Exception:
        pass

    try:
        derived_vars = list_derived_variables()
        if not derived_vars:
            no_vars_msg = "No derived variables registered"
            logger.info(no_vars_msg)
            try:
                print(no_vars_msg)
            except Exception:
                pass
        else:
            # Find max name length for alignment
            max_name_len = max(len(name) for name in derived_vars.keys())

            for name, info in sorted(derived_vars.items()):
                deps_str = ", ".join(info.depends_on)
                source_tag = f"[{info.source}]" if info.source == "user" else ""
                spacing = " " * (max_name_len - len(name) + 2)

                line = f"{name}{spacing}depends on: {deps_str} {source_tag}"
                if info.description:
                    line += f"\n{' ' * (max_name_len + 2)}  └─ {info.description}"

                logger.info(line)
                try:
                    print(line)
                except Exception:
                    pass

        logger.info("\n")
        try:
            print()
        except Exception:
            pass

    except Exception as e:
        logger.error("Error retrieving derived variables: %s", e, exc_info=True)
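
The alignment logic in the listing above can be tried in isolation. The following is a minimal sketch, not the library's code: `DerivedInfo` and `format_derived` are hypothetical names, and the registry contents are made up. It assumes each entry exposes `depends_on`, `source`, and `description`, as the source above does. Unlike the source, it attaches the `[user]` tag without leaving a trailing space on untagged lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DerivedInfo:
    """Hypothetical stand-in for one registry entry."""

    depends_on: Tuple[str, ...]
    source: str = "builtin"
    description: str = ""


def format_derived(derived_vars: Dict[str, DerivedInfo]) -> List[str]:
    """Return one aligned line per derived variable, sorted by name."""
    if not derived_vars:
        return ["No derived variables registered"]

    # Pad every name to the longest one so the "depends on" column lines up.
    max_name_len = max(len(name) for name in derived_vars)
    lines = []
    for name, info in sorted(derived_vars.items()):
        spacing = " " * (max_name_len - len(name) + 2)
        tag = f" [{info.source}]" if info.source == "user" else ""
        line = f"{name}{spacing}depends on: {', '.join(info.depends_on)}{tag}"
        if info.description:
            # The description goes on its own indented line under the entry.
            line += f"\n{' ' * (max_name_len + 2)}  └─ {info.description}"
        lines.append(line)
    return lines


# Made-up registry contents, used only for illustration.
registry = {
    "wind_speed_10m": DerivedInfo(("u10", "v10")),
    "heat_index": DerivedInfo(("t2", "rh"), source="user"),
}
for line in format_derived(registry):
    print(line)
```

Entries are sorted by name, so `heat_index` is printed before `wind_speed_10m`, and only user-registered entries carry a source tag.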

`show_processors(show_n=None)`

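Display available data processors. If a catalog is set in the current query, only the processors valid for that catalog are listed; otherwise every registered processor is shown.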

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of processors to display. If None (default), shows all processors.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_processors(self, show_n: Optional[int] = None) -> None:
    """Display available data processors.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of processors to display. If None (default), shows all processors.
    """

    msg = "Processors (Methods for transforming raw catalog data):"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))

    try:
        # Get current catalog from query
        current_catalog = self._query.get("catalog", UNSET)

        # Get valid processors (filtered by catalog if specified)
        if current_catalog is not UNSET:
            valid_processors = self._factory.get_valid_processors(current_catalog)
            logger.info("Showing processors valid for catalog: %s", current_catalog)
        else:
            # No catalog specified - show all processors from registry
            valid_processors = sorted(
                list(self._factory._processing_step_registry.keys())
            )
            logger.info("Showing all processors")

        total_count = len(valid_processors)
        limit = min(show_n, total_count) if show_n is not None else total_count
        display_processors = valid_processors[:limit]

        # Warn user of truncation if show_n was set
        if limit < total_count:
            truncation_msg = f"Showing {limit} of {total_count} total processors"
            logger.info("%s", truncation_msg)

        for processor in display_processors:
            logger.info("%s", processor)

        logger.info("\n")

    except Exception as e:
        logger.error("Error retrieving processors: %s", e, exc_info=True)
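
The `show_n` handling used here, and repeated in `show_station_options` and `show_boundary_options`, reduces to a small clamp-and-slice step. The sketch below isolates it. `truncate_options` and the option names are hypothetical, and it assumes `show_n` is either `None` or a non-negative integer, which the source does not validate.

```python
from typing import List, Optional, Sequence, Tuple


def truncate_options(
    options: Sequence[str], show_n: Optional[int] = None, label: str = "options"
) -> Tuple[List[str], Optional[str]]:
    """Return the entries to display plus a truncation notice, if any."""
    total_count = len(options)
    # None means "no limit"; a limit above the total is clamped to the total.
    limit = min(show_n, total_count) if show_n is not None else total_count
    notice = None
    if limit < total_count:
        notice = f"Showing {limit} of {total_count} total {label}"
    return list(options[:limit]), notice


# Made-up option names, used only for illustration.
names = sorted(["gamma", "alpha", "beta", "delta"])
shown, notice = truncate_options(names, show_n=2, label="processors")
print(notice)
for name in shown:
    print(name)
```

The notice is produced only when entries are actually dropped, so passing a `show_n` at or above the number of available options behaves the same as passing `None`.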

`show_station_options(show_n=None)`

Display available station options for data retrieval.

Parameters:

Name	Type	Description	Default
`show_n`	`int`	Maximum number of stations to display. If None (default), shows all stations.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_station_options(self, show_n: Optional[int] = None) -> None:
    """Display available station options for data retrieval.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of stations to display. If None (default), shows all stations.
    """
    msg = "Stations (Available weather stations for localization):"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))
    try:
        stations = self._factory.get_stations()
        if not stations:
            logger.info("No stations available with current parameters")

        else:
            sorted_stations = sorted(stations)
            total_count = len(sorted_stations)
            limit = min(show_n, total_count) if show_n is not None else total_count
            display_stations = sorted_stations[:limit]

            # Warn user of truncation if show_n was set
            if limit < total_count:
                truncation_msg = f"Showing {limit} of {total_count} total stations"
                logger.info("%s", truncation_msg)

            for station in display_stations:
                logger.info("%s", station)

            logger.info("\n")
    except Exception as e:
        logger.error("Error retrieving stations: %s", e, exc_info=True)

`show_boundary_options(boundary_type=UNSET, show_n=None)`

Display available boundaries for spatial queries.

Parameters:

Name	Type	Description	Default
`boundary_type`	`str`	The type of boundary to display (e.g., "ca_counties", "ca_watersheds"). If not specified, displays available boundary types.	`UNSET`
`show_n`	`int`	Maximum number of boundaries to display. If None (default), shows all boundaries.	`None`

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_boundary_options(
    self, boundary_type=UNSET, show_n: Optional[int] = None
) -> None:
    """Display available boundaries for spatial queries.

    Parameters
    ----------
    boundary_type : str, optional
        The type of boundary to display (e.g., "ca_counties", "ca_watersheds").
        If not specified, displays available boundary types.
    show_n : int, optional
        Maximum number of boundaries to display. If None (default), shows all boundaries.

    """
    if boundary_type is UNSET:
        msg = "Boundary Types (call again with boundary_type='...' to see options):"
    else:
        msg = "Available {} Boundaries:".format(
            " ".join([x.capitalize() for x in boundary_type.split("_")])
        )
    logger.info(msg)
    logger.info("%s", "-" * len(msg))

    try:
        boundaries = self._factory.get_boundaries(boundary_type)
        if not boundaries:
            logger.info("No boundaries available with current parameters")

        else:
            sorted_boundaries = sorted(boundaries)
            total_count = len(sorted_boundaries)
            limit = min(show_n, total_count) if show_n is not None else total_count
            display_boundaries = sorted_boundaries[:limit]

            # Warn user of truncation if show_n was set
            if limit < total_count:
                truncation_msg = (
                    f"Showing {limit} of {total_count} total boundaries"
                )
                logger.info("%s", truncation_msg)

            for boundary in display_boundaries:
                logger.info("%s", boundary)

            logger.info("\n")
    except Exception as e:
        logger.error("Error retrieving boundaries: %s", e, exc_info=True)
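
The heading text is derived from `boundary_type` by splitting on underscores and capitalizing each part. A minimal sketch of that step follows, with `UNSET` replaced by a local placeholder object and `boundary_header` as a hypothetical helper name:

```python
UNSET = object()  # local placeholder for the library's UNSET sentinel


def boundary_header(boundary_type=UNSET) -> str:
    """Build the heading line the way the listing above does."""
    if boundary_type is UNSET:
        return "Boundary Types (call again with boundary_type='...' to see options):"
    words = " ".join(part.capitalize() for part in boundary_type.split("_"))
    return f"Available {words} Boundaries:"


header = boundary_header("ca_counties")
print(header)
print("-" * len(header))  # underline matches the heading length
```

Because `str.capitalize()` lowercases everything after the first character, a prefix such as `ca` is rendered as `Ca` in the heading.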

`show_all_options()`

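Display all available options for exploration. Longer listings (institution IDs, source IDs, variables, station IDs, processors, and stations) are truncated to 10 or 15 entries, each followed by a hint naming the method that shows the full list. Boundary options are not included; call `show_boundary_options` for those. The output ends with the current query status.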

Source code in climakitae/new_core/user_interface.py

@_with_info_verbosity
def show_all_options(self) -> None:
    """Display all available options for exploration."""
    data_title = "CAL ADAPT DATA -- ALL AVAILABLE OPTIONS USING CLIMAKITAE"
    logger.info("%s", "=" * len(data_title))
    logger.info(data_title)
    logger.info("%s", "=" * len(data_title))

    # Define truncation limits for show_all to keep output manageable
    truncation_limits = {
        "show_catalog_options": None,  # Small list, show all
        "show_activity_id_options": None,  # Small list, show all
        "show_institution_id_options": 10,
        "show_source_id_options": 10,
        "show_experiment_id_options": None,  # Small list, show all
        "show_table_id_options": None,  # Small list, show all
        "show_grid_label_options": None,  # Small list, show all
        "show_variable_options": 15,
        "show_installation_options": None,  # Small list, show all
        "show_station_id_options": 15,
        "show_network_id_options": None,  # Small list, show all
        "show_processors": 10,
        "show_station_options": 15,
    }

    option_methods = [
        ("show_catalog_options", "Catalogs"),
        ("show_activity_id_options", "Activity IDs"),
        ("show_institution_id_options", "Institution IDs"),
        ("show_source_id_options", "Source IDs"),
        ("show_experiment_id_options", "Experiment IDs"),
        ("show_table_id_options", "Table IDs (Temporal Resolution)"),
        ("show_grid_label_options", "Grid Labels (Spatial Resolution)"),
        ("show_variable_options", "Variables"),
        ("show_derived_variables", "Derived Variables"),
        ("show_installation_options", "Installations"),
        ("show_station_id_options", "Station IDs"),
        ("show_network_id_options", "Network IDs"),
        ("show_processors", "Processors"),
        ("show_station_options", "Stations"),
    ]

    for method_name, section_title in option_methods:
        try:
            limit = truncation_limits.get(method_name)
            method = getattr(self, method_name)
            if limit is not None:
                method(show_n=limit)
                # Let users know how to see all options
                hint_msg = (
                    f"Use {method_name}() to see all {section_title.lower()}."
                )
                logger.info("%s", hint_msg)
            else:
                method()
        except Exception as e:
            logger.error(
                "Error displaying %s: %s", section_title.lower(), e, exc_info=True
            )

    logger.info("%s", "\n" + "=" * 60)
    logger.info("Current Query Status:")
    logger.info("%s", "=" * 60)
    self.show_query()
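
The dispatch pattern above (look up each method by name, apply a per-method limit, and keep going if one section fails) can be sketched on a toy class. Everything in this example is hypothetical, including the class, method names, and data; output is collected in a list rather than logged so it is easy to inspect.

```python
from typing import List, Optional


class Explorer:
    """Toy class illustrating name-based dispatch with per-method limits."""

    def __init__(self) -> None:
        self.output: List[str] = []

    def show_colors(self, show_n: Optional[int] = None) -> None:
        colors = ["blue", "green", "red"]
        self.output.extend(colors[:show_n])  # a slice to None keeps everything

    def show_shapes(self, show_n: Optional[int] = None) -> None:
        raise RuntimeError("catalog unavailable")

    def show_sizes(self) -> None:
        # No show_n parameter, so this method must have no limit entry.
        self.output.extend(["large", "small"])

    def show_all(self) -> None:
        limits = {"show_colors": 2}
        sections = [
            ("show_colors", "Colors"),
            ("show_shapes", "Shapes"),
            ("show_sizes", "Sizes"),
        ]
        for method_name, title in sections:
            try:
                limit = limits.get(method_name)
                method = getattr(self, method_name)
                if limit is not None:
                    method(show_n=limit)
                    self.output.append(
                        f"Use {method_name}() to see all {title.lower()}."
                    )
                else:
                    method()
            except Exception as exc:
                # One failing section does not stop the remaining ones.
                self.output.append(f"Error displaying {title.lower()}: {exc}")


explorer = Explorer()
explorer.show_all()
for line in explorer.output:
    print(line)
```

Methods without a limit entry are called with no arguments, which is why a method like `show_derived_variables()`, which takes no `show_n`, can sit in the same list as the others.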

`reset()`

Manually reset the query parameters.

Returns:

Type	Description
`ClimateData`	The current instance with reset parameters.

Source code in climakitae/new_core/user_interface.py

def reset(self) -> "ClimateData":
    """Manually reset the query parameters.

    Returns
    -------
    ClimateData
        The current instance with reset parameters.

    """
    return self._reset_query()

`copy_query()`

Get a copy of the current query parameters. Only parameters that have been set are included; entries still equal to `UNSET` are omitted.

Returns:

Type	Description
`Dict[str, Any]`	A copy of the current query parameters.

Source code in climakitae/new_core/user_interface.py

def copy_query(self) -> Dict[str, Any]:
    """Get a copy of the current query parameters.

    Returns
    -------
    Dict[str, Any]
        A copy of the current query parameters.

    """
    return {k: v for k, v in self._query.items() if v is not UNSET}
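
The filter above relies on an identity check against a sentinel rather than a truthiness test, so values such as `None` or an empty dict survive the copy. A minimal sketch, with `UNSET` as a local placeholder and `copy_set_params` as a hypothetical helper name:

```python
from typing import Any, Dict

UNSET = object()  # local placeholder for the library's UNSET sentinel


def copy_set_params(query: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict holding only the entries that are not UNSET."""
    return {key: value for key, value in query.items() if value is not UNSET}


# Made-up query state, used only for illustration.
query = {"catalog": "cadcat", "variable_id": UNSET, "processes": {}}
snapshot = copy_set_params(query)
print(sorted(snapshot))

# The copy is shallow: the dict is new, but mutable values are shared.
print(snapshot is query)
print(snapshot["processes"] is query["processes"])
```

Because the copy is shallow, mutating a nested value such as the `processes` dict in the returned snapshot also changes the object held by the original query; use `copy.deepcopy` on the result if an independent snapshot is needed.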

`load_query(query_params)`

Load query parameters from a dictionary.

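Uses the individual setter methods to ensure validation is applied to each parameter. Unknown keys are silently ignored, as are keys whose value is `UNSET`. The `variable_id` key is applied through the `variable` setter.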

Parameters:

Name	Type	Description	Default
`query_params`	`Dict[str, Any]`	Dictionary of query parameters to load. Supported keys: catalog, installation, activity_id, institution_id, source_id, experiment_id, table_id, grid_label, variable_id, processes.	required

Returns:

Type	Description
`ClimateData`	The current instance with loaded parameters.

Raises:

Type	Description
`ValueError`	If any parameter value fails validation.

Source code in climakitae/new_core/user_interface.py

def load_query(self, query_params: Dict[str, Any]) -> "ClimateData":
    """Load query parameters from a dictionary.

    Uses the individual setter methods to ensure validation is applied
    to each parameter. Unknown keys are silently ignored.

    Parameters
    ----------
    query_params : Dict[str, Any]
        Dictionary of query parameters to load. Supported keys:
        catalog, installation, activity_id, institution_id, source_id,
        experiment_id, table_id, grid_label, variable_id, processes.

    Returns
    -------
    ClimateData
        The current instance with loaded parameters.

    Raises
    ------
    ValueError
        If any parameter value fails validation.

    """
    # Map query keys to their setter methods
    setters = {
        "catalog": self.catalog,
        "installation": self.installation,
        "activity_id": self.activity_id,
        "institution_id": self.institution_id,
        "source_id": self.source_id,
        "experiment_id": self.experiment_id,
        "table_id": self.table_id,
        "grid_label": self.grid_label,
        "variable_id": self.variable,
        "processes": self.processes,
    }

    for key, value in query_params.items():
        if key in setters and value is not UNSET:
            setters[key](value)
    return self
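
The setter-dispatch loop above can be illustrated with a toy class. This is a sketch under stated assumptions, not the library's implementation: `MiniQuery`, its two setters, and the validation rule are hypothetical, and `UNSET` is a local placeholder.

```python
from typing import Any, Dict

UNSET = object()  # local placeholder for the library's UNSET sentinel


class MiniQuery:
    """Toy builder with two validated, chainable setters."""

    def __init__(self) -> None:
        self._query: Dict[str, Any] = {"catalog": UNSET, "variable_id": UNSET}

    def _set(self, key: str, value: Any) -> "MiniQuery":
        # Hypothetical validation rule: values must be non-empty strings.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} must be a non-empty string")
        self._query[key] = value
        return self

    def catalog(self, value: str) -> "MiniQuery":
        return self._set("catalog", value)

    def variable(self, value: str) -> "MiniQuery":
        return self._set("variable_id", value)

    def load_query(self, params: Dict[str, Any]) -> "MiniQuery":
        # The query key "variable_id" maps to the setter named "variable".
        setters = {"catalog": self.catalog, "variable_id": self.variable}
        for key, value in params.items():
            if key in setters and value is not UNSET:
                setters[key](value)
        return self


loaded = MiniQuery().load_query(
    {"catalog": "cadcat", "variable_id": "tasmax", "unknown_key": 1}
)
print(loaded._query)
```

Keys are applied in dictionary order, so if a later value fails validation, the keys applied before it have already been set. Call `reset()` before retrying if a partially loaded query is not acceptable.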
