
ClimateData class

The ClimateData class is the main entry point to the climakitae.new_core user interface. It exposes the fluent (builder-style) API used to assemble climate-data queries.

API reference for climakitae.new_core.user_interface.

ClimateData

A fluent interface for accessing climate data.

This class provides a chainable interface for setting query parameters and retrieving climate data, so that multiple parameters can be set in a single expression. It uses a factory pattern to create datasets and validators based on the specified parameters.

The interface supports various climate data sources and allows flexible querying with different combinations of parameters. Parameter-setting methods return the instance itself to enable method chaining, while get() returns the retrieved data and the show_* methods only display information.
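
The chaining behaviour described above can be illustrated with a small stand-alone sketch. It is not the climakitae implementation: `MiniQuery`, its `_set` helper, and the `UNSET` sentinel defined here are hypothetical names chosen for illustration, and the validation simply mirrors the non-empty-string checks visible in the setters on this page.

```python
from typing import Any, Dict

UNSET = object()  # hypothetical sentinel marking parameters that have not been set


class MiniQuery:
    """Toy stand-in for a fluent query builder (not the climakitae class)."""

    def __init__(self) -> None:
        self._query: Dict[str, Any] = {"catalog": UNSET, "variable_id": UNSET}

    def _set(self, key: str, value: str) -> "MiniQuery":
        # Reject anything that is not a non-empty string, then store it stripped.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} must be a non-empty string")
        self._query[key] = value.strip()
        return self  # returning self is what makes chaining possible

    def catalog(self, catalog: str) -> "MiniQuery":
        return self._set("catalog", catalog)

    def variable(self, variable: str) -> "MiniQuery":
        return self._set("variable_id", variable)


q = MiniQuery().catalog(" cadcat ").variable("pr")
print(q._query)  # {'catalog': 'cadcat', 'variable_id': 'pr'}
```

Because every setter returns the same object, the order of the calls in a chain does not matter; only the final state of the query dictionary does.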

Other Parameters:

Name Type Description
catalog str

The data catalog to use (e.g., "renewable energy generation", "cadcat").

installation str

The installation type (e.g., "pv_utility", "wind_offshore").

activity_id str

The activity identifier (e.g., "WRF", "LOCA2").

institution_id str

The institution identifier (e.g., "CNRM", "DWD").

source_id str

The source identifier (e.g., "GCM", "RCM", "Station").

experiment_id str or list of str

The experiment identifier (e.g., "historical", "ssp245").

table_id str

The temporal resolution (e.g., "1hr", "day", "mon").

grid_label str

The spatial resolution (e.g., "d01", "d02", "d03").

variable_id str

The climate variable (e.g., "tasmax", "pr", "cf").

processes dict

Dictionary of data processing operations to apply.

Methods:

Name Description
verbosity

Set the logging verbosity level.

catalog

Set the data catalog to use.

installation

Set the installation type.

activity_id

Set the activity identifier.

institution_id

Set the institution identifier.

source_id

Set the source identifier.

experiment_id

Set the experiment identifier(s).

table_id

Set the temporal resolution.

grid_label

Set the spatial resolution.

variable

Set the climate variable to retrieve.

derived_variable

Register and query a user-defined derived variable.

station_id

Set the station identifier.

network_id

Set the network identifier.

processes

Set processing operations to apply to the data.

get

Execute the query and retrieve the climate data.

show_query

Display the current query configuration.

show_catalog_options

Display available catalog options.

show_installation_options

Display available installation options.

show_activity_id_options

Display available activity ID options.

show_institution_id_options

Display available institution ID options.

show_source_id_options

Display available source ID options.

show_experiment_id_options

Display available experiment ID options.

show_table_id_options

Display available table ID (temporal resolution) options.

show_grid_label_options

Display available grid label (spatial resolution) options.

show_variable_options

Display available climate variable options.

show_station_id_options

Display available station ID options.

show_network_id_options

Display available network ID options.

show_derived_variables

Display registered derived variables.

show_processors

Display registered data processors.

show_station_options

Display available weather station options.

show_boundary_options

Display available boundary options; pass a boundary type to list sub-options.

show_all_options

Display all available options for exploration.

Returns:

Type Description
DataArray or None

The retrieved climate data as a lazy-loaded xarray DataArray, or None if the query fails or required parameters are missing.

Raises:

Type Description
ValueError

If required parameters are missing or invalid during validation.

Exception

If there is an error during data retrieval or processing.

Examples:

Basic usage with method chaining:

>>> cd = ClimateData()
>>> data = (cd
...     .catalog("cadcat")
...     .activity_id("WRF")
...     .experiment_id("historical")
...     .table_id("1hr")
...     .grid_label("d02")
...     .variable("prec")
...     .get()
...    )

Exploring available options:

>>> cd = ClimateData()
>>> cd.show_catalog_options()
>>> cd.catalog("cadcat").show_variable_options()

Using with processing:

>>> processes = {"spatial_avg": "region", "temporal_avg": "monthly"}
>>> data = (ClimateData()
...         .catalog("climate")
...         .variable("pr")
...         .processes(processes)
...         .get())

Initialize the ClimateData interface.

Sets up the factory for dataset creation and initializes query parameters to their default (UNSET) state. Optionally configures logging to file or stdout.

Parameters:

Name Type Description Default
log_file str

Path to log file. If None, logs to stdout. Default is None.

None
verbosity int

Logging verbosity level:
  • <= -2: Effectively silent (no logs)
  • -1: WARNING level
  • 0: INFO level (default)
  • > 0: DEBUG level
Default is 0.

0
Source code in climakitae/new_core/user_interface.py
def __init__(self, log_file: Optional[str] = None, verbosity: int = 0):
    """Initialize the ClimateData interface.

    Sets up the factory for dataset creation and initializes
    query parameters to their default (UNSET) state. Optionally
    configures logging to file or stdout.

    Parameters
    ----------
    log_file : str, optional
        Path to log file. If None, logs to stdout. Default is None.
    verbosity : int, optional
        Logging verbosity level:
        - <= -2: Effectively silent (no logs)
        - -1: WARNING level
        - 0: INFO level (default)
        - > 0: DEBUG level
        Default is 0.

    """
    # Configure logging
    self._log_file = log_file
    self._verbosity = verbosity
    self._configure_logging()

    try:
        logger.info("Initializing ClimateData interface")
        self._factory = DatasetFactory()
        self._reset_query()
        self.var_desc = read_csv_file(VARIABLE_DESCRIPTIONS_CSV_PATH)
        logger.info("ClimateData initialization successful")
        logger.info("✅ Ready to query!")
    except Exception as e:
        logger.error("❌ Setup failed: %s", str(e), exc_info=True)
        return

verbosity(level)

Set the logging verbosity level.

This method allows dynamic adjustment of logging verbosity and supports method chaining.

Parameters:

Name Type Description Default
level int

Logging verbosity mapping:
  • <= -2: effectively silent (no logs)
  • -1: WARNING level
  • 0: INFO level (default)
  • > 0: DEBUG level (user must specify > 0 to get debug)

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Examples:

>>> cd = ClimateData()
>>> cd.verbosity(-1)  # warnings only
>>> cd.verbosity(0)   # info (default)
>>> cd.verbosity(1)   # debug
Source code in climakitae/new_core/user_interface.py
def verbosity(self, level: int) -> "ClimateData":
    """Set the logging verbosity level.

    This method allows dynamic adjustment of logging verbosity
    and supports method chaining.

    Parameters
    ----------
    level : int
        Logging verbosity mapping:
        - <= -2: effectively silent (no logs)
        - -1: WARNING level
        - 0: INFO level (default)
        - >0: DEBUG level (user must specify >0 to get debug)

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    Examples
    --------
    >>> cd = ClimateData()
    >>> cd.verbosity(-1)  # warnings only
    >>> cd.verbosity(0)   # info (default)
    >>> cd.verbosity(1)   # debug

    """
    if not isinstance(level, int):
        raise ValueError("Verbosity level must be an integer")

    logger.debug("Setting verbosity level to %d", level)
    self._verbosity = level
    self._configure_logging()
    logger.info("Verbosity level set to %d", level)
    return self
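
The body of `_configure_logging` is not shown on this page, so the following is only a sketch of how the documented verbosity mapping could be translated into standard-library logging levels. The function name `verbosity_to_level` is hypothetical, and treating "effectively silent" as a threshold above CRITICAL is an assumption.

```python
import logging


def verbosity_to_level(verbosity: int) -> int:
    """Map a verbosity integer to a stdlib logging level (illustrative only)."""
    if not isinstance(verbosity, int):
        raise ValueError("Verbosity level must be an integer")
    if verbosity <= -2:
        # Above CRITICAL, so no standard record passes the threshold.
        return logging.CRITICAL + 1
    if verbosity == -1:
        return logging.WARNING
    if verbosity == 0:
        return logging.INFO
    return logging.DEBUG  # any positive value


for v in (-2, -1, 0, 1):
    print(v, logging.getLevelName(verbosity_to_level(v)))
```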

catalog(catalog)

Set the data catalog to use for the query.

Parameters:

Name Type Description Default
catalog str

The name of the catalog (e.g., "renewables", "climate").

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def catalog(self, catalog: str) -> "ClimateData":
    """Set the data catalog to use for the query.

    Parameters
    ----------
    catalog : str
        The name of the catalog (e.g., "renewables", "climate").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting catalog to: %s", catalog)
    if not isinstance(catalog, str) or not catalog.strip():
        logger.error("Invalid catalog parameter: must be non-empty string")
        raise ValueError("Catalog must be a non-empty string")
    self._query["catalog"] = catalog.strip()
    logger.info("Catalog set to: %s", catalog.strip())
    return self

installation(installation)

Set the installation type for the query.

Parameters:

Name Type Description Default
installation str

The installation type (e.g., "pv_utility", "wind_offshore").

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def installation(self, installation: str) -> "ClimateData":
    """Set the installation type for the query.

    Parameters
    ----------
    installation : str
        The installation type (e.g., "pv_utility", "wind_offshore").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting installation to: %s", installation)
    if not isinstance(installation, str) or not installation.strip():
        logger.error("Invalid installation parameter: must be non-empty string")
        raise ValueError("Installation must be a non-empty string")
    self._query["installation"] = installation.strip()
    logger.info("Installation set to: %s", installation.strip())
    return self

activity_id(activity_id)

Set the activity identifier for the query.

Parameters:

Name Type Description Default
activity_id str

The activity ID (e.g., "CMIP6", "CORDEX").

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def activity_id(self, activity_id: str) -> "ClimateData":
    """Set the activity identifier for the query.

    Parameters
    ----------
    activity_id : str
        The activity ID (e.g., "CMIP6", "CORDEX").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting activity_id to: %s", activity_id)
    if not isinstance(activity_id, str) or not activity_id.strip():
        logger.error("Invalid activity_id parameter: must be non-empty string")
        raise ValueError("Activity ID must be a non-empty string")
    self._query["activity_id"] = activity_id.strip()
    logger.info("Activity ID set to: %s", activity_id.strip())
    return self

institution_id(institution_id)

Set the institution identifier for the query.

Parameters:

Name Type Description Default
institution_id str

The institution ID (e.g., "CNRM", "DWD").

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def institution_id(self, institution_id: str) -> "ClimateData":
    """Set the institution identifier for the query.

    Parameters
    ----------
    institution_id : str
        The institution ID (e.g., "CNRM", "DWD").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting institution_id to: %s", institution_id)
    if not isinstance(institution_id, str) or not institution_id.strip():
        logger.error("Invalid institution_id parameter: must be non-empty string")
        raise ValueError("Institution ID must be a non-empty string")
    self._query["institution_id"] = institution_id.strip()
    logger.info("Institution ID set to: %s", institution_id.strip())
    return self

source_id(source_id)

Set the source identifier for the query.

Parameters:

Name Type Description Default
source_id str

The source ID (e.g., "GCM", "RCM", "Station").

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def source_id(self, source_id: str) -> "ClimateData":
    """Set the source identifier for the query.

    Parameters
    ----------
    source_id : str
        The source ID (e.g., "GCM", "RCM", "Station").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting source_id to: %s", source_id)
    if not isinstance(source_id, str) or not source_id.strip():
        logger.error("Invalid source_id parameter: must be non-empty string")
        raise ValueError("Source ID must be a non-empty string")
    self._query["source_id"] = source_id.strip()
    logger.info("Source ID set to: %s", source_id.strip())
    return self

experiment_id(experiment_id)

Set the experiment identifier for the query.

Parameters:

Name Type Description Default
experiment_id str or list of str

The experiment ID (e.g., "historical", "ssp245") or a list of experiment IDs to query multiple scenarios at once.

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Examples:

>>> cd.experiment_id("ssp245")  # Single experiment
>>> cd.experiment_id(["historical", "ssp245", "ssp370"])  # Multiple
Source code in climakitae/new_core/user_interface.py
def experiment_id(self, experiment_id: str | list[str]) -> "ClimateData":
    """Set the experiment identifier for the query.

    Parameters
    ----------
    experiment_id : str or list of str
        The experiment ID (e.g., "historical", "ssp245") or a list of
        experiment IDs to query multiple scenarios at once.

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    Examples
    --------
    >>> cd.experiment_id("ssp245")  # Single experiment
    >>> cd.experiment_id(["historical", "ssp245", "ssp370"])  # Multiple

    """
    logger.debug("Setting experiment_id to: %s", experiment_id)
    exp = []
    if not isinstance(experiment_id, (str, list)):
        logger.error(
            "Invalid experiment_id parameter: must be string or list of strings"
        )
        raise ValueError(
            "Experiment ID must be a non-empty string or list of strings"
        )
    if isinstance(experiment_id, str):
        if not experiment_id.strip():
            logger.error("Invalid experiment_id parameter: empty string")
            raise ValueError("Experiment ID must be a non-empty string")
        exp.append(experiment_id.strip())
    else:
        for exp_id in experiment_id:
            if not isinstance(exp_id, str) or not exp_id.strip():
                logger.error(
                    "Invalid experiment_id in list: must be non-empty strings"
                )
                raise ValueError("Each experiment ID must be a non-empty string")
            exp.append(exp_id.strip())
    self._query["experiment_id"] = exp
    logger.info("Experiment ID(s) set to: %s", exp)
    return self
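
The normalization performed above (a single string or a list of strings always ends up stored as a list of stripped strings) can be restated as a stand-alone helper. `normalize_ids` is a hypothetical name used only for this sketch; the checks follow the source shown above.

```python
from typing import List, Union


def normalize_ids(value: Union[str, List[str]], label: str = "Experiment ID") -> List[str]:
    """Return a list of stripped, non-empty strings (hypothetical helper)."""
    if not isinstance(value, (str, list)):
        raise ValueError(f"{label} must be a non-empty string or list of strings")
    items = [value] if isinstance(value, str) else value
    cleaned = []
    for item in items:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"Each {label} must be a non-empty string")
        cleaned.append(item.strip())
    return cleaned


print(normalize_ids("ssp245"))                    # ['ssp245']
print(normalize_ids([" historical", "ssp370 "]))  # ['historical', 'ssp370']
print(normalize_ids([]))                          # []
```

The last call shows a detail that follows from the loop-based check: an empty list contains no invalid element, so it passes validation and is stored as an empty list.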

table_id(table_id)

Set the temporal resolution identifier for the query.

Parameters:

Name Type Description Default
table_id str

The temporal resolution (e.g., "1hr", "day", "mon").

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def table_id(self, table_id: str) -> "ClimateData":
    """Set the temporal resolution identifier for the query.

    Parameters
    ----------
    table_id : str
        The temporal resolution (e.g., "1hr", "day", "mon").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting table_id to: %s", table_id)
    if not isinstance(table_id, str) or not table_id.strip():
        logger.error("Invalid table_id parameter: must be non-empty string")
        raise ValueError("Table ID must be a non-empty string")
    self._query["table_id"] = table_id.strip()
    logger.info("Table ID set to: %s", table_id.strip())
    return self

grid_label(grid_label)

Set the spatial resolution identifier for the query.

Parameters:

Name Type Description Default
grid_label str

The spatial resolution (e.g., "d01", "d02", "d03").

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def grid_label(self, grid_label: str) -> "ClimateData":
    """Set the spatial resolution identifier for the query.

    Parameters
    ----------
    grid_label : str
        The spatial resolution (e.g., "d01", "d02", "d03").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting grid_label to: %s", grid_label)
    if not isinstance(grid_label, str) or not grid_label.strip():
        logger.error("Invalid grid_label parameter: must be non-empty string")
        raise ValueError("Grid label must be a non-empty string")
    self._query["grid_label"] = grid_label.strip()
    logger.info("Grid label set to: %s", grid_label.strip())
    return self

variable(variable)

Set the climate variable to retrieve.

Parameters:

Name Type Description Default
variable str

The variable identifier (e.g., "tasmax", "pr", "cf"). Can also be a registered derived variable name.

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def variable(self, variable: str) -> "ClimateData":
    """Set the climate variable to retrieve.

    Parameters
    ----------
    variable : str
        The variable identifier (e.g., "tasmax", "pr", "cf").
        Can also be a registered derived variable name.

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting variable to: %s", variable)
    if not isinstance(variable, str) or not variable.strip():
        logger.error("Invalid variable parameter: must be non-empty string")
        raise ValueError("Variable must be a non-empty string")
    self._query["variable_id"] = variable.strip()
    logger.info("Variable set to: %s", variable.strip())
    return self

derived_variable(name, depends_on, func, description='', units='', **query_extras)

Register and query a user-defined derived variable.

This method registers a custom function that computes a new variable from existing source variables, then sets that variable as the query target. The computation happens during data loading (not as a post-processor).

Parameters:

Name Type Description Default
name str

Name for the new derived variable. This becomes queryable like any other variable in the catalog.

required
depends_on list of str

List of source variable IDs required for the computation (e.g., ['tasmax', 'tasmin'] or ['t2', 'rh']).

required
func callable

Function that takes an xarray.Dataset and returns a modified Dataset with the new variable added. The function signature should be: func(ds: xr.Dataset) -> xr.Dataset

required
description str

Human-readable description of what this variable represents.

''
units str

Expected units of the derived variable.

''
**query_extras

Additional query constraints (e.g., table_id='day').

{}

Returns:

Type Description
ClimateData

The current instance for method chaining.

Examples:

Define and query a custom temperature range variable:

>>> def calc_temp_range(ds):
...     ds['temp_range'] = ds.tasmax - ds.tasmin
...     ds['temp_range'].attrs = {'units': 'K', 'long_name': 'Daily Range'}
...     return ds
...
>>> data = (cd
...     .catalog("cadcat")
...     .activity_id("LOCA2")
...     .table_id("day")
...     .grid_label("d03")
...     .derived_variable(
...         name='temp_range',
...         depends_on=['tasmax', 'tasmin'],
...         func=calc_temp_range,
...         description='Daily temperature range',
...         units='K'
...     )
...     .get())
Notes
  • Registration is permanent for the Python session
  • The function must add the variable to the dataset and return it
  • Set appropriate attributes (units, long_name) on the new variable
  • For complex post-load transformations, use processors instead
See Also

show_derived_variables : View all registered derived variables
climakitae.new_core.derived_variables : Module documentation

Source code in climakitae/new_core/user_interface.py
def derived_variable(
    self,
    name: str,
    depends_on: list[str],
    func: Any,
    description: str = "",
    units: str = "",
    **query_extras,
) -> "ClimateData":
    """Register and query a user-defined derived variable.

    This method registers a custom function that computes a new variable
    from existing source variables, then sets that variable as the query target.
    The computation happens during data loading (not as a post-processor).

    Parameters
    ----------
    name : str
        Name for the new derived variable. This becomes queryable like any
        other variable in the catalog.
    depends_on : list of str
        List of source variable IDs required for the computation
        (e.g., ['tasmax', 'tasmin'] or ['t2', 'rh']).
    func : callable
        Function that takes an xarray.Dataset and returns a modified Dataset
        with the new variable added. The function signature should be:
        ``func(ds: xr.Dataset) -> xr.Dataset``
    description : str, optional
        Human-readable description of what this variable represents.
    units : str, optional
        Expected units of the derived variable.
    **query_extras
        Additional query constraints (e.g., table_id='day').

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    Examples
    --------
    Define and query a custom temperature range variable:

    >>> def calc_temp_range(ds):
    ...     ds['temp_range'] = ds.tasmax - ds.tasmin
    ...     ds['temp_range'].attrs = {'units': 'K', 'long_name': 'Daily Range'}
    ...     return ds
    ...
    >>> data = (cd
    ...     .catalog("cadcat")
    ...     .activity_id("LOCA2")
    ...     .table_id("day")
    ...     .grid_label("d03")
    ...     .derived_variable(
    ...         name='temp_range',
    ...         depends_on=['tasmax', 'tasmin'],
    ...         func=calc_temp_range,
    ...         description='Daily temperature range',
    ...         units='K'
    ...     )
    ...     .get())

    Notes
    -----
    - Registration is permanent for the Python session
    - The function must add the variable to the dataset and return it
    - Set appropriate attributes (units, long_name) on the new variable
    - For complex post-load transformations, use processors instead

    See Also
    --------
    show_derived_variables : View all registered derived variables
    climakitae.new_core.derived_variables : Module documentation

    """
    from climakitae.new_core.derived_variables import register_user_function
    from climakitae.new_core.param_validation.derived_variable_param_validator import (
        validate_derived_variable_params,
    )

    logger.debug(
        "Registering derived variable '%s' depending on %s", name, depends_on
    )

    # Validate parameters
    if not validate_derived_variable_params(
        name, depends_on, func, query_extras or None
    ):
        logger.warning("Derived variable validation failed, continuing anyway")

    # Register the function
    register_user_function(
        name=name,
        depends_on=depends_on,
        func=func,
        description=description,
        units=units,
        query_extras=query_extras or None,
    )

    # Set as the query variable
    self._query["variable_id"] = name
    logger.info("Derived variable '%s' registered and set as query target", name)

    return self

station_id(station_id)

Set the station identifier for the query.

Parameters:

Name Type Description Default
station_id str or list of str

The station ID (e.g., "ASOSAWOS_72019300117", "ASOSAWOS_72020200118") or a list of station IDs.

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def station_id(self, station_id: str | list[str]) -> "ClimateData":
    """Set the station identifier for the query.

    Parameters
    ----------
    station_id : str
        The station ID (e.g., "ASOSAWOS_72019300117", "ASOSAWOS_72020200118").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting station_id to: %s", station_id)
    stn = []
    if not isinstance(station_id, (str, list)):
        logger.error(
            "Invalid station_id parameter: must be string or list of strings"
        )
        raise ValueError("Station ID must be a non-empty string or list of strings")
    if isinstance(station_id, str):
        if not station_id.strip():
            logger.error("Invalid station_id parameter: empty string")
            raise ValueError("Station ID must be a non-empty string")
        stn.append(station_id.strip())
    else:
        for id in station_id:
            if not isinstance(id, str) or not id.strip():
                logger.error(
                    "Invalid station_id in list: must be non-empty strings"
                )
                raise ValueError("Each station ID must be a non-empty string")
            stn.append(id.strip())
    self._query["station_id"] = stn
    logger.info("Station ID(s) set to: %s", stn)
    return self

network_id(network_id)

Set the network identifier for the query.

Parameters:

Name Type Description Default
network_id str | list[str]

The network ID (e.g., "ASOSAWOS", "CWOP") or a list of network IDs.

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def network_id(self, network_id: str | list[str]) -> "ClimateData":
    """Set the network identifier for the query.

    Parameters
    ----------
    network_id : str | list[str]
        The network ID (e.g., "ASOSAWOS", "CWOP").

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting network_id to: %s", network_id)
    if not isinstance(network_id, (str, list)):
        logger.error(
            "Invalid network_id parameter: must be string or list of strings"
        )
        raise ValueError("Network ID must be a non-empty string or list of strings")
    if isinstance(network_id, str):
        if not network_id.strip():
            logger.error("Invalid network_id parameter: empty string")
            raise ValueError("Network ID must be a non-empty string")
        self._query["network_id"] = network_id.strip()
        logger.info("Network ID set to: %s", network_id.strip())
    else:
        net = []
        for id in network_id:
            if not isinstance(id, str) or not id.strip():
                logger.error(
                    "Invalid network_id in list: must be non-empty strings"
                )
                raise ValueError("Each network ID must be a non-empty string")
            net.append(id.strip())
        self._query["network_id"] = net
        logger.info("Network ID(s) set to: %s", net)
    return self

processes(processes)

Set processing operations to apply to the retrieved data.

Parameters:

Name Type Description Default
processes Dict[str, Union[str, Iterable]]

A dictionary of processing operations and their parameters.

required

Returns:

Type Description
ClimateData

The current instance for method chaining.

Source code in climakitae/new_core/user_interface.py
def processes(self, processes: Dict[str, Union[str, Iterable]]) -> "ClimateData":
    """Set processing operations to apply to the retrieved data.

    Parameters
    ----------
    processes : Dict[str, Union[str, Iterable]]
        A dictionary of processing operations and their parameters.

    Returns
    -------
    ClimateData
        The current instance for method chaining.

    """
    logger.debug("Setting processes to: %s", processes)
    if not isinstance(processes, dict):
        logger.error("Invalid processes parameter: must be a dictionary")
        raise ValueError("Processes must be a dictionary")
    self._query["processes"] = processes.copy()
    logger.info("Processes set: %d operations configured", len(processes))
    return self
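
The setter above stores `processes.copy()`, which is a shallow copy. The sketch below shows what that implies in plain Python: later changes to top-level keys of the caller's dictionary are not seen by the stored copy, but nested mutable values are still shared. The operation name and values are made up for illustration.

```python
import copy

processes = {"time_slice": ["2000", "2010"]}  # hypothetical operation name and values

stored_shallow = processes.copy()        # what a dict.copy() call produces
stored_deep = copy.deepcopy(processes)   # fully independent alternative

processes["time_slice"].append("2020")   # mutate the nested list afterwards
processes["extra"] = "value"             # add a new top-level key afterwards

print(stored_shallow)  # {'time_slice': ['2000', '2010', '2020']}
print(stored_deep)     # {'time_slice': ['2000', '2010']}
```

If a processes dictionary is reused across queries, building a fresh dictionary for each call avoids surprises from shared nested values.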

get()

Execute the configured query and retrieve climate data.

Validates required parameters, creates the appropriate dataset using the factory pattern, executes the query, and resets the query state for the next use.

Thread Safety

This method takes a snapshot of the query at the start of execution, making it safe to call from multiple threads on the same ClimateData instance. However, for maximum clarity and safety, it is recommended to use separate ClimateData instances in multi-threaded scenarios.

Returns:

Type Description
Optional[DataArray]

The retrieved climate data as a lazy-loaded xarray DataArray, or None if the query fails or validation errors occur.

Raises:

Type Description
ValueError

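If required parameters are missing during validation. Note that in the source below, a failed validation is logged and get() returns None rather than raising.

Exception

If there are errors during dataset creation or execution that the method does not handle itself. The source below catches ValueError, KeyError, and TypeError during dataset creation, and ValueError, KeyError, IOError, and RuntimeError during execution; these are logged and get() returns None.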

Source code in climakitae/new_core/user_interface.py
def get(self) -> Optional[Any]:
    """Execute the configured query and retrieve climate data.

    Validates required parameters, creates the appropriate dataset using
    the factory pattern, executes the query, and resets the query state
    for the next use.

    Thread Safety
    -------------
    This method takes a snapshot of the query at the start of execution,
    making it safe to call from multiple threads on the same ClimateData
    instance. However, for maximum clarity and safety, it is recommended
    to use separate ClimateData instances in multi-threaded scenarios.

    Returns
    -------
    Optional[xr.DataArray]
        The retrieved climate data as a lazy-loaded xarray DataArray,
        or None if the query fails or validation errors occur.

    Raises
    ------
    ValueError
        If required parameters are missing during validation.
    Exception
        If there are errors during dataset creation or execution.

    """
    logger.info("Starting data retrieval with query: %s", self._query)
    data = None

    # Take a snapshot of the query for thread-safety
    # This allows concurrent calls to get() without corrupting each other
    query_snapshot = copy.deepcopy(self._query)

    # Validate required parameters using the snapshot for thread-safety
    logger.debug("Validating required parameters")
    if not self._validate_required_parameters(query_snapshot):
        logger.warning("Required parameter validation failed")
        self._reset_query()
        return None

    try:
        # Create dataset using factory with the snapshot
        logger.debug("Creating dataset using factory")
        dataset = self._factory.create_dataset(query_snapshot)
        logger.info("Dataset created successfully")
    except (ValueError, KeyError, TypeError) as e:
        logger.error("Error during dataset creation: %s", str(e))
        logger.debug("Traceback:", exc_info=True)
        self._reset_query()
        return None

    try:
        # Execute the query with the snapshot
        logger.debug("Executing query")
        data = dataset.execute(query_snapshot)
        # check if empty dataset
        # Check if data is empty/null
        if (
            data is None
            or (hasattr(data, "nbytes") and data.nbytes == 0)
            or (isinstance(data, dict) and not data)
        ):
            logger.warning("⚠️ Warning: Retrieved dataset is empty.")

        else:
            logger.info("✅ Data retrieval successful!")

    except (ValueError, KeyError, IOError, RuntimeError) as e:
        logger.error("❌ Data retrieval failed: %s", str(e))
        logger.debug("Traceback:", exc_info=True)

    # Always reset query after execution
    self._reset_query()
    return data
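
The flow above (snapshot the query, validate, execute, always reset) can be sketched with a small stand-in class. `MiniQuery`, its `run` callable, and the single required key are hypothetical and not part of climakitae. The sketch only illustrates two consequences for callers: failures surface as `None` rather than as exceptions, and parameters do not persist after `get()`.

```python
import copy
from typing import Any, Callable, Dict, Optional


class MiniQuery:
    """Hypothetical stand-in for the snapshot -> validate -> execute -> reset flow."""

    def __init__(self, run: Callable[[Dict[str, Any]], Any]) -> None:
        self._run = run  # plays the role of dataset.execute()
        self._query: Dict[str, Any] = {}

    def variable(self, name: str) -> "MiniQuery":
        self._query["variable_id"] = name
        return self  # returning self enables chaining

    def get(self) -> Optional[Any]:
        snapshot = copy.deepcopy(self._query)  # isolate this call from later edits
        data = None
        try:
            if "variable_id" not in snapshot:  # assumed required parameter
                return None
            data = self._run(snapshot)
        except (ValueError, KeyError, IOError, RuntimeError):
            data = None  # failures are reported as None, not raised
        finally:
            self._query = {}  # the query is reset on every path
        return data


def failing_run(query: Dict[str, Any]) -> Any:
    """Simulate a backend that cannot serve the request."""
    raise RuntimeError("simulated backend failure")


ok = MiniQuery(lambda q: f"data for {q['variable_id']}")
result = ok.variable("tasmax").get()
second = ok.get()  # parameters did not persist, so validation fails
failed = MiniQuery(failing_run).variable("pr").get()
```

Because a `None` result can mean either a validation failure or a runtime error, calling code should test the return value explicitly before using it.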

show_query()

Display the current query configuration.

Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_query(self) -> None:
    """Display the current query configuration."""
    msg = "Current Query:"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))
    for key, value in self._query.items():
        display_value = value if value is not UNSET else "UNSET"
        logger.info("%s: %s", key, display_value)
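
As a minimal sketch of the display rule used above, the helper below builds the same kind of lines as a list instead of logging them. The `_Unset` class and `format_query` function are hypothetical stand-ins; climakitae provides its own `UNSET` sentinel.

```python
from typing import Any, Dict, List


class _Unset:
    """Hypothetical sentinel type; climakitae ships its own UNSET object."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


def format_query(query: Dict[str, Any]) -> List[str]:
    """Build header, underline, and one 'key: value' line per parameter."""
    header = "Current Query:"
    lines = [header, "-" * len(header)]
    for key, value in query.items():
        # Identity check: falsy values such as "" or 0 still count as set.
        display_value = value if value is not UNSET else "UNSET"
        lines.append(f"{key}: {display_value}")
    return lines


lines = format_query({"catalog": "cadcat", "variable_id": UNSET, "table_id": "day"})
```

The identity comparison (`is not UNSET`) is what lets an explicit falsy value be distinguished from a parameter that was never set.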

show_catalog_options(show_n=None)

Display available catalog options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_catalog_options(self, show_n: Optional[int] = None) -> None:
    """Display available catalog options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "catalog",
        "catalog options (Cloud data collections)",
        limit_per_group=show_n,
    )

show_installation_options(show_n=None)

Display available installation options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_installation_options(self, show_n: Optional[int] = None) -> None:
    """Display available installation options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "installation",
        "installation options (Renewable energy generation types)",
        limit_per_group=show_n,
    )

show_activity_id_options(show_n=None)

Display available activity ID options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_activity_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available activity ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "activity_id",
        "activity_id options (Downscaling methods)",
        limit_per_group=show_n,
    )

show_institution_id_options(show_n=None)

Display available institution ID options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_institution_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available institution ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "institution_id",
        "institution_id options (Data producers)",
        limit_per_group=show_n,
    )

show_source_id_options(show_n=None)

Display available source ID options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_source_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available source ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "source_id",
        "source_id options (Climate model simulations)",
        limit_per_group=show_n,
    )

show_experiment_id_options(show_n=None)

Display available experiment ID options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_experiment_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available experiment ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "experiment_id",
        "experiment_id options (Simulation runs)",
        limit_per_group=show_n,
    )

show_station_id_options(show_n=None)

Display available station ID options.

Parameters:

Name Type Description Default
show_n int

Maximum number of stations to display. If None (default), shows all stations.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_station_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available station ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of stations to display. If None (default), shows all stations.
    """
    self._show_options(
        "station_id",
        "station_id options (Weather station names)",
        limit_per_group=show_n,
    )

show_network_id_options(show_n=None)

Display available network ID options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_network_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available network ID options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "network_id",
        "network_id options (Weather network names)",
        limit_per_group=show_n,
    )

show_table_id_options(show_n=None)

Display available table ID options (Temporal resolutions).

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_table_id_options(self, show_n: Optional[int] = None) -> None:
    """Display available table ID options (Temporal resolutions).

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "table_id",
        "table_id options (Temporal resolutions)",
        limit_per_group=show_n,
    )

show_grid_label_options(show_n=None)

Display available grid label options (Spatial resolutions).

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_grid_label_options(self, show_n: Optional[int] = None) -> None:
    """Display available grid label options (Spatial resolutions).

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    self._show_options(
        "grid_label",
        "grid_label options (Spatial resolutions)",
        limit_per_group=show_n,
    )

show_variable_options(show_n=None)

Display available variable options.

Parameters:

Name Type Description Default
show_n int

Maximum number of options to display. If None (default), shows all options.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_variable_options(self, show_n: Optional[int] = None) -> None:
    """Display available variable options.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of options to display. If None (default), shows all options.
    """
    current_query = {k: v for k, v in self._query.items() if v is not UNSET}
    msg = (
        "Variables (constrained by current query):"
        if current_query
        else "Variables"
    )

    self._show_options("variable_id", msg, limit_per_group=show_n)

show_derived_variables()

Display all registered derived variables.

Shows both built-in and user-registered derived variables with their dependencies and descriptions.

Examples:

>>> cd = ClimateData()
>>> cd.show_derived_variables()
Derived Variables (computed from source variables during loading):
------------------------------------------------------------------
wind_speed_10m      depends on: u10, v10
heat_index          depends on: t2, rh
...
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_derived_variables(self) -> None:
    """Display all registered derived variables.

    Shows both built-in and user-registered derived variables with their
    dependencies and descriptions.

    Examples
    --------
    >>> cd = ClimateData()
    >>> cd.show_derived_variables()
    Derived Variables (computed from source variables during loading):
    ------------------------------------------------------------------
    wind_speed_10m      depends on: u10, v10
    heat_index          depends on: t2, rh
    ...

    """
    from climakitae.new_core.derived_variables import list_derived_variables

    msg = "Derived Variables (computed from source variables during loading):"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))
    try:
        print(msg)
        print("%s" % ("-" * len(msg)))
    except Exception:
        pass

    try:
        derived_vars = list_derived_variables()
        if not derived_vars:
            no_vars_msg = "No derived variables registered"
            logger.info(no_vars_msg)
            try:
                print(no_vars_msg)
            except Exception:
                pass
        else:
            # Find max name length for alignment
            max_name_len = max(len(name) for name in derived_vars.keys())

            for name, info in sorted(derived_vars.items()):
                deps_str = ", ".join(info.depends_on)
                source_tag = f"[{info.source}]" if info.source == "user" else ""
                spacing = " " * (max_name_len - len(name) + 2)

                line = f"{name}{spacing}depends on: {deps_str} {source_tag}"
                if info.description:
                    line += f"\n{' ' * (max_name_len + 2)}  └─ {info.description}"

                logger.info(line)
                try:
                    print(line)
                except Exception:
                    pass

        logger.info("\n")
        try:
            print()
        except Exception:
            pass

    except Exception as e:
        logger.error("Error retrieving derived variables: %s", e, exc_info=True)
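
The column alignment used above can be sketched in isolation. `DerivedInfo` is a hypothetical record whose field names mirror the attributes read in the listing (`depends_on`, `source`, `description`); the real registry objects may differ. Unlike the listing, this sketch omits the trailing space when no source tag is present.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DerivedInfo:
    """Hypothetical record; field names follow the attributes used above."""

    depends_on: List[str]
    source: str = "builtin"
    description: str = ""


def format_derived(derived_vars: Dict[str, DerivedInfo]) -> List[str]:
    """Return one aligned line per derived variable, sorted by name."""
    if not derived_vars:
        return ["No derived variables registered"]
    width = max(len(name) for name in derived_vars) + 2  # name column plus gap
    lines = []
    for name, info in sorted(derived_vars.items()):
        line = f"{name.ljust(width)}depends on: {', '.join(info.depends_on)}"
        if info.source == "user":
            line += f" [{info.source}]"  # only user-registered entries are tagged
        if info.description:
            line += f"\n{' ' * width}  └─ {info.description}"
        lines.append(line)
    return lines


rows = format_derived(
    {
        "wind_speed_10m": DerivedInfo(["u10", "v10"]),
        "heat_index": DerivedInfo(["t2", "rh"], source="user"),
    }
)
empty = format_derived({})
```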

show_processors(show_n=None)

Display available data processors.

Parameters:

Name Type Description Default
show_n int

Maximum number of processors to display. If None (default), shows all processors.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_processors(self, show_n: Optional[int] = None) -> None:
    """Display available data processors.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of processors to display. If None (default), shows all processors.
    """

    msg = "Processors (Methods for transforming raw catalog data):"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))

    try:
        # Get current catalog from query
        current_catalog = self._query.get("catalog", UNSET)

        # Get valid processors (filtered by catalog if specified)
        if current_catalog is not UNSET:
            valid_processors = self._factory.get_valid_processors(current_catalog)
            logger.info("Showing processors valid for catalog: %s", current_catalog)
        else:
            # No catalog specified - show all processors from registry
            valid_processors = sorted(self._factory._processing_step_registry)
            logger.info("Showing all processors")

        total_count = len(valid_processors)
        limit = min(show_n, total_count) if show_n is not None else total_count
        display_processors = valid_processors[:limit]

        # Warn user of truncation if show_n was set
        if limit < total_count:
            truncation_msg = f"Showing {limit} of {total_count} total processors"
            logger.info("%s", truncation_msg)

        for processor in display_processors:
            logger.info("%s", processor)

        logger.info("\n")

    except Exception as e:
        logger.error("Error retrieving processors: %s", e, exc_info=True)
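
The `show_n` truncation logic above, which also appears in `show_station_options` and `show_boundary_options`, can be isolated as a small helper. The `truncate` function and the processor names below are illustrative assumptions, not climakitae identifiers.

```python
from typing import List, Optional, Sequence, Tuple


def truncate(
    options: Sequence[str], show_n: Optional[int] = None, noun: str = "processors"
) -> Tuple[List[str], Optional[str]]:
    """Return the options to display and a notice if the list was cut short."""
    total = len(options)
    # show_n=None means "no limit"; a limit above the total is clamped.
    limit = min(show_n, total) if show_n is not None else total
    shown = list(options[:limit])
    notice = f"Showing {limit} of {total} total {noun}" if limit < total else None
    return shown, notice


names = sorted(["time_slice", "clip", "concat", "warming_level"])  # made-up names
shown, notice = truncate(names, show_n=2)
all_shown, no_notice = truncate(names)
clamped, clamped_notice = truncate(names, show_n=10)
```

The notice is produced only when something was actually hidden, so passing a `show_n` larger than the list behaves the same as passing `None`.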

show_station_options(show_n=None)

Display available station options for data retrieval.

Parameters:

Name Type Description Default
show_n int

Maximum number of stations to display. If None (default), shows all stations.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_station_options(self, show_n: Optional[int] = None) -> None:
    """Display available station options for data retrieval.

    Parameters
    ----------
    show_n : int, optional
        Maximum number of stations to display. If None (default), shows all stations.
    """
    msg = "Stations (Available weather stations for localization):"
    logger.info(msg)
    logger.info("%s", "-" * len(msg))
    try:
        stations = self._factory.get_stations()
        if not stations:
            logger.info("No stations available with current parameters")

        else:
            sorted_stations = sorted(stations)
            total_count = len(sorted_stations)
            limit = min(show_n, total_count) if show_n is not None else total_count
            display_stations = sorted_stations[:limit]

            # Warn user of truncation if show_n was set
            if limit < total_count:
                truncation_msg = f"Showing {limit} of {total_count} total stations"
                logger.info("%s", truncation_msg)

            for station in display_stations:
                logger.info("%s", station)

            logger.info("\n")
    except Exception as e:
        logger.error("Error retrieving stations: %s", e, exc_info=True)

show_boundary_options(boundary_type=UNSET, show_n=None)

Display available boundaries for spatial queries.

Parameters:

Name Type Description Default
boundary_type str

The type of boundary to display (e.g., "ca_counties", "ca_watersheds"). If not specified, displays available boundary types.

UNSET
show_n int

Maximum number of boundaries to display. If None (default), shows all boundaries.

None
Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_boundary_options(
    self, boundary_type=UNSET, show_n: Optional[int] = None
) -> None:
    """Display available boundaries for spatial queries.

    Parameters
    ----------
    boundary_type : str, optional
        The type of boundary to display (e.g., "ca_counties", "ca_watersheds").
        If not specified, displays available boundary types.
    show_n : int, optional
        Maximum number of boundaries to display. If None (default), shows all boundaries.

    """
    if boundary_type is UNSET:
        msg = "Boundary Types (call again with boundary_type='...' to see options):"
    else:
        msg = "Available {} Boundaries:".format(
            " ".join([x.capitalize() for x in boundary_type.split("_")])
        )
    logger.info(msg)
    logger.info("%s", "-" * len(msg))

    try:
        boundaries = self._factory.get_boundaries(boundary_type)
        if not boundaries:
            logger.info("No boundaries available with current parameters")

        else:
            sorted_boundaries = sorted(boundaries)
            total_count = len(sorted_boundaries)
            limit = min(show_n, total_count) if show_n is not None else total_count
            display_boundaries = sorted_boundaries[:limit]

            # Warn user of truncation if show_n was set
            if limit < total_count:
                truncation_msg = (
                    f"Showing {limit} of {total_count} total boundaries"
                )
                logger.info("%s", truncation_msg)

            for boundary in display_boundaries:
                logger.info("%s", boundary)

            logger.info("\n")
    except Exception as e:
        logger.error("Error retrieving boundaries: %s", e, exc_info=True)
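
The header text above is derived from the `boundary_type` key by splitting on underscores and capitalizing each part. The sketch below reproduces that step with a hypothetical `boundary_header` function, using `None` as a stand-in for the `UNSET` sentinel; the `"ca_HUC8"` key is made up to show how mixed case is handled.

```python
from typing import Optional


def boundary_header(boundary_type: Optional[str] = None) -> str:
    """Build the header line; None stands in for climakitae's UNSET sentinel."""
    if boundary_type is None:
        return "Boundary Types (call again with boundary_type='...' to see options):"
    # str.capitalize() upper-cases the first letter and lower-cases the rest.
    pretty = " ".join(part.capitalize() for part in boundary_type.split("_"))
    return f"Available {pretty} Boundaries:"


counties = boundary_header("ca_counties")
mixed_case = boundary_header("ca_HUC8")  # hypothetical key, shows the lower-casing
no_type = boundary_header()
```

Because `capitalize()` lower-cases everything after the first letter, acronyms inside a key are not preserved in the header.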

show_all_options()

Display all available options for exploration.

Source code in climakitae/new_core/user_interface.py
@_with_info_verbosity
def show_all_options(self) -> None:
    """Display all available options for exploration."""
    data_title = "CAL ADAPT DATA -- ALL AVAILABLE OPTIONS USING CLIMAKITAE"
    logger.info("%s", "=" * len(data_title))
    logger.info(data_title)
    logger.info("%s", "=" * len(data_title))

    # Define truncation limits for show_all to keep output manageable
    truncation_limits = {
        "show_catalog_options": None,  # Small list, show all
        "show_activity_id_options": None,  # Small list, show all
        "show_institution_id_options": 10,
        "show_source_id_options": 10,
        "show_experiment_id_options": None,  # Small list, show all
        "show_table_id_options": None,  # Small list, show all
        "show_grid_label_options": None,  # Small list, show all
        "show_variable_options": 15,
        "show_installation_options": None,  # Small list, show all
        "show_station_id_options": 15,
        "show_network_id_options": None,  # Small list, show all
        "show_processors": 10,
        "show_station_options": 15,
    }

    option_methods = [
        ("show_catalog_options", "Catalogs"),
        ("show_activity_id_options", "Activity IDs"),
        ("show_institution_id_options", "Institution IDs"),
        ("show_source_id_options", "Source IDs"),
        ("show_experiment_id_options", "Experiment IDs"),
        ("show_table_id_options", "Table IDs (Temporal Resolution)"),
        ("show_grid_label_options", "Grid Labels (Spatial Resolution)"),
        ("show_variable_options", "Variables"),
        ("show_derived_variables", "Derived Variables"),
        ("show_installation_options", "Installations"),
        ("show_station_id_options", "Station IDs"),
        ("show_network_id_options", "Network IDs"),
        ("show_processors", "Processors"),
        ("show_station_options", "Stations"),
    ]

    for method_name, section_title in option_methods:
        try:
            limit = truncation_limits.get(method_name)
            method = getattr(self, method_name)
            if limit is not None:
                method(show_n=limit)
                # Let users know how to see all options
                hint_msg = (
                    f"Use {method_name}() to see all {section_title.lower()}."
                )
                logger.info("%s", hint_msg)
            else:
                method()
        except Exception as e:
            logger.error(
                "Error displaying %s: %s", section_title.lower(), e, exc_info=True
            )

    logger.info("%s", "\n" + "=" * 60)
    logger.info("Current Query Status:")
    logger.info("%s", "=" * 60)
    self.show_query()
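
The name-based dispatch above can be sketched with a stand-in class. `Explorer` and its methods are hypothetical; they record calls in a list instead of logging. Note that a method with no entry in the limits table gets `None` from `dict.get` and is therefore called without arguments, which is why `show_derived_variables()` (which takes no `show_n`) works in the listing.

```python
from typing import Dict, List, Optional


class Explorer:
    """Hypothetical stand-in whose show_* methods record how they were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def show_catalog_options(self, show_n: Optional[int] = None) -> None:
        self.calls.append(f"catalog(show_n={show_n})")

    def show_variable_options(self, show_n: Optional[int] = None) -> None:
        self.calls.append(f"variable(show_n={show_n})")

    def show_all(self) -> None:
        limits: Dict[str, Optional[int]] = {
            "show_catalog_options": None,  # small list, show everything
            "show_variable_options": 15,
        }
        sections = [
            ("show_catalog_options", "Catalogs"),
            ("show_variable_options", "Variables"),
            ("show_missing_options", "Missing"),  # deliberately absent method
        ]
        for method_name, title in sections:
            try:
                method = getattr(self, method_name)
                limit = limits.get(method_name)
                if limit is not None:
                    method(show_n=limit)
                    self.calls.append(
                        f"hint: use {method_name}() to see all {title.lower()}"
                    )
                else:
                    method()
            except Exception as exc:  # one failing section must not stop the rest
                self.calls.append(f"error in {title.lower()}: {type(exc).__name__}")


explorer = Explorer()
explorer.show_all()
```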

reset()

Manually reset the query parameters.

Returns:

Type Description
ClimateData

The current instance with reset parameters.

Source code in climakitae/new_core/user_interface.py
def reset(self) -> "ClimateData":
    """Manually reset the query parameters.

    Returns
    -------
    ClimateData
        The current instance with reset parameters.

    """
    return self._reset_query()

copy_query()

Get a shallow copy of the current query parameters, omitting any that are still UNSET.

Returns:

Type Description
Dict[str, Any]

A new dictionary containing only the parameters that have been set; UNSET entries are omitted.

Source code in climakitae/new_core/user_interface.py
def copy_query(self) -> Dict[str, Any]:
    """Get a copy of the current query parameters.

    Returns
    -------
    Dict[str, Any]
        A copy of the current query parameters.

    """
    return {k: v for k, v in self._query.items() if v is not UNSET}
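
A dictionary comprehension like the one above produces a shallow copy: top-level keys are independent, but mutable values such as a `processes` dictionary are still shared with the original. The sketch below demonstrates this with a stand-in sentinel and made-up parameter values.

```python
from typing import Any, Dict

UNSET = object()  # stand-in sentinel; climakitae defines its own UNSET


def copy_query(query: Dict[str, Any]) -> Dict[str, Any]:
    """Same filtering rule as above: keep only parameters that were set."""
    return {k: v for k, v in query.items() if v is not UNSET}


query = {
    "catalog": "cadcat",
    "variable_id": UNSET,
    "processes": {"time_slice": (2015, 2045)},  # illustrative values
}
saved = copy_query(query)
saved["catalog"] = "changed"  # top-level keys are independent
saved["processes"]["units"] = "degF"  # nested dicts are shared with the original
```

If the saved parameters must be fully independent of the live query, wrapping the result in `copy.deepcopy` is one option.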

load_query(query_params)

Load query parameters from a dictionary.

Uses the individual setter methods to ensure validation is applied to each parameter. Unknown keys, and keys whose value is UNSET, are silently ignored.

Parameters:

Name Type Description Default
query_params Dict[str, Any]

Dictionary of query parameters to load. Supported keys: catalog, installation, activity_id, institution_id, source_id, experiment_id, table_id, grid_label, variable_id, processes.

required

Returns:

Type Description
ClimateData

The current instance with loaded parameters.

Raises:

Type Description
ValueError

If any parameter value fails validation.

Source code in climakitae/new_core/user_interface.py
def load_query(self, query_params: Dict[str, Any]) -> "ClimateData":
    """Load query parameters from a dictionary.

    Uses the individual setter methods to ensure validation is applied
    to each parameter. Unknown keys are silently ignored.

    Parameters
    ----------
    query_params : Dict[str, Any]
        Dictionary of query parameters to load. Supported keys:
        catalog, installation, activity_id, institution_id, source_id,
        experiment_id, table_id, grid_label, variable_id, processes.

    Returns
    -------
    ClimateData
        The current instance with loaded parameters.

    Raises
    ------
    ValueError
        If any parameter value fails validation.

    """
    # Map query keys to their setter methods
    setters = {
        "catalog": self.catalog,
        "installation": self.installation,
        "activity_id": self.activity_id,
        "institution_id": self.institution_id,
        "source_id": self.source_id,
        "experiment_id": self.experiment_id,
        "table_id": self.table_id,
        "grid_label": self.grid_label,
        "variable_id": self.variable,
        "processes": self.processes,
    }

    for key, value in query_params.items():
        if key in setters and value is not UNSET:
            setters[key](value)
    return self
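
The setter-dispatch approach above can be sketched with a stand-in builder. `MiniBuilder` and its validation rule are hypothetical; the point is that routing every value through its setter keeps validation in one place, and that the `variable_id` key maps to a setter named `variable`.

```python
from typing import Any, Dict


class MiniBuilder:
    """Hypothetical stand-in with two validated setters."""

    def __init__(self) -> None:
        self._query: Dict[str, Any] = {}

    def catalog(self, value: str) -> "MiniBuilder":
        if not isinstance(value, str) or not value.strip():
            raise ValueError("catalog must be a non-empty string")
        self._query["catalog"] = value.strip()
        return self

    def variable(self, value: str) -> "MiniBuilder":
        if not isinstance(value, str) or not value.strip():
            raise ValueError("variable must be a non-empty string")
        self._query["variable_id"] = value.strip()
        return self

    def load_query(self, params: Dict[str, Any]) -> "MiniBuilder":
        # Keys are query names; values are the bound setter methods.
        setters = {"catalog": self.catalog, "variable_id": self.variable}
        for key, value in params.items():
            if key in setters:  # unknown keys are skipped without an error
                setters[key](value)
        return self


builder = MiniBuilder().load_query(
    {"catalog": " cadcat ", "variable_id": "tasmax", "typo_key": 1}
)

try:
    MiniBuilder().load_query({"catalog": ""})
    raised = False
except ValueError:
    raised = True
```

Since unknown keys are dropped silently, a misspelled key in a saved query will not raise; comparing the loaded query against the input is a simple way to catch that.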