Derived Variables Module

Registry and utilities for climate-derived variable computation.

Overview

The climakitae.new_core.derived_variables module provides:

- Registry: centralized registry of available derived variables
- Utilities: helper functions for variable transformation
- Built-in derivations: pre-configured derived variables:
  - Humidity indices (relative humidity, dew point)
  - Temperature indices (effective temperature, heat index)
  - Wind computations (wind speed, direction)
  - Climate indices (growing degree days, etc.)

Registry

Derived Variable Registry for ClimakitAE.

This module provides a singleton registry for derived climate variables that integrates with intake-esm's DerivedVariableRegistry. It enables users to define variables that are computed from other variables during data loading.

The registry follows the same patterns as the processor registry in climakitae.new_core.processors.abc_data_processor.

Classes:

Name Description
DerivedVariableInfo

Dataclass containing metadata about a registered derived variable.

Functions:

Name Description
get_registry

Get the global intake-esm DerivedVariableRegistry singleton.

register_derived

Decorator to register a derived variable function.

register_user_function

Imperatively register a user-defined derived variable.

list_derived_variables

List all registered derived variables with their metadata.

DerivedVariableInfo(name, depends_on, description, units, func, source='builtin', drop_dependencies=True) dataclass

Metadata about a registered derived variable.

Attributes:

Name Type Description
name str

The name of the derived variable (what users query).

depends_on list of str

Variable IDs that this derived variable requires.

description str

Human-readable description of what this variable represents.

units str

Expected units of the derived variable.

func callable

The function that computes the derived variable.

source str

Where this variable was registered from ('builtin' or 'user').

drop_dependencies bool

Whether to remove source variables after computing derived variable. Default: True (keep only the derived variable in the output).

preserve_spatial_metadata(ds, derived_var_name, source_var_name)

Copy spatial metadata from a source variable to a derived variable.

This function ensures that derived variables retain necessary spatial metadata (CRS, spatial_ref, coordinates) from their source variables. This is critical for downstream operations like clipping that require CRS information.

Parameters:

Name Type Description Default
ds Dataset

The dataset containing both source and derived variables.

required
derived_var_name str

Name of the derived variable to update.

required
source_var_name str

Name of the source variable to copy metadata from.

required
Notes

This function modifies the dataset in-place. It copies:

- CRS via rioxarray if available
- Lambert_Conformal or spatial_ref coordinate if present
- grid_mapping attribute if present
- Any coordinates present on the source but missing from derived

Examples:

>>> ds["derived"] = ds["source_a"] - ds["source_b"]
>>> preserve_spatial_metadata(ds, "derived", "source_a")
Source code in climakitae/new_core/derived_variables/registry.py
def preserve_spatial_metadata(ds, derived_var_name: str, source_var_name: str) -> None:
    """Copy spatial metadata from a source variable to a derived variable.

    This function ensures that derived variables retain necessary spatial
    metadata (CRS, spatial_ref, coordinates) from their source variables.
    This is critical for downstream operations like clipping that require
    CRS information.

    Parameters
    ----------
    ds : xr.Dataset
        The dataset containing both source and derived variables.
    derived_var_name : str
        Name of the derived variable to update.
    source_var_name : str
        Name of the source variable to copy metadata from.

    Notes
    -----
    This function modifies the dataset in-place. It copies:
    - CRS via rioxarray if available
    - Lambert_Conformal or spatial_ref coordinate if present
    - grid_mapping attribute if present
    - Any coordinates present on the source but missing from derived

    Examples
    --------
    >>> ds["derived"] = ds["source_a"] - ds["source_b"]
    >>> preserve_spatial_metadata(ds, "derived", "source_a")

    """
    if derived_var_name not in ds or source_var_name not in ds:
        logger.debug(
            "Cannot preserve metadata: '%s' or '%s' not in dataset",
            derived_var_name,
            source_var_name,
        )
        return

    source = ds[source_var_name]

    # Collect attributes to copy (copy grid_mapping first as rioxarray needs it)
    attrs_to_copy = {}
    if "grid_mapping" in source.attrs:
        attrs_to_copy["grid_mapping"] = source.attrs["grid_mapping"]

    # Copy Lambert_Conformal or other CRS coordinate if present
    # WRF data typically uses Lambert_Conformal as the CRS coordinate
    crs_coord_names = ["Lambert_Conformal", "spatial_ref", "crs"]
    coords_to_add = {}
    for crs_coord in crs_coord_names:
        if crs_coord in ds.coords and crs_coord not in ds[derived_var_name].coords:
            coords_to_add[crs_coord] = ds.coords[crs_coord]
            logger.debug(
                "Will copy '%s' coordinate to '%s'", crs_coord, derived_var_name
            )

    # Apply coordinate changes (this creates a new DataArray, losing attrs)
    if coords_to_add:
        ds[derived_var_name] = ds[derived_var_name].assign_coords(coords_to_add)

    # Now copy attributes AFTER coordinate assignment (since assign_coords makes new obj)
    if attrs_to_copy:
        ds[derived_var_name].attrs.update(attrs_to_copy)
        logger.debug(
            "Copied grid_mapping='%s' to '%s'",
            attrs_to_copy.get("grid_mapping"),
            derived_var_name,
        )

    # Try to copy CRS using rioxarray
    try:
        import rioxarray  # noqa: F401

        # Refresh reference after coord assignment
        derived = ds[derived_var_name]

        # Check if source has CRS via rioxarray
        if hasattr(source, "rio") and source.rio.crs is not None:
            ds[derived_var_name].rio.write_crs(source.rio.crs, inplace=True)
            # Re-apply attrs since rio.write_crs wipes them out!
            if attrs_to_copy:
                ds[derived_var_name].attrs.update(attrs_to_copy)
            logger.debug(
                "Copied CRS %s from '%s' to '%s'",
                source.rio.crs,
                source_var_name,
                derived_var_name,
            )
        elif hasattr(derived, "rio"):
            # Try to get CRS from grid_mapping attribute
            grid_mapping_name = derived.attrs.get("grid_mapping")
            if grid_mapping_name and grid_mapping_name in ds.coords:
                # rioxarray should be able to parse this now
                try:
                    crs = derived.rio.crs
                    if crs is not None:
                        logger.debug(
                            "Derived variable '%s' has CRS from grid_mapping: %s",
                            derived_var_name,
                            crs,
                        )
                except Exception:
                    pass
    except ImportError:
        logger.debug("rioxarray not available for CRS handling")
    except Exception as e:
        logger.debug("Could not copy CRS via rioxarray: %s", e)

get_registry()

Get the global derived variable registry singleton.

Returns:

Type Description
DerivedVariableRegistry

The singleton intake-esm DerivedVariableRegistry instance.

Notes

The registry is lazily initialized on first access. This ensures that builtin derived variables are registered before the registry is used.

Examples:

>>> registry = get_registry()
>>> print(registry)
DerivedVariableRegistry({...})
Source code in climakitae/new_core/derived_variables/registry.py
def get_registry() -> DerivedVariableRegistry:
    """Get the global derived variable registry singleton.

    Returns
    -------
    DerivedVariableRegistry
        The singleton intake-esm DerivedVariableRegistry instance.

    Notes
    -----
    The registry is lazily initialized on first access. This ensures that
    builtin derived variables are registered before the registry is used.

    Examples
    --------
    >>> registry = get_registry()
    >>> print(registry)
    DerivedVariableRegistry({...})

    """
    global _DERIVED_REGISTRY
    if _DERIVED_REGISTRY is None:
        logger.debug("Initializing DerivedVariableRegistry singleton")
        _DERIVED_REGISTRY = DerivedVariableRegistry()
    return _DERIVED_REGISTRY

register_derived(variable, query, description='', units='', source='builtin', drop_dependencies=True)

Decorator to register a derived variable function.

This decorator registers a function with the intake-esm DerivedVariableRegistry, enabling the variable to be queried directly from catalogs that have the registry attached.

Parameters:

Name Type Description Default
variable str

The name of the derived variable. This is what users will query.

required
query dict

Query constraints for finding source variables. Must include 'variable_id' with a list of required source variables. May include additional constraints like 'table_id', 'experiment_id', etc.

required
description str

Human-readable description of the derived variable.

''
units str

Expected units of the derived variable.

''
source str

Where this variable was registered from. Default is 'builtin'.

'builtin'
drop_dependencies bool

Whether to remove source variables from the output after computing the derived variable. Default: True (only return the derived variable). Set to False to keep source variables alongside the derived variable.

True

Returns:

Type Description
callable

The decorator function.

Examples:

>>> @register_derived(
...     variable='wind_speed',
...     query={'variable_id': ['u10', 'v10']},
...     description='Wind speed at 10m',
...     units='m/s'
... )
... def calc_wind_speed(ds):
...     import numpy as np
...     ds['wind_speed'] = np.sqrt(ds.u10**2 + ds.v10**2)
...     ds['wind_speed'].attrs = {'units': 'm/s', 'long_name': 'Wind Speed'}
...     return ds
>>> # Keep source variables alongside derived variable
>>> @register_derived(
...     variable='temp_range',
...     query={'variable_id': ['tasmax', 'tasmin']},
...     description='Daily temperature range',
...     drop_dependencies=False  # Keep tasmax and tasmin
... )
... def calc_temp_range(ds):
...     ds['temp_range'] = ds.tasmax - ds.tasmin
...     return ds
Notes

The decorated function must:

- Accept a single xarray.Dataset argument
- Add the derived variable to the dataset
- Return the modified dataset
- Set appropriate attributes (units, long_name) on the new variable

By default, source variables are dropped to reduce output size. Set drop_dependencies=False to keep them for downstream analysis.

Source code in climakitae/new_core/derived_variables/registry.py
def register_derived(
    variable: str,
    query: Dict[str, Any],
    description: str = "",
    units: str = "",
    source: str = "builtin",
    drop_dependencies: bool = True,
) -> Callable:
    """Decorator to register a derived variable function.

    This decorator registers a function with the intake-esm DerivedVariableRegistry,
    enabling the variable to be queried directly from catalogs that have the
    registry attached.

    Parameters
    ----------
    variable : str
        The name of the derived variable. This is what users will query.
    query : dict
        Query constraints for finding source variables. Must include 'variable_id'
        with a list of required source variables. May include additional constraints
        like 'table_id', 'experiment_id', etc.
    description : str, optional
        Human-readable description of the derived variable.
    units : str, optional
        Expected units of the derived variable.
    source : str, optional
        Where this variable was registered from. Default is 'builtin'.
    drop_dependencies : bool, optional
        Whether to remove source variables from the output after computing the
        derived variable. Default: True (only return the derived variable).
        Set to False to keep source variables alongside the derived variable.

    Returns
    -------
    callable
        The decorator function.

    Examples
    --------
    >>> @register_derived(
    ...     variable='wind_speed',
    ...     query={'variable_id': ['u10', 'v10']},
    ...     description='Wind speed at 10m',
    ...     units='m/s'
    ... )
    ... def calc_wind_speed(ds):
    ...     import numpy as np
    ...     ds['wind_speed'] = np.sqrt(ds.u10**2 + ds.v10**2)
    ...     ds['wind_speed'].attrs = {'units': 'm/s', 'long_name': 'Wind Speed'}
    ...     return ds

    >>> # Keep source variables alongside derived variable
    >>> @register_derived(
    ...     variable='temp_range',
    ...     query={'variable_id': ['tasmax', 'tasmin']},
    ...     description='Daily temperature range',
    ...     drop_dependencies=False  # Keep tasmax and tasmin
    ... )
    ... def calc_temp_range(ds):
    ...     ds['temp_range'] = ds.tasmax - ds.tasmin
    ...     return ds

    Notes
    -----
    The decorated function must:
    - Accept a single xarray.Dataset argument
    - Add the derived variable to the dataset
    - Return the modified dataset
    - Set appropriate attributes (units, long_name) on the new variable

    By default, source variables are dropped to reduce output size. Set
    drop_dependencies=False to keep them for downstream analysis.

    """

    def decorator(func: Callable) -> Callable:
        registry = get_registry()

        # Extract depends_on from query
        depends_on = query.get("variable_id", [])
        if isinstance(depends_on, str):
            depends_on = [depends_on]

        # Wrap function to automatically preserve spatial metadata
        # This must happen BEFORE storing in metadata so both intake-esm
        # and _apply_derived_variable use the same wrapped function
        wrapped_func = _wrap_with_metadata_preservation(
            func, variable, depends_on, drop_dependencies=drop_dependencies
        )

        # Store metadata with the WRAPPED function
        # This ensures _apply_derived_variable() also preserves spatial metadata
        _DERIVED_METADATA[variable] = DerivedVariableInfo(
            name=variable,
            depends_on=depends_on,
            description=description,
            units=units,
            func=wrapped_func,  # Use wrapped function, not original
            source=source,
            drop_dependencies=drop_dependencies,
        )

        # Register with intake-esm
        logger.info(
            "Registering derived variable '%s' depending on %s", variable, depends_on
        )
        registry.register(variable=variable, query=query)(wrapped_func)

        return func

    return decorator

register_user_function(name, depends_on, func, description='', units='', query_extras=None, drop_dependencies=True)

Register a user-defined derived variable at runtime.

This function allows users to register custom derived variables without using the decorator syntax. This is useful for dynamic registration or when working interactively.

Parameters:

Name Type Description Default
name str

The name of the derived variable. This is what users will query.

required
depends_on list of str

Variable IDs that this function requires (e.g., ['tasmax', 'tasmin']).

required
func callable

Function that takes an xarray.Dataset and returns a modified Dataset with the new variable added.

required
description str

Human-readable description of the derived variable.

''
units str

Expected units of the derived variable.

''
query_extras dict

Additional query constraints beyond variable_id (e.g., table_id, experiment_id).

None
drop_dependencies bool

Whether to remove source variables from the output after computing the derived variable. Default: True (only return the derived variable). Set to False to keep source variables alongside the derived variable.

True

Raises:

Type Description
ValueError

If name is empty or not a string, or if depends_on is empty.

TypeError

If func is not callable.

Examples:

>>> def calc_temp_range(ds):
...     ds['temp_range'] = ds.tasmax - ds.tasmin
...     ds['temp_range'].attrs = {'units': 'K', 'long_name': 'Diurnal Range'}
...     return ds
...
>>> register_user_function(
...     name='temp_range',
...     depends_on=['tasmax', 'tasmin'],
...     func=calc_temp_range,
...     description='Daily temperature range',
...     units='K'
... )
...
>>> # Now query it directly
>>> data = cd.catalog("cadcat").variable("temp_range").get()
Notes

Registration is permanent for the session. Once registered, the variable is available for all subsequent queries until the Python process ends.
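As a rough illustration of how depends_on and query_extras combine into one query, the sketch below mirrors the dict-merge step in plain Python. `build_query` is a hypothetical helper, and the `table_id` value is made up for the example.

```python
from typing import Any, Dict, List, Optional


def build_query(
    depends_on: List[str], query_extras: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: combine required variable IDs with extra constraints."""
    query: Dict[str, Any] = {"variable_id": depends_on}
    if query_extras:
        query.update(query_extras)  # extras win on key collisions
    return query


plain = build_query(["tasmax", "tasmin"])
constrained = build_query(["tasmax", "tasmin"], {"table_id": "day"})
```

Because the extras are merged with `dict.update`, a `variable_id` key inside query_extras would replace the list built from depends_on, so it is safest to leave that key out of query_extras.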

Source code in climakitae/new_core/derived_variables/registry.py
def register_user_function(
    name: str,
    depends_on: List[str],
    func: Callable,
    description: str = "",
    units: str = "",
    query_extras: Optional[Dict[str, Any]] = None,
    drop_dependencies: bool = True,
) -> None:
    """Register a user-defined derived variable at runtime.

    This function allows users to register custom derived variables without
    using the decorator syntax. This is useful for dynamic registration or
    when working interactively.

    Parameters
    ----------
    name : str
        The name of the derived variable. This is what users will query.
    depends_on : list of str
        Variable IDs that this function requires (e.g., ['tasmax', 'tasmin']).
    func : callable
        Function that takes an xarray.Dataset and returns a modified Dataset
        with the new variable added.
    description : str, optional
        Human-readable description of the derived variable.
    units : str, optional
        Expected units of the derived variable.
    query_extras : dict, optional
        Additional query constraints beyond variable_id (e.g., table_id, experiment_id).
    drop_dependencies : bool, optional
        Whether to remove source variables from the output after computing the
        derived variable. Default: True (only return the derived variable).
        Set to False to keep source variables alongside the derived variable.

    Raises
    ------
    ValueError
        If name is empty or not a string, or if depends_on is empty.
    TypeError
        If func is not callable.

    Examples
    --------
    >>> def calc_temp_range(ds):
    ...     ds['temp_range'] = ds.tasmax - ds.tasmin
    ...     ds['temp_range'].attrs = {'units': 'K', 'long_name': 'Diurnal Range'}
    ...     return ds
    ...
    >>> register_user_function(
    ...     name='temp_range',
    ...     depends_on=['tasmax', 'tasmin'],
    ...     func=calc_temp_range,
    ...     description='Daily temperature range',
    ...     units='K'
    ... )
    ...
    >>> # Now query it directly
    >>> data = cd.catalog("cadcat").variable("temp_range").get()

    Notes
    -----
    Registration is permanent for the session. Once registered, the variable
    is available for all subsequent queries until the Python process ends.

    """
    # Validation
    if not name or not isinstance(name, str):
        raise ValueError("name must be a non-empty string")
    if not depends_on:
        raise ValueError("depends_on must be a non-empty list of variable IDs")
    if not callable(func):
        raise TypeError("func must be callable")

    # Build query
    query = {"variable_id": depends_on}
    if query_extras:
        query.update(query_extras)

    # Wrap function to automatically preserve spatial metadata
    # This must happen BEFORE storing in metadata
    wrapped_func = _wrap_with_metadata_preservation(
        func, name, depends_on, drop_dependencies=drop_dependencies
    )

    # Store metadata with the WRAPPED function
    _DERIVED_METADATA[name] = DerivedVariableInfo(
        name=name,
        depends_on=depends_on,
        description=description,
        units=units,
        func=wrapped_func,  # Use wrapped function, not original
        source="user",
        drop_dependencies=drop_dependencies,
    )

    # Register with intake-esm
    registry = get_registry()
    logger.info(
        "Registering user-defined derived variable '%s' depending on %s",
        name,
        depends_on,
    )
    registry.register(variable=name, query=query)(wrapped_func)

list_derived_variables()

List all registered derived variables with their metadata.

Returns:

Type Description
dict

Dictionary mapping variable names to DerivedVariableInfo objects.

Examples:

>>> derived_vars = list_derived_variables()
>>> for name, info in derived_vars.items():
...     print(f"{name}: depends on {info.depends_on}")
wind_speed: depends on ['u10', 'v10']
relative_humidity: depends on ['t2', 'q2', 'psfc']
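The returned dictionary is a shallow copy of the internal metadata store. The sketch below shows what that implies, using a trimmed-down dataclass; `Info`, `_METADATA`, and `list_all` are hypothetical stand-ins, not the library's objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Info:
    """Hypothetical, trimmed-down stand-in for DerivedVariableInfo."""

    name: str
    depends_on: List[str] = field(default_factory=list)


_METADATA: Dict[str, Info] = {"wind_speed": Info("wind_speed", ["u10", "v10"])}


def list_all() -> Dict[str, Info]:
    return _METADATA.copy()  # shallow copy: new dict, same Info objects


snapshot = list_all()
del snapshot["wind_speed"]                   # only the snapshot changes
still_registered = "wind_speed" in _METADATA

# The Info objects themselves are shared between the copy and the store.
shared = list_all()["wind_speed"] is _METADATA["wind_speed"]
```

Adding or removing keys on the returned dict therefore leaves the registry untouched, while mutating an individual info object would be visible everywhere.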
Source code in climakitae/new_core/derived_variables/registry.py
def list_derived_variables() -> Dict[str, DerivedVariableInfo]:
    """List all registered derived variables with their metadata.

    Returns
    -------
    dict
        Dictionary mapping variable names to DerivedVariableInfo objects.

    Examples
    --------
    >>> derived_vars = list_derived_variables()
    >>> for name, info in derived_vars.items():
    ...     print(f"{name}: depends on {info.depends_on}")
    wind_speed: depends on ['u10', 'v10']
    relative_humidity: depends on ['t2', 'q2', 'psfc']

    """
    return _DERIVED_METADATA.copy()

is_derived_variable(variable)

Check if a variable name is a registered derived variable.

Parameters:

Name Type Description Default
variable str

The variable name to check.

required

Returns:

Type Description
bool

True if the variable is registered as a derived variable.

Source code in climakitae/new_core/derived_variables/registry.py
def is_derived_variable(variable: str) -> bool:
    """Check if a variable name is a registered derived variable.

    Parameters
    ----------
    variable : str
        The variable name to check.

    Returns
    -------
    bool
        True if the variable is registered as a derived variable.

    """
    return variable in _DERIVED_METADATA

get_derived_variable_info(variable)

Get metadata for a derived variable.

Parameters:

Name Type Description Default
variable str

The variable name to look up.

required

Returns:

Type Description
DerivedVariableInfo or None

Metadata for the variable, or None if not found.

Source code in climakitae/new_core/derived_variables/registry.py
def get_derived_variable_info(variable: str) -> Optional[DerivedVariableInfo]:
    """Get metadata for a derived variable.

    Parameters
    ----------
    variable : str
        The variable name to look up.

    Returns
    -------
    DerivedVariableInfo or None
        Metadata for the variable, or None if not found.

    """
    return _DERIVED_METADATA.get(variable)

Utilities

Helper utilities for derived variable computations.

Provides runtime helpers for resolving parameter precedence for derived variable calculations (explicit args -> dataset-level overrides -> global defaults).

get_derived_threshold(ds, derived_var_name=None, threshold_k=None, threshold_c=None, threshold_f=None)

Resolve the temperature threshold for a degree-day calculation, returned in Kelvin.

Precedence:

1. Explicit function arguments (threshold_k, then threshold_c, then threshold_f)
2. Per-dataset attribute overrides (see dataset.attrs keys below)
3. Global default DEFAULT_DEGREE_DAY_THRESHOLD_K

Dataset attribute conventions supported (checked in order):

- derived_variable_overrides: dict mapping derived variable name to a params dict, e.g. dataset.attrs['derived_variable_overrides'] = {'CDD_wrf': {'threshold_f': 75}}. The key derived_variable_params is read as a fallback when derived_variable_overrides is absent or empty. Per-variable overrides are consulted only when derived_var_name is given.
- Top-level attrs: threshold_k, threshold_c, threshold_f

Parameters:

Name Type Description Default
ds Dataset

Dataset that may contain attribute-based overrides.

required
derived_var_name str

Name of the derived variable to look up per-variable overrides.

None
threshold_k float

Threshold in Kelvin.

None
threshold_c float

Threshold in Celsius.

None
threshold_f float

Threshold in Fahrenheit.

None

Returns:

Type Description
float

Threshold value in Kelvin.
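The precedence rules described above can be sketched in plain Python. This is a simplified illustration, not the library code: `resolve`, `to_kelvin`, and `DEFAULT_K` are hypothetical names, `f_to_k` is written out as the standard Fahrenheit-to-Kelvin conversion, and the default of 65 °F is an assumption made only so the example runs.

```python
from typing import Optional

# Assumed placeholder; the real DEFAULT_DEGREE_DAY_THRESHOLD_K is set by the library.
DEFAULT_K = (65.0 - 32.0) * 5.0 / 9.0 + 273.15


def f_to_k(temp_f: float) -> float:
    """Standard Fahrenheit-to-Kelvin conversion (stand-in for the library helper)."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


def to_kelvin(params: dict) -> Optional[float]:
    """Return the first threshold found, checking K, then C, then F."""
    if params.get("threshold_k") is not None:
        return float(params["threshold_k"])
    if params.get("threshold_c") is not None:
        return float(params["threshold_c"]) + 273.15
    if params.get("threshold_f") is not None:
        return f_to_k(float(params["threshold_f"]))
    return None


def resolve(attrs: dict, name: Optional[str] = None, **explicit) -> float:
    """Explicit args, then per-variable overrides, then top-level attrs, then default."""
    candidates = [explicit]
    if name:
        overrides = attrs.get("derived_variable_overrides") or {}
        candidates.append(overrides.get(name) or {})
    candidates.append(attrs)
    for params in candidates:
        value = to_kelvin(params)
        if value is not None:
            return value
    return DEFAULT_K


attrs = {
    "derived_variable_overrides": {"CDD_wrf": {"threshold_f": 75}},
    "threshold_c": 20,
}
per_variable = resolve(attrs, "CDD_wrf")                  # per-variable 75 °F override
top_level = resolve(attrs, "HDD_wrf")                     # falls through to 20 °C
explicit = resolve(attrs, "CDD_wrf", threshold_k=300.0)   # explicit argument wins
```

Here "HDD_wrf" has no per-variable entry, so the top-level threshold_c applies, while an explicit keyword overrides everything stored on the dataset.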

Source code in climakitae/new_core/derived_variables/utils.py
def get_derived_threshold(
    ds,
    derived_var_name: Optional[str] = None,
    threshold_k: Optional[float] = None,
    threshold_c: Optional[float] = None,
    threshold_f: Optional[float] = None,
) -> float:
    """Resolve a degree-day temperature based threshold in Kelvin.

    Precedence:
    1. Explicit function arguments (`threshold_k`, `threshold_c`, `threshold_f`)
    2. Per-dataset attribute overrides (see dataset.attrs keys below)
    3. Global default `DEFAULT_DEGREE_DAY_THRESHOLD_K`

    Dataset attribute conventions supported (checked in order):
    - `derived_variable_overrides` : dict mapping derived var name -> params dict
      e.g. dataset.attrs['derived_variable_overrides'] = {'CDD_wrf': {'threshold_f': 75}}
    - Top-level attrs: `threshold_k`, `threshold_c`, `threshold_f`

    Parameters
    ----------
    ds : xr.Dataset
        Dataset that may contain attribute-based overrides.
    derived_var_name : str, optional
        Name of the derived variable to look up per-variable overrides.
    threshold_k : float, optional
        Threshold in Kelvin.
    threshold_c : float, optional
        Threshold in Celsius.
    threshold_f : float, optional
        Threshold in Fahrenheit.

    Returns
    -------
    float
        Threshold value in Kelvin.
    """

    # 1) explicit args
    if threshold_k is not None:
        return float(threshold_k)
    if threshold_c is not None:
        return float(threshold_c) + 273.15
    if threshold_f is not None:
        return float(f_to_k(threshold_f))

    # 2) dataset-level overrides (attrs)
    attrs = getattr(ds, "attrs", None) or {}

    # per-derived-variable overrides
    if derived_var_name:
        overrides = attrs.get("derived_variable_overrides") or attrs.get(
            "derived_variable_params"
        )
        if isinstance(overrides, dict) and derived_var_name in overrides:
            params = overrides[derived_var_name] or {}
            if params.get("threshold_k") is not None:
                return float(params.get("threshold_k"))
            if params.get("threshold_c") is not None:
                return float(params.get("threshold_c")) + 273.15
            if params.get("threshold_f") is not None:
                return float(f_to_k(params.get("threshold_f")))

    # top-level attrs
    if attrs.get("threshold_k") is not None:
        return float(attrs.get("threshold_k"))
    if attrs.get("threshold_c") is not None:
        return float(attrs.get("threshold_c")) + 273.15
    if attrs.get("threshold_f") is not None:
        return float(f_to_k(attrs.get("threshold_f")))

    # 3) global default
    logger.debug(
        "No threshold override found for '%s'; using default %s K",
        derived_var_name,
        DEFAULT_DEGREE_DAY_THRESHOLD_K,
    )
    return float(DEFAULT_DEGREE_DAY_THRESHOLD_K)

Built-in Humidity Derivations

Humidity-related derived variables.

This module provides derived variables for humidity calculations including relative humidity, dew point temperature, and specific humidity.

Derived Variables

relative_humidity_2m
    Relative humidity at 2m from temperature, specific humidity, and pressure.
dew_point_2m
    Dew point temperature at 2m from temperature and relative humidity.
specific_humidity_2m
    Specific humidity at 2m from temperature, relative humidity, and pressure.

calc_relative_humidity_2m(ds)

Calculate relative humidity at 2m.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:

- 't2': 2m temperature (K)
- 'q2': 2m specific humidity (kg/kg)
- 'psfc': Surface pressure (Pa)

required

Returns:

Type Description
Dataset

Dataset with 'relative_humidity_2m' variable added (0-100 scale).

Notes

Uses the approximation:

- Saturation vapor pressure: es = 611.2 * exp(17.67 * (T-273.15) / (T-29.65))
- Vapor pressure from specific humidity: e = q * p / (0.622 + 0.378*q)
- Relative humidity: RH = 100 * e / es
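A scalar worked example of the three formulas above, using only the standard library. The input values are illustrative, and `relative_humidity` is a hypothetical helper, not the registered function. Note that (T - 273.15) + 243.5 equals T - 29.65, so the Celsius form used here is the same expression as the Kelvin form in the notes.

```python
import math


def relative_humidity(t_k: float, q: float, p_pa: float) -> float:
    """Relative humidity (%) from temperature (K), specific humidity (kg/kg), pressure (Pa)."""
    t_c = t_k - 273.15
    es = 611.2 * math.exp(17.67 * t_c / (t_c + 243.5))  # saturation vapor pressure, Pa
    e = q * p_pa / (0.622 + 0.378 * q)                  # vapor pressure, Pa
    return min(max(100.0 * e / es, 0.0), 100.0)         # clip to the 0-100 range


# 25 °C, 10 g/kg specific humidity, standard sea-level pressure
rh = relative_humidity(298.15, 0.010, 101325.0)
print(f"{rh:.1f} %")
```

For these inputs the result is a little over 50 %. Very moist or supersaturated inputs are capped at 100 by the final clip.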

Source code in climakitae/new_core/derived_variables/builtin/humidity.py
@register_derived(
    variable="relative_humidity_2m",
    query={"variable_id": ["t2", "q2", "psfc"]},
    description="Relative humidity at 2m computed from temperature, specific humidity, and surface pressure",
    units="%",
    source="builtin",
)
def calc_relative_humidity_2m(ds) -> "xr.Dataset":
    """Calculate relative humidity at 2m.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K)
        - 'q2': 2m specific humidity (kg/kg)
        - 'psfc': Surface pressure (Pa)

    Returns
    -------
    xr.Dataset
        Dataset with 'relative_humidity_2m' variable added (0-100 scale).

    Notes
    -----
    Uses the approximation:
    - Saturation vapor pressure: es = 611.2 * exp(17.67 * (T-273.15) / (T-29.65))
    - Vapor pressure from specific humidity: e = q * p / (0.622 + 0.378*q)
    - Relative humidity: RH = 100 * e / es

    """
    logger.debug("Computing relative_humidity_2m from t2, q2, psfc")

    # Temperature in Celsius
    t_celsius = ds.t2 - 273.15

    # Saturation vapor pressure (Pa), Magnus-type formula with Bolton (1980) coefficients
    es = 611.2 * np.exp(17.67 * t_celsius / (t_celsius + 243.5))

    # Vapor pressure from specific humidity (Pa)
    # q = 0.622 * e / (p - 0.378 * e) => e = q * p / (0.622 + 0.378 * q)
    e = ds.q2 * ds.psfc / (0.622 + 0.378 * ds.q2)

    # Relative humidity (%)
    rh = 100.0 * e / es

    # Clip to valid range
    rh = rh.clip(0, 100)

    ds["relative_humidity_2m"] = rh
    ds["relative_humidity_2m"].attrs = {
        "units": "%",
        "long_name": "Relative Humidity at 2m",
        "standard_name": "relative_humidity",
        "valid_min": 0,
        "valid_max": 100,
        "derived_from": "t2, q2, psfc",
        "derived_by": "climakitae",
    }
    return ds

calc_dew_point_2m(ds)

Calculate dew point temperature at 2m.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:

- 't2': 2m temperature (K)
- 'rh': Relative humidity (0-100 scale)

required

Returns:

Type Description
Dataset

Dataset with 'dew_point_2m' variable added (K).

Notes

Uses the Magnus formula approximation for dew point.
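A minimal scalar sketch of the Magnus dew point calculation, using the same coefficients as the built-in (a = 17.27, b = 237.7 °C). `dew_point_k` is a hypothetical helper written with the standard library only, and the inputs are illustrative.

```python
import math

A = 17.27   # Magnus coefficient (dimensionless)
B = 237.7   # Magnus coefficient (degrees Celsius)


def dew_point_k(t_k: float, rh_percent: float) -> float:
    """Dew point (K) from air temperature (K) and relative humidity (0-100)."""
    t_c = t_k - 273.15
    # Intermediate Magnus term combining temperature and humidity
    gamma = A * t_c / (B + t_c) + math.log(rh_percent / 100.0)
    return B * gamma / (A - gamma) + 273.15


td = dew_point_k(298.15, 50.0)          # 25 °C air at 50 % relative humidity
saturated = dew_point_k(298.15, 100.0)  # at saturation the dew point equals the air temperature
```

At 100 % humidity the logarithm vanishes and the expression reduces to the air temperature, which makes a convenient sanity check. In this scalar version a relative humidity of 0 raises ValueError from `math.log`, since no finite dew point exists.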

Source code in climakitae/new_core/derived_variables/builtin/humidity.py
@register_derived(
    variable="dew_point_2m",
    query={"variable_id": ["t2", "rh"]},
    description="Dew point temperature at 2m from temperature and relative humidity",
    units="K",
    source="builtin",
)
def calc_dew_point_2m(ds) -> "xr.Dataset":
    """Calculate dew point temperature at 2m.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K)
        - 'rh': Relative humidity (0-100 scale)

    Returns
    -------
    xr.Dataset
        Dataset with 'dew_point_2m' variable added (K).

    Notes
    -----
    Uses the Magnus formula approximation for dew point.

    """
    logger.debug("Computing dew_point_2m from t2 and rh")

    # Constants for Magnus formula
    a = 17.27
    b = 237.7  # °C

    # Temperature in Celsius
    t_celsius = ds.t2 - 273.15

    # Intermediate Magnus term gamma(T, RH); not the mathematical gamma function
    gamma = (a * t_celsius / (b + t_celsius)) + np.log(ds.rh / 100.0)

    # Dew point in Celsius
    tdp_celsius = b * gamma / (a - gamma)

    # Convert back to Kelvin
    ds["dew_point_2m"] = tdp_celsius + 273.15
    ds["dew_point_2m"].attrs = {
        "units": "K",
        "long_name": "Dew Point Temperature at 2m",
        "standard_name": "dew_point_temperature",
        "derived_from": "t2, rh",
        "derived_by": "climakitae",
    }
    return ds

calc_specific_humidity_2m(ds)

Calculate specific humidity at 2m.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:

- 't2': 2m temperature (K)
- 'rh': Relative humidity (0-100 scale)
- 'psfc': Surface pressure (Pa)

required

Returns:

Type Description
Dataset

Dataset with 'specific_humidity_2m' variable added (kg/kg).
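The calculation can be sketched on scalars as three steps: saturation vapor pressure, actual vapor pressure, then specific humidity. The constants are copied from the source listing below; the function name `specific_humidity` is hypothetical and only meant for illustration.

```python
import math


def specific_humidity(t2_k: float, rh_percent: float, psfc_pa: float) -> float:
    """Scalar specific humidity in kg/kg (hypothetical helper)."""
    t_c = t2_k - 273.15
    # Saturation vapor pressure in Pa
    es = 611.2 * math.exp(17.67 * t_c / (t_c + 243.5))
    # Actual vapor pressure in Pa
    e = (rh_percent / 100.0) * es
    # q = 0.622 * e / (p - 0.378 * e)
    return 0.622 * e / (psfc_pa - 0.378 * e)


q_half = specific_humidity(293.15, 50.0, 101325.0)
q_dry = specific_humidity(293.15, 0.0, 101325.0)
```

At 20 °C, 50% relative humidity, and standard sea-level pressure this gives roughly 7 g of water vapor per kg of air.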

Source code in climakitae/new_core/derived_variables/builtin/humidity.py
@register_derived(
    variable="specific_humidity_2m",
    query={"variable_id": ["t2", "rh", "psfc"]},
    description="Specific humidity at 2m from temperature, relative humidity, and pressure",
    units="kg/kg",
    source="builtin",
)
def calc_specific_humidity_2m(ds) -> "xr.Dataset":
    """Calculate specific humidity at 2m.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K)
        - 'rh': Relative humidity (0-100 scale)
        - 'psfc': Surface pressure (Pa)

    Returns
    -------
    xr.Dataset
        Dataset with 'specific_humidity_2m' variable added (kg/kg).

    """
    logger.debug("Computing specific_humidity_2m from t2, rh, psfc")

    # Temperature in Celsius
    t_celsius = ds.t2 - 273.15

    # Saturation vapor pressure (Pa)
    es = 611.2 * np.exp(17.67 * t_celsius / (t_celsius + 243.5))

    # Actual vapor pressure (Pa)
    e = (ds.rh / 100.0) * es

    # Specific humidity (kg/kg)
    # q = 0.622 * e / (p - 0.378 * e)
    q = 0.622 * e / (ds.psfc - 0.378 * e)

    ds["specific_humidity_2m"] = q
    ds["specific_humidity_2m"].attrs = {
        "units": "kg/kg",
        "long_name": "Specific Humidity at 2m",
        "standard_name": "specific_humidity",
        "derived_from": "t2, rh, psfc",
        "derived_by": "climakitae",
    }
    return ds

Built-in Temperature Derivations

Temperature-related derived variables.

This module provides derived variables for temperature calculations including heat index, wind chill, apparent temperature, and degree days.

Derived Variables

- heat_index: Heat index (feels-like temperature accounting for humidity).
- wind_chill: Wind chill (feels-like temperature accounting for wind).
- apparent_temperature: Apparent temperature combining temperature, humidity, and wind effects.
- diurnal_temperature_range: Daily temperature range (tasmax - tasmin).
- HDD_wrf: Heating degree days from WRF data (base 65°F).
- CDD_wrf: Cooling degree days from WRF data (base 65°F).
- HDD_loca: Heating degree days from LOCA2 data (base 65°F).
- CDD_loca: Cooling degree days from LOCA2 data (base 65°F).

calc_heat_index(ds)

Calculate heat index from temperature and relative humidity.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 't2': 2m temperature (K)
- 'rh': Relative humidity (0-100 scale)

required

Returns:

Type Description
Dataset

Dataset with 'heat_index' variable added (K).

Notes

Uses the NOAA/NWS heat index formula. The heat index is only meaningful when temperature is above ~27°C (80°F) and relative humidity is above ~40%. For lower values, the original temperature is returned.

Reference: https://www.weather.gov/media/ffc/ta_htindx.PDF
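The branching described in the Notes can be sketched on scalars. This is a simplified illustration that follows the same steps as the source listing below (simple estimate, Rothfusz regression, two humidity adjustments, and the 80°F cutoff); the function names `heat_index_f` and `heat_index_k` are hypothetical.

```python
import math


def heat_index_f(t_f: float, rh: float) -> float:
    """Scalar heat index in Fahrenheit (hypothetical helper)."""
    hi_simple = 0.5 * (t_f + 61.0 + (t_f - 68.0) * 1.2 + rh * 0.094)
    # Rothfusz regression
    hi_full = (
        -42.379 + 2.04901523 * t_f + 10.14333127 * rh
        - 0.22475541 * t_f * rh - 6.83783e-3 * t_f**2
        - 5.481717e-2 * rh**2 + 1.22874e-3 * t_f**2 * rh
        + 8.5282e-4 * t_f * rh**2 - 1.99e-6 * t_f**2 * rh**2
    )
    if rh < 13 and 80 <= t_f <= 112:  # low humidity adjustment
        hi_full -= ((13 - rh) / 4) * math.sqrt((17 - abs(t_f - 95)) / 17)
    if rh > 85 and 80 <= t_f <= 87:  # high humidity adjustment
        hi_full += ((rh - 85) / 10) * ((87 - t_f) / 5)
    hi = hi_simple if hi_simple < 80 else hi_full
    # Below 80°F the air temperature is returned unchanged
    return t_f if t_f < 80 else hi


def heat_index_k(t2_k: float, rh: float) -> float:
    """Kelvin wrapper around heat_index_f."""
    t_f = (t2_k - 273.15) * 9 / 5 + 32
    return (heat_index_f(t_f, rh) - 32) * 5 / 9 + 273.15


hot_humid = heat_index_f(90.0, 70.0)
mild = heat_index_f(70.0, 50.0)
```

For 90°F and 70% relative humidity the regression yields a heat index of about 106°F, while a 70°F input is returned unchanged.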

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="heat_index",
    query={"variable_id": ["t2", "rh"]},
    description="Heat index (feels-like temperature accounting for humidity effects)",
    units="K",
    source="builtin",
)
def calc_heat_index(ds) -> "xr.Dataset":
    """Calculate heat index from temperature and relative humidity.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K)
        - 'rh': Relative humidity (0-100 scale)

    Returns
    -------
    xr.Dataset
        Dataset with 'heat_index' variable added (K).

    Notes
    -----
    Uses the NOAA/NWS heat index formula. The heat index is only meaningful
    when temperature is above ~27°C (80°F) and relative humidity is above ~40%.
    For lower values, the original temperature is returned.

    Reference: https://www.weather.gov/media/ffc/ta_htindx.PDF

    """
    logger.debug("Computing heat_index from t2 and rh")

    # Convert to Fahrenheit for the standard formula
    t_fahrenheit = (ds.t2 - 273.15) * 9 / 5 + 32
    rh = ds.rh

    # Simple approximation for low temperatures
    hi_simple = 0.5 * (t_fahrenheit + 61.0 + (t_fahrenheit - 68.0) * 1.2 + rh * 0.094)

    # Full Rothfusz regression for higher temperatures
    c1 = -42.379
    c2 = 2.04901523
    c3 = 10.14333127
    c4 = -0.22475541
    c5 = -6.83783e-3
    c6 = -5.481717e-2
    c7 = 1.22874e-3
    c8 = 8.5282e-4
    c9 = -1.99e-6

    hi_full = (
        c1
        + c2 * t_fahrenheit
        + c3 * rh
        + c4 * t_fahrenheit * rh
        + c5 * t_fahrenheit**2
        + c6 * rh**2
        + c7 * t_fahrenheit**2 * rh
        + c8 * t_fahrenheit * rh**2
        + c9 * t_fahrenheit**2 * rh**2
    )

    # Adjustments for edge cases
    # Low humidity adjustment
    low_rh_mask = (rh < 13) & (t_fahrenheit >= 80) & (t_fahrenheit <= 112)
    adjustment1 = ((13 - rh) / 4) * np.sqrt((17 - np.abs(t_fahrenheit - 95)) / 17)
    hi_full = xr.where(low_rh_mask, hi_full - adjustment1, hi_full)

    # High humidity adjustment
    high_rh_mask = (rh > 85) & (t_fahrenheit >= 80) & (t_fahrenheit <= 87)
    adjustment2 = ((rh - 85) / 10) * ((87 - t_fahrenheit) / 5)
    hi_full = xr.where(high_rh_mask, hi_full + adjustment2, hi_full)

    # Use simple formula for lower temps, full formula for higher
    heat_index_f = xr.where(hi_simple < 80, hi_simple, hi_full)

    # Use original temp if below heat index threshold
    heat_index_f = xr.where(t_fahrenheit < 80, t_fahrenheit, heat_index_f)

    # Convert back to Kelvin
    heat_index_k = (heat_index_f - 32) * 5 / 9 + 273.15

    ds["heat_index"] = heat_index_k
    ds["heat_index"].attrs = {
        "units": "K",
        "long_name": "Heat Index",
        "standard_name": "apparent_temperature",
        "comment": "Feels-like temperature accounting for humidity effects",
        "derived_from": "t2, rh",
        "derived_by": "climakitae",
    }
    return ds

calc_wind_chill(ds)

Calculate wind chill from temperature and wind components.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 't2': 2m temperature (K)
- 'u10': 10m U wind component (m/s)
- 'v10': 10m V wind component (m/s)

required

Returns:

Type Description
Dataset

Dataset with 'wind_chill' variable added (K).

Notes

Uses the NWS wind chill formula (2001 revision). Wind chill is only calculated when temperature is below 10°C (50°F) and wind speed is above 4.8 km/h (3 mph). Otherwise, the original temperature is returned.

Reference: https://www.weather.gov/media/epz/wxcalc/windChill.pdf
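A scalar sketch of the same formula and validity check is shown here. It takes Fahrenheit and mph directly to keep the arithmetic visible; the function name `wind_chill_f` is hypothetical, and the coefficients are the ones in the source listing below.

```python
def wind_chill_f(t_f: float, wind_mph: float) -> float:
    """Scalar NWS wind chill in Fahrenheit (hypothetical helper)."""
    # Outside the valid range the air temperature is returned unchanged
    if not (t_f <= 50 and wind_mph > 3):
        return t_f
    v = wind_mph**0.16
    return 35.74 + 0.6215 * t_f - 35.75 * v + 0.4275 * t_f * v


cold_windy = wind_chill_f(0.0, 15.0)
warm_windy = wind_chill_f(60.0, 15.0)
cold_calm = wind_chill_f(0.0, 2.0)
```

With 0°F air and a 15 mph wind the result is close to -19°F; the warm and calm cases fall outside the valid range and return the input temperature.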

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="wind_chill",
    query={"variable_id": ["t2", "u10", "v10"]},
    description="Wind chill (feels-like temperature accounting for wind effects)",
    units="K",
    source="builtin",
)
def calc_wind_chill(ds) -> "xr.Dataset":
    """Calculate wind chill from temperature and wind components.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K)
        - 'u10': 10m U wind component (m/s)
        - 'v10': 10m V wind component (m/s)

    Returns
    -------
    xr.Dataset
        Dataset with 'wind_chill' variable added (K).

    Notes
    -----
    Uses the NWS wind chill formula (2001 revision). Wind chill is only
    calculated when temperature is below 10°C (50°F) and wind speed is
    above 4.8 km/h (3 mph). Otherwise, the original temperature is returned.

    Reference: https://www.weather.gov/media/epz/wxcalc/windChill.pdf

    """
    logger.debug("Computing wind_chill from t2, u10, v10")

    # Wind speed in mph (formula uses mph)
    wind_speed_ms = np.sqrt(ds.u10**2 + ds.v10**2)
    wind_speed_mph = wind_speed_ms * 2.237  # m/s to mph

    # Temperature in Fahrenheit
    t_fahrenheit = (ds.t2 - 273.15) * 9 / 5 + 32

    # NWS Wind Chill formula
    wind_chill_f = (
        35.74
        + 0.6215 * t_fahrenheit
        - 35.75 * (wind_speed_mph**0.16)
        + 0.4275 * t_fahrenheit * (wind_speed_mph**0.16)
    )

    # Only apply when temp <= 50°F and wind > 3 mph
    valid_mask = (t_fahrenheit <= 50) & (wind_speed_mph > 3)
    wind_chill_f = xr.where(valid_mask, wind_chill_f, t_fahrenheit)

    # Convert back to Kelvin
    wind_chill_k = (wind_chill_f - 32) * 5 / 9 + 273.15

    ds["wind_chill"] = wind_chill_k
    ds["wind_chill"].attrs = {
        "units": "K",
        "long_name": "Wind Chill Temperature",
        "standard_name": "apparent_temperature",
        "comment": "Feels-like temperature accounting for wind effects",
        "derived_from": "t2, u10, v10",
        "derived_by": "climakitae",
    }
    return ds

calc_diurnal_temperature_range_loca(ds)

Calculate diurnal (daily) temperature range.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 'tasmax': Daily maximum temperature (K)
- 'tasmin': Daily minimum temperature (K)

required

Returns:

Type Description
Dataset

Dataset with 'diurnal_temperature_range_loca' variable added (K).

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="diurnal_temperature_range_loca",
    query={"variable_id": ["tasmax", "tasmin"]},
    description="Daily temperature range (maximum minus minimum)",
    units="K",
    source="builtin",
)
def calc_diurnal_temperature_range_loca(ds) -> "xr.Dataset":
    """Calculate diurnal (daily) temperature range.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 'tasmax': Daily maximum temperature (K)
        - 'tasmin': Daily minimum temperature (K)

    Returns
    -------
    xr.Dataset
        Dataset with 'diurnal_temperature_range_loca' variable added (K).

    """
    logger.debug("Computing diurnal_temperature_range_loca from tasmax and tasmin")

    ds["diurnal_temperature_range_loca"] = ds.tasmax - ds.tasmin
    ds["diurnal_temperature_range_loca"].attrs = {
        "units": "K",
        "long_name": "Diurnal Temperature Range",
        "comment": "Daily maximum minus daily minimum temperature",
        "derived_from": "tasmax, tasmin",
        "derived_by": "climakitae",
    }
    return ds

calc_diurnal_temperature_range_wrf(ds)

Calculate diurnal (daily) temperature range from WRF variables.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 't2max': Daily maximum 2m temperature (K)
- 't2min': Daily minimum 2m temperature (K)

required

Returns:

Type Description
Dataset

Dataset with 'diurnal_temperature_range_wrf' variable added (K).

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="diurnal_temperature_range_wrf",
    query={"variable_id": ["t2max", "t2min"]},
    description="Daily temperature range from WRF data (maximum minus minimum)",
    units="K",
    source="builtin",
)
def calc_diurnal_temperature_range_wrf(ds) -> "xr.Dataset":
    """Calculate diurnal (daily) temperature range from WRF variables.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2max': Daily maximum 2m temperature (K)
        - 't2min': Daily minimum 2m temperature (K)

    Returns
    -------
    xr.Dataset
        Dataset with 'diurnal_temperature_range_wrf' variable added (K).

    """
    logger.debug("Computing diurnal_temperature_range_wrf from t2max and t2min")

    ds["diurnal_temperature_range_wrf"] = ds.t2max - ds.t2min
    ds["diurnal_temperature_range_wrf"].attrs = {
        "units": "K",
        "long_name": "Diurnal Temperature Range",
        "comment": "Daily maximum minus daily minimum 2m temperature",
        "derived_from": "t2max, t2min",
        "derived_by": "climakitae",
    }
    return ds

calc_hdd_wrf(ds, threshold_k=None, threshold_c=None, threshold_f=None)

Calculate heating degree days from WRF temperature data.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 't2': 2m temperature (K), used as the average temperature

required
threshold_k float

Threshold for heating degree days in Kelvin.

None
threshold_c float

Threshold for heating degree days in Celsius.

None
threshold_f float

Threshold for heating degree days in Fahrenheit.

None

Returns:

Type Description
Dataset

Dataset with 'HDD_wrf' variable added (K).

Notes

Heating degree days (HDD) represent the energy demand for heating. Calculated as max(0, threshold - average_temp) where threshold is 65°F (291.48 K) unless overridden and average_temp is taken directly from 't2', matching the registered query.
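The degree-day formula and the 65°F base can be checked with a few lines of scalar arithmetic. The helper names below are hypothetical and only illustrate the formula stated in the Notes.

```python
def fahrenheit_to_kelvin(t_f: float) -> float:
    """Convert Fahrenheit to Kelvin."""
    return (t_f - 32.0) * 5.0 / 9.0 + 273.15


def heating_degree_days(t_avg_k: float, threshold_k: float) -> float:
    """max(0, threshold - average_temp), in Kelvin."""
    return max(0.0, threshold_k - t_avg_k)


def cooling_degree_days(t_avg_k: float, threshold_k: float) -> float:
    """max(0, average_temp - threshold), in Kelvin."""
    return max(0.0, t_avg_k - threshold_k)


base_k = fahrenheit_to_kelvin(65.0)
hdd_cold_day = heating_degree_days(285.0, 291.48)
cdd_cold_day = cooling_degree_days(285.0, 291.48)
```

A day averaging 285 K sits about 6.5 K below the base, so it contributes to heating demand and nothing to cooling demand.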

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="HDD_wrf",
    query={"variable_id": ["t2"]},
    description="Heating degree days from WRF data (base 65°F)",
    units="K",
    source="builtin",
)
def calc_hdd_wrf(
    ds, threshold_k=None, threshold_c=None, threshold_f=None
) -> "xr.Dataset":
    """Calculate heating degree days from WRF temperature data.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K), used as the average temperature
    threshold_k : float, optional
        Threshold for heating degree days in Kelvin.
    threshold_c : float, optional
        Threshold for heating degree days in Celsius.
    threshold_f : float, optional
        Threshold for heating degree days in Fahrenheit.

    Returns
    -------
    xr.Dataset
        Dataset with 'HDD_wrf' variable added (K).

    Notes
    -----
    Heating degree days (HDD) represent the energy demand for heating.
    Calculated as max(0, threshold - average_temp) where threshold is 65°F (291.48 K)
    unless overridden and average_temp is taken directly from 't2'.

    """
    logger.debug("Computing HDD_wrf from t2")

    # Use 2m temperature as the average temperature
    t_avg = ds.t2

    # Resolve threshold (explicit args > dataset attrs > global default)
    threshold_k = get_derived_threshold(
        ds,
        "HDD_wrf",
        threshold_k=threshold_k,
        threshold_c=threshold_c,
        threshold_f=threshold_f,
    )

    # HDD = max(0, threshold - avg_temp)
    ds["HDD_wrf"] = np.maximum(0, threshold_k - t_avg)
    ds["HDD_wrf"] = (ds["HDD_wrf"] > 0).astype(int)  # Binary mask data
    ds["HDD_wrf"].attrs = {
        "units": "K",
        "long_name": "Heating Degree Days (WRF)",
        "comment": f"Heating degree days calculated from daily average temperature with base {threshold_k} K",
        "derived_from": "t2",
        "derived_by": "climakitae",
        "threshold": f"{threshold_k} K",
    }
    return ds

calc_cdd_wrf(ds, threshold_k=None, threshold_c=None, threshold_f=None)

Calculate cooling degree days from WRF temperature data.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 't2': 2m temperature (K), used as the average temperature

required
threshold_k float

Threshold for cooling degree days in Kelvin.

None
threshold_c float

Threshold for cooling degree days in Celsius.

None
threshold_f float

Threshold for cooling degree days in Fahrenheit.

None

Returns:

Type Description
Dataset

Dataset with 'CDD_wrf' variable added (K).

Notes

Cooling degree days (CDD) represent the energy demand for cooling. Calculated as max(0, average_temp - threshold) where threshold is 65°F (291.48 K) unless overridden and average_temp is taken directly from 't2', matching the registered query.
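The source listing below resolves the threshold with the precedence "explicit args > dataset attrs > global default". The internals of `get_derived_threshold` are not shown on this page, so the following is only a sketch of that precedence under stated assumptions: the function name `resolve_threshold_k`, the attribute key `degree_day_threshold_k`, and the order among the three explicit arguments are all hypothetical.

```python
from typing import Mapping, Optional

# Assumed global default: 65°F expressed in Kelvin
DEFAULT_THRESHOLD_K = (65.0 - 32.0) * 5.0 / 9.0 + 273.15


def resolve_threshold_k(
    attrs: Mapping[str, float],
    threshold_k: Optional[float] = None,
    threshold_c: Optional[float] = None,
    threshold_f: Optional[float] = None,
    attr_key: str = "degree_day_threshold_k",  # hypothetical attribute name
) -> float:
    """Return a threshold in Kelvin: explicit args, then attrs, then default."""
    if threshold_k is not None:
        return float(threshold_k)
    if threshold_c is not None:
        return float(threshold_c) + 273.15
    if threshold_f is not None:
        return (float(threshold_f) - 32.0) * 5.0 / 9.0 + 273.15
    if attr_key in attrs:
        return float(attrs[attr_key])
    return DEFAULT_THRESHOLD_K


from_default = resolve_threshold_k({})
from_attrs = resolve_threshold_k({"degree_day_threshold_k": 290.0})
from_args = resolve_threshold_k({"degree_day_threshold_k": 290.0}, threshold_k=300.0)
```

Accepting the threshold in any of three units and normalizing to Kelvin once keeps the degree-day arithmetic itself unit-free.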

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="CDD_wrf",
    query={"variable_id": ["t2"]},
    description="Cooling degree days from WRF data (base 65°F)",
    units="K",
    source="builtin",
)
def calc_cdd_wrf(
    ds, threshold_k=None, threshold_c=None, threshold_f=None
) -> "xr.Dataset":
    """Calculate cooling degree days from WRF temperature data.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K), used as the average temperature
    threshold_k : float, optional
        Threshold for cooling degree days in Kelvin.
    threshold_c : float, optional
        Threshold for cooling degree days in Celsius.
    threshold_f : float, optional
        Threshold for cooling degree days in Fahrenheit.

    Returns
    -------
    xr.Dataset
        Dataset with 'CDD_wrf' variable added (K).

    Notes
    -----
    Cooling degree days (CDD) represent the energy demand for cooling.
    Calculated as max(0, average_temp - threshold) where threshold is 65°F (291.48 K)
    unless overridden and average_temp is taken directly from 't2'.

    """
    logger.debug("Computing CDD_wrf from t2")

    # Use 2m temperature as the average temperature
    t_avg = ds.t2

    # Resolve threshold (explicit args > dataset attrs > global default)
    threshold_k = get_derived_threshold(
        ds,
        "CDD_wrf",
        threshold_k=threshold_k,
        threshold_c=threshold_c,
        threshold_f=threshold_f,
    )

    # CDD = max(0, avg_temp - threshold)
    ds["CDD_wrf"] = np.maximum(0, t_avg - threshold_k)
    ds["CDD_wrf"] = (ds["CDD_wrf"] > 0).astype(int)  # Binary mask data
    ds["CDD_wrf"].attrs = {
        "units": "K",
        "long_name": "Cooling Degree Days (WRF)",
        "comment": f"Cooling degree days calculated from daily average temperature with base {threshold_k} K",
        "derived_from": "t2",
        "derived_by": "climakitae",
        "threshold": f"{threshold_k} K",
    }
    return ds

calc_hdd_loca(ds, threshold_k=None, threshold_c=None, threshold_f=None)

Calculate heating degree days from LOCA2 temperature data.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 'tasmax': Daily maximum temperature (K)
- 'tasmin': Daily minimum temperature (K)

required
threshold_k float

Threshold for heating degree days in Kelvin.

None
threshold_c float

Threshold for heating degree days in Celsius.

None
threshold_f float

Threshold for heating degree days in Fahrenheit.

None

Returns:

Type Description
Dataset

Dataset with 'HDD_loca' variable added (K).

Notes

Heating degree days (HDD) represent the energy demand for heating. Calculated as max(0, threshold - average_temp) where threshold is 65°F (291.48K) and average_temp is (tasmax + tasmin) / 2.
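Note that the source listing below overwrites the degree-day magnitude with a 0/1 flag (the `> 0` comparison cast to int), so the stored variable marks whether a day needs heating rather than by how much. The scalar sketch here returns both quantities to make the difference visible; the function name `hdd_magnitude_and_flag` is hypothetical.

```python
from typing import Tuple


def hdd_magnitude_and_flag(
    tasmax_k: float, tasmin_k: float, threshold_k: float
) -> Tuple[float, int]:
    """Return (degree-day magnitude in K, 0/1 heating-day flag)."""
    t_avg = (tasmax_k + tasmin_k) / 2.0
    magnitude = max(0.0, threshold_k - t_avg)
    return magnitude, int(magnitude > 0)


# Three illustrative days as (tasmax, tasmin) pairs in Kelvin
days = [(295.0, 280.0), (305.0, 295.0), (290.0, 284.0)]
results = [hdd_magnitude_and_flag(tmax, tmin, 291.48) for tmax, tmin in days]
heating_days = sum(flag for _, flag in results)
```

Summing the flags over a period counts the days below the base temperature, which is a different quantity from the accumulated degree-day total.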

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="HDD_loca",
    query={"variable_id": ["tasmax", "tasmin"]},
    description="Heating degree days from LOCA2 data (base 65°F)",
    units="K",
    source="builtin",
)
def calc_hdd_loca(
    ds, threshold_k=None, threshold_c=None, threshold_f=None
) -> "xr.Dataset":
    """Calculate heating degree days from LOCA2 temperature data.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 'tasmax': Daily maximum temperature (K)
        - 'tasmin': Daily minimum temperature (K)
    threshold_k : float, optional
        Threshold for heating degree days in Kelvin.
    threshold_c : float, optional
        Threshold for heating degree days in Celsius.
    threshold_f : float, optional
        Threshold for heating degree days in Fahrenheit.

    Returns
    -------
    xr.Dataset
        Dataset with 'HDD_loca' variable added (K).

    Notes
    -----
    Heating degree days (HDD) represent the energy demand for heating.
    Calculated as max(0, threshold - average_temp) where threshold is 65°F (291.48K)
    and average_temp is (tasmax + tasmin) / 2.

    """
    logger.debug("Computing HDD_loca from tasmax and tasmin")

    # Calculate daily average temperature
    t_avg = (ds.tasmax + ds.tasmin) / 2

    # Resolve threshold (explicit args > dataset attrs > global default)
    threshold_k = get_derived_threshold(
        ds,
        "HDD_loca",
        threshold_k=threshold_k,
        threshold_c=threshold_c,
        threshold_f=threshold_f,
    )

    # HDD = max(0, threshold - avg_temp)
    ds["HDD_loca"] = np.maximum(0, threshold_k - t_avg)
    ds["HDD_loca"] = (ds["HDD_loca"] > 0).astype(int)  # Binary mask data
    ds["HDD_loca"].attrs = {
        "units": "K",
        "long_name": "Heating Degree Days (LOCA2)",
        "comment": f"Heating degree days calculated from daily average temperature with base {threshold_k} K",
        "derived_from": "tasmax, tasmin",
        "derived_by": "climakitae",
        "threshold": f"{threshold_k} K",
    }
    return ds

calc_cdd_loca(ds, threshold_k=None, threshold_c=None, threshold_f=None)

Calculate cooling degree days from LOCA2 temperature data.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 'tasmax': Daily maximum temperature (K)
- 'tasmin': Daily minimum temperature (K)

required
threshold_k float

Threshold for cooling degree days in Kelvin.

None
threshold_c float

Threshold for cooling degree days in Celsius.

None
threshold_f float

Threshold for cooling degree days in Fahrenheit.

None

Returns:

Type Description
Dataset

Dataset with 'CDD_loca' variable added (K).

Notes

Cooling degree days (CDD) represent the energy demand for cooling. Calculated as max(0, average_temp - threshold) where threshold is 65°F (291.48K) and average_temp is (tasmax + tasmin) / 2.

Source code in climakitae/new_core/derived_variables/builtin/temperature.py
@register_derived(
    variable="CDD_loca",
    query={"variable_id": ["tasmax", "tasmin"]},
    description="Cooling degree days from LOCA2 data (base 65°F)",
    units="K",
    source="builtin",
)
def calc_cdd_loca(
    ds, threshold_k=None, threshold_c=None, threshold_f=None
) -> "xr.Dataset":
    """Calculate cooling degree days from LOCA2 temperature data.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 'tasmax': Daily maximum temperature (K)
        - 'tasmin': Daily minimum temperature (K)
    threshold_k : float, optional
        Threshold for cooling degree days in Kelvin.
    threshold_c : float, optional
        Threshold for cooling degree days in Celsius.
    threshold_f : float, optional
        Threshold for cooling degree days in Fahrenheit.

    Returns
    -------
    xr.Dataset
        Dataset with 'CDD_loca' variable added (K).

    Notes
    -----
    Cooling degree days (CDD) represent the energy demand for cooling.
    Calculated as max(0, average_temp - threshold) where threshold is 65°F (291.48K)
    and average_temp is (tasmax + tasmin) / 2.

    """
    logger.debug("Computing CDD_loca from tasmax and tasmin")

    # Calculate daily average temperature
    t_avg = (ds.tasmax + ds.tasmin) / 2

    # Resolve threshold (explicit args > dataset attrs > global default)
    threshold_k = get_derived_threshold(
        ds,
        "CDD_loca",
        threshold_k=threshold_k,
        threshold_c=threshold_c,
        threshold_f=threshold_f,
    )

    # CDD = max(0, avg_temp - threshold)
    ds["CDD_loca"] = np.maximum(0, t_avg - threshold_k)
    ds["CDD_loca"] = (ds["CDD_loca"] > 0).astype(int)  # Binary mask data
    ds["CDD_loca"].attrs = {
        "units": "K",
        "long_name": "Cooling Degree Days (LOCA2)",
        "comment": f"Cooling degree days calculated from daily average temperature with base {threshold_k} K",
        "derived_from": "tasmax, tasmin",
        "derived_by": "climakitae",
        "threshold": f"{threshold_k} K",
    }
    return ds

Built-in Wind Derivations

Wind-related derived variables.

This module provides derived variables for wind calculations including wind speed and wind direction from U and V wind components.

Derived Variables

- wind_speed_10m: Wind speed at 10m computed from u10 and v10 components.
- wind_direction_10m: Wind direction at 10m computed from u10 and v10 components.

calc_wind_speed_10m(ds)

Calculate wind speed at 10m from U and V components.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing 'u10' and 'v10' variables.

required

Returns:

Type Description
Dataset

Dataset with 'wind_speed_10m' variable added.

Notes

Wind speed is computed as: sqrt(u10² + v10²)

Source code in climakitae/new_core/derived_variables/builtin/wind.py
@register_derived(
    variable="wind_speed_10m",
    query={"variable_id": ["u10", "v10"]},
    description="Wind speed at 10m height computed from U and V components",
    units="m/s",
    source="builtin",
)
def calc_wind_speed_10m(ds) -> "xr.Dataset":
    """Calculate wind speed at 10m from U and V components.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing 'u10' and 'v10' variables.

    Returns
    -------
    xr.Dataset
        Dataset with 'wind_speed_10m' variable added.

    Notes
    -----
    Wind speed is computed as: sqrt(u10² + v10²)

    """
    logger.debug("Computing wind_speed_10m from u10 and v10")
    ds["wind_speed_10m"] = np.sqrt(ds.u10**2 + ds.v10**2)
    ds["wind_speed_10m"].attrs = {
        "units": "m/s",
        "long_name": "Wind Speed at 10m",
        "standard_name": "wind_speed",
        "derived_from": "u10, v10",
        "derived_by": "climakitae",
    }
    return ds

calc_wind_direction_10m(ds)

Calculate wind direction at 10m from U and V components.

Parameters:

Name Type Description Default
ds Dataset

Dataset containing 'u10' and 'v10' variables.

required

Returns:

Type Description
Dataset

Dataset with 'wind_direction_10m' variable added.

Notes

Wind direction uses meteorological convention: the direction the wind is coming FROM, measured clockwise from north.

Direction = (270 - atan2(v10, u10) * 180/π) mod 360
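The meteorological convention is easy to get backwards, so a scalar check against the four cardinal directions can help. The function name `wind_direction_deg` is hypothetical; the formula is the one stated above.

```python
import math


def wind_direction_deg(u10: float, v10: float) -> float:
    """Direction the wind comes FROM, in degrees clockwise from north."""
    return (270.0 - math.degrees(math.atan2(v10, u10))) % 360.0


# u > 0 is air moving eastward, v > 0 is air moving northward
westerly = wind_direction_deg(1.0, 0.0)    # blowing toward the east
southerly = wind_direction_deg(0.0, 1.0)   # blowing toward the north
easterly = wind_direction_deg(-1.0, 0.0)   # blowing toward the west
northerly = wind_direction_deg(0.0, -1.0)  # blowing toward the south
```

A wind with positive u and zero v moves air toward the east, so it is reported as coming from the west (270°).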

Source code in climakitae/new_core/derived_variables/builtin/wind.py
@register_derived(
    variable="wind_direction_10m",
    query={"variable_id": ["u10", "v10"]},
    description="Wind direction at 10m height (meteorological convention: direction wind comes FROM)",
    units="degrees",
    source="builtin",
)
def calc_wind_direction_10m(ds) -> "xr.Dataset":
    """Calculate wind direction at 10m from U and V components.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing 'u10' and 'v10' variables.

    Returns
    -------
    xr.Dataset
        Dataset with 'wind_direction_10m' variable added.

    Notes
    -----
    Wind direction uses meteorological convention: the direction the wind
    is coming FROM, measured clockwise from north.

    Direction = (270 - atan2(v10, u10) * 180/π) mod 360

    """
    logger.debug("Computing wind_direction_10m from u10 and v10")
    # Meteorological wind direction: direction wind comes FROM
    wind_dir = (270 - np.arctan2(ds.v10, ds.u10) * 180 / np.pi) % 360
    ds["wind_direction_10m"] = wind_dir
    ds["wind_direction_10m"].attrs = {
        "units": "degrees",
        "long_name": "Wind Direction at 10m",
        "standard_name": "wind_from_direction",
        "comment": "Meteorological convention: direction wind is coming FROM, clockwise from north",
        "derived_from": "u10, v10",
        "derived_by": "climakitae",
    }
    return ds

Built-in Climate Indices

Fire weather and climate indices derived variables.

This module provides derived variables for fire weather and climate index calculations.

Derived Variables

fosberg_fire_weather_index Fosberg Fire Weather Index from temperature, humidity, and wind.

calc_fosberg_fire_weather_index(ds)

Calculate the Fosberg Fire Weather Index (FFWI).

Parameters:

Name Type Description Default
ds Dataset

Dataset containing:
- 't2': 2m temperature (K)
- 'q2': 2m specific humidity (kg/kg)
- 'psfc': Surface pressure (Pa)
- 'u10': U-component of wind at 10m (m/s)
- 'v10': V-component of wind at 10m (m/s)

required

Returns:

Type Description
Dataset

Dataset with 'fosberg_fire_weather_index' variable added (0-100 scale).

References

https://a.atmos.washington.edu/wrfrt/descript/definitions/fosbergindex.html
https://www.spc.noaa.gov/exper/firecomp/INFO/fosbinfo.html
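The private helpers `_equilibrium_moisture_constant` and `_moisture_dampening_coeff` are not listed on this page. The scalar sketch below assumes they follow the standard Fosberg formulation given in the NOAA SPC reference, and uses the same humidity branch boundaries and the same final expression as the source listing below. All function names here are hypothetical.

```python
import math


def equilibrium_moisture(rh: float, t_f: float) -> float:
    """Equilibrium moisture content (%) from RH (%) and temperature (°F)."""
    if rh < 10:
        return 0.03229 + 0.281073 * rh - 0.000578 * rh * t_f
    if rh <= 50:
        return 2.22749 + 0.160107 * rh - 0.01478 * t_f
    return 21.0606 + 0.005565 * rh**2 - 0.00035 * rh * t_f - 0.483199 * rh


def moisture_damping(m: float) -> float:
    """Moisture damping coefficient as a cubic in m / 30."""
    x = m / 30.0
    return 1.0 - 2.0 * x + 1.5 * x**2 - 0.5 * x**3


def fosberg_index(t_f: float, rh: float, wind_mph: float) -> float:
    """Fosberg Fire Weather Index, clipped to [0, 100]."""
    eta = moisture_damping(equilibrium_moisture(rh, t_f))
    ffwi = eta * math.sqrt(1.0 + wind_mph**2) / 0.3002
    return min(max(ffwi, 0.0), 100.0)


hot_dry_breezy = fosberg_index(90.0, 20.0, 20.0)
extreme_wind = fosberg_index(90.0, 20.0, 200.0)
```

The divisor 0.3002 scales the index so that zero fuel moisture with a 30 mph wind gives a value of about 100, which is why the result is clipped at that bound.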

Source code in climakitae/new_core/derived_variables/builtin/indices.py
@register_derived(
    variable="fosberg_fire_weather_index",
    query={"variable_id": ["t2", "q2", "psfc", "u10", "v10"]},
    description="Fosberg Fire Weather Index computed from temperature, humidity, and wind speed",
    units="[0 to 100]",
    source="builtin",
)
def calc_fosberg_fire_weather_index(ds) -> xr.Dataset:
    """Calculate the Fosberg Fire Weather Index (FFWI).

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing:
        - 't2': 2m temperature (K)
        - 'q2': 2m specific humidity (kg/kg)
        - 'psfc': Surface pressure (Pa)
        - 'u10': U-component of wind at 10m (m/s)
        - 'v10': V-component of wind at 10m (m/s)

    Returns
    -------
    xr.Dataset
        Dataset with 'fosberg_fire_weather_index' variable added (0-100 scale).

    References
    ----------
    https://a.atmos.washington.edu/wrfrt/descript/definitions/fosbergindex.html
    https://www.spc.noaa.gov/exper/firecomp/INFO/fosbinfo.html

    """
    logger.debug(
        "Computing fosberg fire weather index (FFWI) from t2, q2, psfc, u10, v10"
    )
    logger.debug(
        "Relative humidity and wind speed will be computed as intermediate steps"
    )

    ds = calc_relative_humidity_2m(ds)
    ds = calc_wind_speed_10m(ds)

    # Convert units
    t2_F = (ds.t2 - 273.15) * 9 / 5 + 32  # K -> Fahrenheit
    rh = ds["relative_humidity_2m"]  # already 0-100
    wind_mph = ds["wind_speed_10m"] * 2.23694  # m/s -> mph

    # Equilibrium moisture content
    m_low, m_mid, m_high = _equilibrium_moisture_constant(h=rh, T=t2_F)
    m = xr.where(rh < 10, m_low, m_mid)
    m = xr.where(rh > 50, m_high, m)

    # Moisture dampening coefficient
    n = _moisture_dampening_coeff(m)

    # FFWI, clipped to [0, 100]
    ffwi = (n * ((1 + wind_mph**2) ** 0.5) / 0.3002).clip(0, 100)

    # Restore coordinate attributes lost in xr.where
    for coord in list(ffwi.coords):
        if coord in ds.coords:
            ffwi[coord].attrs = ds[coord].attrs

    ds["fosberg_fire_weather_index"] = ffwi
    ds["fosberg_fire_weather_index"].attrs = {
        "units": "[0 to 100]",
        "long_name": "Fosberg Fire Weather Index",
        "derived_from": "t2, q2, psfc, u10, v10",
        "derived_by": "climakitae",
    }

    # Drop intermediate derived variables
    ds = ds.drop_vars(["relative_humidity_2m", "wind_speed_10m"])
    return ds