Derived Variables Module
Registry and utilities for climate-derived variable computation.
Overview
The climakitae.new_core.derived_variables module provides:
- Registry — Centralized registry of available derived variables
- Utilities — Helper functions for variable transformation
- Built-in derivations — Pre-configured derived variables:
  - Humidity indices (relative humidity, dew point)
  - Temperature indices (effective temperature, heat index)
  - Wind computations (wind speed, direction)
  - Climate indices (growing degree days, etc.)
Registry
Derived Variable Registry for ClimakitAE.
This module provides a singleton registry for derived climate variables that integrates with intake-esm's DerivedVariableRegistry. It enables users to define variables that are computed from other variables during data loading.
The registry follows the same patterns as the processor registry in climakitae.new_core.processors.abc_data_processor.
Classes:
| Name | Description |
|---|---|
| DerivedVariableInfo | Dataclass containing metadata about a registered derived variable. |
Functions:
| Name | Description |
|---|---|
| get_registry | Get the global intake-esm DerivedVariableRegistry singleton. |
| register_derived | Decorator to register a derived variable function. |
| register_user_function | Imperatively register a user-defined derived variable. |
| list_derived_variables | List all registered derived variables with their metadata. |
DerivedVariableInfo(name, depends_on, description, units, func, source='builtin', drop_dependencies=True)
dataclass
Metadata about a registered derived variable.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The name of the derived variable (what users query). |
| depends_on | list of str | Variable IDs that this derived variable requires. |
| description | str | Human-readable description of what this variable represents. |
| units | str | Expected units of the derived variable. |
| func | callable | The function that computes the derived variable. |
| source | str | Where this variable was registered from ('builtin' or 'user'). |
| drop_dependencies | bool | Whether to remove source variables after computing the derived variable. Default: True (keep only the derived variable in the output). |
preserve_spatial_metadata(ds, derived_var_name, source_var_name)
Copy spatial metadata from a source variable to a derived variable.
This function ensures that derived variables retain necessary spatial metadata (CRS, spatial_ref, coordinates) from their source variables. This is critical for downstream operations like clipping that require CRS information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | The dataset containing both source and derived variables. | required |
| derived_var_name | str | Name of the derived variable to update. | required |
| source_var_name | str | Name of the source variable to copy metadata from. | required |
Notes
This function modifies the dataset in-place. It copies:
- CRS via rioxarray if available
- Lambert_Conformal or spatial_ref coordinate if present
- grid_mapping attribute if present
- Any coordinates present on the source variable but missing from the derived variable
Examples:
>>> ds["derived"] = ds["source_a"] - ds["source_b"]
>>> preserve_spatial_metadata(ds, "derived", "source_a")
Source code in climakitae/new_core/derived_variables/registry.py
get_registry()
Get the global derived variable registry singleton.
Returns:
| Type | Description |
|---|---|
| DerivedVariableRegistry | The singleton intake-esm DerivedVariableRegistry instance. |
Notes
The registry is lazily initialized on first access. This ensures that builtin derived variables are registered before the registry is used.
Examples:
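The lazy initialization described in the Notes can be sketched with the standard library alone. The names `_REGISTRY`, `_register_builtins`, and `get_registry_sketch`, and the plain dict used as the registry, are hypothetical stand-ins rather than the actual climakitae or intake-esm implementation.

```python
# Hypothetical stand-in for a lazily initialized singleton registry.
_REGISTRY = None


def _register_builtins(registry):
    """Populate the registry with built-in entries (illustrative only)."""
    registry["wind_speed_10m"] = {"depends_on": ["u10", "v10"]}


def get_registry_sketch():
    """Return the shared registry, creating and populating it on first access."""
    global _REGISTRY
    if _REGISTRY is None:
        _REGISTRY = {}
        _register_builtins(_REGISTRY)
    return _REGISTRY


first = get_registry_sketch()
second = get_registry_sketch()  # same object; built-ins are not re-registered
```

Because the built-ins are registered inside the first call, any caller that obtains the registry sees them already present.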
Source code in climakitae/new_core/derived_variables/registry.py
register_derived(variable, query, description='', units='', source='builtin', drop_dependencies=True)
Decorator to register a derived variable function.
This decorator registers a function with the intake-esm DerivedVariableRegistry, enabling the variable to be queried directly from catalogs that have the registry attached.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variable | str | The name of the derived variable. This is what users will query. | required |
| query | dict | Query constraints for finding source variables. Must include 'variable_id' with a list of required source variables. May include additional constraints like 'table_id', 'experiment_id', etc. | required |
| description | str | Human-readable description of the derived variable. | '' |
| units | str | Expected units of the derived variable. | '' |
| source | str | Where this variable was registered from. Default is 'builtin'. | 'builtin' |
| drop_dependencies | bool | Whether to remove source variables from the output after computing the derived variable. Default: True (only return the derived variable). Set to False to keep source variables alongside the derived variable. | True |
Returns:
| Type | Description |
|---|---|
| callable | The decorator function. |
Examples:
>>> @register_derived(
... variable='wind_speed',
... query={'variable_id': ['u10', 'v10']},
... description='Wind speed at 10m',
... units='m/s'
... )
... def calc_wind_speed(ds):
... import numpy as np
... ds['wind_speed'] = np.sqrt(ds.u10**2 + ds.v10**2)
... ds['wind_speed'].attrs = {'units': 'm/s', 'long_name': 'Wind Speed'}
... return ds
>>> # Keep source variables alongside derived variable
>>> @register_derived(
... variable='temp_range',
... query={'variable_id': ['tasmax', 'tasmin']},
... description='Daily temperature range',
... drop_dependencies=False # Keep tasmax and tasmin
... )
... def calc_temp_range(ds):
... ds['temp_range'] = ds.tasmax - ds.tasmin
... return ds
Notes
The decorated function must:
- Accept a single xarray.Dataset argument
- Add the derived variable to the dataset
- Return the modified dataset
- Set appropriate attributes (units, long_name) on the new variable
By default, source variables are dropped to reduce output size. Set drop_dependencies=False to keep them for downstream analysis.
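To illustrate how a registering decorator and the drop_dependencies flag fit together, here is a minimal sketch that uses a plain dict in place of an xarray.Dataset. `InfoSketch`, `REGISTRY_SKETCH`, `register_sketch`, and `compute_sketch` are hypothetical names for illustration only; the real registration goes through intake-esm and may behave differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InfoSketch:
    """Hypothetical metadata record, loosely modeled on DerivedVariableInfo."""
    name: str
    depends_on: List[str]
    func: Callable
    drop_dependencies: bool = True


REGISTRY_SKETCH = {}


def register_sketch(variable, query, drop_dependencies=True):
    """Return a decorator that records the function under `variable`."""
    def decorator(func):
        REGISTRY_SKETCH[variable] = InfoSketch(
            variable, list(query["variable_id"]), func, drop_dependencies
        )
        return func  # the function itself is returned unchanged
    return decorator


def compute_sketch(variable, ds):
    """Run the registered function on a copy, then optionally drop the sources."""
    info = REGISTRY_SKETCH[variable]
    out = info.func(dict(ds))
    if info.drop_dependencies:
        for dep in info.depends_on:
            out.pop(dep, None)
    return out


@register_sketch("temp_range", {"variable_id": ["tasmax", "tasmin"]})
def calc_temp_range(ds):
    ds["temp_range"] = ds["tasmax"] - ds["tasmin"]
    return ds


result = compute_sketch("temp_range", {"tasmax": 300.0, "tasmin": 285.0})
```

With the default drop_dependencies=True, only the derived entry remains in `result`; registering with drop_dependencies=False would leave tasmax and tasmin in place.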
Source code in climakitae/new_core/derived_variables/registry.py
register_user_function(name, depends_on, func, description='', units='', query_extras=None, drop_dependencies=True)
Register a user-defined derived variable at runtime.
This function allows users to register custom derived variables without using the decorator syntax. This is useful for dynamic registration or when working interactively.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the derived variable. This is what users will query. | required |
| depends_on | list of str | Variable IDs that this function requires (e.g., ['tasmax', 'tasmin']). | required |
| func | callable | Function that takes an xarray.Dataset and returns a modified Dataset with the new variable added. | required |
| description | str | Human-readable description of the derived variable. | '' |
| units | str | Expected units of the derived variable. | '' |
| query_extras | dict | Additional query constraints beyond variable_id (e.g., table_id, experiment_id). | None |
| drop_dependencies | bool | Whether to remove source variables from the output after computing the derived variable. Default: True (only return the derived variable). Set to False to keep source variables alongside the derived variable. | True |
Raises:
| Type | Description |
|---|---|
| ValueError | If name is empty or depends_on is empty. |
| TypeError | If func is not callable. |
Examples:
>>> def calc_temp_range(ds):
... ds['temp_range'] = ds.tasmax - ds.tasmin
... ds['temp_range'].attrs = {'units': 'K', 'long_name': 'Diurnal Range'}
... return ds
...
>>> register_user_function(
... name='temp_range',
... depends_on=['tasmax', 'tasmin'],
... func=calc_temp_range,
... description='Daily temperature range',
... units='K'
... )
...
>>> # Now query it directly
>>> data = cd.catalog("cadcat").variable("temp_range").get()
Notes
Registration is permanent for the session. Once registered, the variable is available for all subsequent queries until the Python process ends.
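The argument checks listed under Raises could look roughly like the sketch below. `validate_registration_sketch` is a hypothetical helper, and the order of the checks and the error messages are assumptions; only the exception types come from the documentation above.

```python
def validate_registration_sketch(name, depends_on, func):
    """Raise the documented exception types for invalid registration arguments."""
    if not name:
        raise ValueError("name must be a non-empty string")
    if not depends_on:
        raise ValueError("depends_on must list at least one variable ID")
    if not callable(func):
        raise TypeError("func must be callable")


def _identity(ds):
    return ds


# Valid arguments pass silently.
validate_registration_sketch("temp_range", ["tasmax", "tasmin"], _identity)

# Collect the exception type raised by each invalid call.
errors = []
for args in [("", ["tasmax"], _identity),
             ("temp_range", [], _identity),
             ("temp_range", ["tasmax"], "not callable")]:
    try:
        validate_registration_sketch(*args)
    except (ValueError, TypeError) as exc:
        errors.append(type(exc).__name__)
```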
Source code in climakitae/new_core/derived_variables/registry.py
list_derived_variables()
List all registered derived variables with their metadata.
Returns:
| Type | Description |
|---|---|
| dict | Dictionary mapping variable names to DerivedVariableInfo objects. |
Examples:
>>> derived_vars = list_derived_variables()
>>> for name, info in derived_vars.items():
... print(f"{name}: depends on {info.depends_on}")
wind_speed: depends on ['u10', 'v10']
relative_humidity: depends on ['t2', 'q2', 'psfc']
Source code in climakitae/new_core/derived_variables/registry.py
is_derived_variable(variable)
Check if a variable name is a registered derived variable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variable | str | The variable name to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the variable is registered as a derived variable. |
Source code in climakitae/new_core/derived_variables/registry.py
get_derived_variable_info(variable)
Get metadata for a derived variable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variable | str | The variable name to look up. | required |
Returns:
| Type | Description |
|---|---|
| DerivedVariableInfo or None | Metadata for the variable, or None if not found. |
Source code in climakitae/new_core/derived_variables/registry.py
Utilities
Helper utilities for derived variable computations.
Provides runtime helpers for resolving parameter precedence for derived variable calculations (explicit args -> dataset-level overrides -> global defaults).
get_derived_threshold(ds, derived_var_name=None, threshold_k=None, threshold_c=None, threshold_f=None)
Resolve a temperature-based degree-day threshold in Kelvin.
Precedence:
1. Explicit function arguments (threshold_k, threshold_c, threshold_f)
2. Per-dataset attribute overrides (see dataset.attrs keys below)
3. Global default DEFAULT_DEGREE_DAY_THRESHOLD_K
Dataset attribute conventions supported (checked in order):
- derived_variable_overrides : dict mapping derived var name -> params dict
e.g. dataset.attrs['derived_variable_overrides'] = {'CDD_wrf': {'threshold_f': 75}}
- Top-level attrs: threshold_k, threshold_c, threshold_f
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset that may contain attribute-based overrides. | required |
| derived_var_name | str | Name of the derived variable to look up per-variable overrides. | None |
| threshold_k | float | Threshold in Kelvin. | None |
| threshold_c | float | Threshold in Celsius. | None |
| threshold_f | float | Threshold in Fahrenheit. | None |
Returns:
| Type | Description |
|---|---|
| float | Threshold value in Kelvin. |
Source code in climakitae/new_core/derived_variables/utils.py
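The precedence rules for get_derived_threshold can be sketched as follows, with a plain dict standing in for dataset.attrs. `resolve_threshold_sketch`, `to_kelvin`, and `DEFAULT_THRESHOLD_K` are hypothetical names; the default of 65°F and the Kelvin-then-Celsius-then-Fahrenheit order within one level are assumptions, not confirmed details of the library.

```python
# Assumed default: 65°F expressed in Kelvin (about 291.48 K).
DEFAULT_THRESHOLD_K = (65.0 - 32.0) * 5.0 / 9.0 + 273.15

_KEYS = ("threshold_k", "threshold_c", "threshold_f")


def to_kelvin(threshold_k=None, threshold_c=None, threshold_f=None):
    """Return the first supplied threshold converted to Kelvin, else None."""
    if threshold_k is not None:
        return float(threshold_k)
    if threshold_c is not None:
        return float(threshold_c) + 273.15
    if threshold_f is not None:
        return (float(threshold_f) - 32.0) * 5.0 / 9.0 + 273.15
    return None


def resolve_threshold_sketch(attrs, derived_var_name=None, **explicit):
    """Resolve a threshold: explicit args, then attrs overrides, then default."""
    per_var = attrs.get("derived_variable_overrides", {})
    candidates = [
        explicit,                               # 1. explicit arguments
        per_var.get(derived_var_name, {}),      # 2a. per-variable overrides
        attrs,                                  # 2b. top-level attrs
    ]
    for params in candidates:
        picked = {k: params[k] for k in _KEYS if params.get(k) is not None}
        value = to_kelvin(**picked)
        if value is not None:
            return value
    return DEFAULT_THRESHOLD_K                  # 3. global default


attrs = {"derived_variable_overrides": {"CDD_wrf": {"threshold_f": 75}}}
from_override = resolve_threshold_sketch(attrs, "CDD_wrf")
from_explicit = resolve_threshold_sketch(attrs, "CDD_wrf", threshold_c=20.0)
from_default = resolve_threshold_sketch({}, "CDD_wrf")
```

Here the explicit Celsius argument wins over the per-variable Fahrenheit override, and an empty attrs dict falls through to the default.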
Built-in Humidity Derivations
Humidity-related derived variables.
This module provides derived variables for humidity calculations including relative humidity, dew point temperature, and specific humidity.
Derived Variables
- relative_humidity_2m: Relative humidity at 2m from temperature, specific humidity, and pressure.
- dew_point_2m: Dew point temperature at 2m from temperature and relative humidity.
calc_relative_humidity_2m(ds)
Calculate relative humidity at 2m.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2' (2m temperature, K), 'q2' (2m specific humidity, kg/kg), and 'psfc' (surface pressure, Pa). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'relative_humidity_2m' variable added (0-100 scale). |
Notes
Uses the approximation (T in K, p in Pa, q in kg/kg):
- Saturation vapor pressure: es = 611.2 * exp(17.67 * (T-273.15) / (T-29.65))
- Vapor pressure from specific humidity: e = q * p / (0.622 + 0.378*q)
- Relative humidity: RH = 100 * e / es
Source code in climakitae/new_core/derived_variables/builtin/humidity.py
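The three formulas in the Notes for calc_relative_humidity_2m can be evaluated on scalars as a quick check. This is a sketch on plain floats, not the library's xarray implementation; `relative_humidity_pct` is a hypothetical name, and no clipping to the 0-100 range is applied here.

```python
import math


def relative_humidity_pct(t_k, q, p_pa):
    """Relative humidity (%) from temperature (K), specific humidity (kg/kg), pressure (Pa)."""
    # Saturation vapor pressure in Pa.
    es = 611.2 * math.exp(17.67 * (t_k - 273.15) / (t_k - 29.65))
    # Vapor pressure in Pa from specific humidity.
    e = q * p_pa / (0.622 + 0.378 * q)
    return 100.0 * e / es


# 20 °C air holding 7 g/kg of water vapor at standard sea-level pressure.
rh = relative_humidity_pct(293.15, 0.007, 101325.0)
```

For these inputs the result is a little under 50%, which is in the range one would expect for moderately humid air at room temperature.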
calc_dew_point_2m(ds)
Calculate dew point temperature at 2m.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2' (2m temperature, K) and 'rh' (relative humidity, 0-100 scale). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'dew_point_2m' variable added (K). |
Notes
Uses the Magnus formula approximation for dew point.
Source code in climakitae/new_core/derived_variables/builtin/humidity.py
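A Magnus-type dew point calculation can be sketched as below. The constants 17.67 and 243.5 °C are an assumption chosen to match the saturation vapor pressure formula used for relative humidity on this page; the library may use a different constant pair. `dew_point_k` is a hypothetical name.

```python
import math

# Assumed Magnus constants (consistent with es = 611.2 * exp(17.67 * Tc / (Tc + 243.5))).
A = 17.67
B = 243.5  # degrees Celsius


def dew_point_k(t_k, rh_pct):
    """Dew point (K) from temperature (K) and relative humidity (0-100 scale)."""
    t_c = t_k - 273.15
    gamma = math.log(rh_pct / 100.0) + A * t_c / (B + t_c)
    td_c = B * gamma / (A - gamma)
    return td_c + 273.15


td_half = dew_point_k(293.15, 50.0)       # drier air: dew point below air temperature
td_saturated = dew_point_k(293.15, 100.0)  # saturated air: dew point equals air temperature
```

Relative humidity must be strictly positive, since the formula takes its logarithm.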
calc_specific_humidity_2m(ds)
Calculate specific humidity at 2m.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2' (2m temperature, K), 'rh' (relative humidity, 0-100 scale), and 'psfc' (surface pressure, Pa). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'specific_humidity_2m' variable added (kg/kg). |
Source code in climakitae/new_core/derived_variables/builtin/humidity.py
Built-in Temperature Derivations
Temperature-related derived variables.
This module provides derived variables for temperature calculations including heat index, wind chill, apparent temperature, and degree days.
Derived Variables
- heat_index: Heat index (feels-like temperature accounting for humidity).
- wind_chill: Wind chill (feels-like temperature accounting for wind).
- apparent_temperature: Apparent temperature combining temperature, humidity, and wind effects.
- diurnal_temperature_range: Daily temperature range (tasmax - tasmin).
- HDD_wrf: Heating degree days from WRF data (base 65°F).
- CDD_wrf: Cooling degree days from WRF data (base 65°F).
- HDD_loca: Heating degree days from LOCA2 data (base 65°F).
- CDD_loca: Cooling degree days from LOCA2 data (base 65°F).
calc_heat_index(ds)
Calculate heat index from temperature and relative humidity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2' (2m temperature, K) and 'rh' (relative humidity, 0-100 scale). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'heat_index' variable added (K). |
Notes
Uses the NOAA/NWS heat index formula. The heat index is only meaningful when temperature is above ~27°C (80°F) and relative humidity is above ~40%. For lower values, the original temperature is returned.
Reference: https://www.weather.gov/media/ffc/ta_htindx.PDF
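As a scalar illustration of the behavior described in the Notes, the sketch below applies the widely published NWS (Rothfusz) regression above the 80°F and 40% thresholds and returns the input temperature otherwise. Whether the library uses exactly these coefficients, thresholds, or any additional adjustment terms is an assumption; `heat_index_k` is a hypothetical name.

```python
def heat_index_k(t_k, rh_pct):
    """Heat index (K) from temperature (K) and relative humidity (0-100 scale)."""
    t_f = (t_k - 273.15) * 9.0 / 5.0 + 32.0
    # Outside the range where the index is meaningful, return the input unchanged.
    if t_f < 80.0 or rh_pct < 40.0:
        return t_k
    rh = rh_pct
    # Rothfusz regression, with temperature in °F and humidity in percent.
    hi_f = (
        -42.379
        + 2.04901523 * t_f
        + 10.14333127 * rh
        - 0.22475541 * t_f * rh
        - 6.83783e-3 * t_f**2
        - 5.481717e-2 * rh**2
        + 1.22874e-3 * t_f**2 * rh
        + 8.5282e-4 * t_f * rh**2
        - 1.99e-6 * t_f**2 * rh**2
    )
    return (hi_f - 32.0) * 5.0 / 9.0 + 273.15


# 90°F at 70% relative humidity, converted to Kelvin for the call.
hot_k = (90.0 - 32.0) * 5.0 / 9.0 + 273.15
hi_f = (heat_index_k(hot_k, 70.0) - 273.15) * 9.0 / 5.0 + 32.0
cool_k = heat_index_k(290.0, 70.0)  # below the threshold: returned unchanged
```

For the hot, humid case the index comes out near 106°F, well above the 90°F air temperature.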
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
calc_wind_chill(ds)
Calculate wind chill from temperature and wind components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2' (2m temperature, K), 'u10' (10m U wind component, m/s), and 'v10' (10m V wind component, m/s). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'wind_chill' variable added (K). |
Notes
Uses the NWS wind chill formula (2001 revision). Wind chill is only calculated when temperature is below 10°C (50°F) and wind speed is above 4.8 km/h (3 mph). Otherwise, the original temperature is returned.
Reference: https://www.weather.gov/media/epz/wxcalc/windChill.pdf
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
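The wind chill rule in the Notes can be sketched on scalars using the published 2001 NWS formula, which takes temperature in °F and wind speed in mph. The exact comparison operators at the thresholds are an assumption, and `wind_chill_k` is a hypothetical name rather than the library function.

```python
import math

MPS_PER_MPH = 0.44704  # exact definition of one mile per hour in m/s


def wind_chill_k(t_k, u10, v10):
    """Wind chill (K) from temperature (K) and 10m wind components (m/s)."""
    t_f = (t_k - 273.15) * 9.0 / 5.0 + 32.0
    v_mph = math.hypot(u10, v10) / MPS_PER_MPH
    # Only defined for cold, windy conditions; otherwise return the input.
    if t_f >= 50.0 or v_mph <= 3.0:
        return t_k
    wc_f = (
        35.74
        + 0.6215 * t_f
        - 35.75 * v_mph**0.16
        + 0.4275 * t_f * v_mph**0.16
    )
    return (wc_f - 32.0) * 5.0 / 9.0 + 273.15


# 0°F with a 15 mph wind blowing along the U axis.
cold_k = (0.0 - 32.0) * 5.0 / 9.0 + 273.15
wc_f = (wind_chill_k(cold_k, 15.0 * MPS_PER_MPH, 0.0) - 273.15) * 9.0 / 5.0 + 32.0
warm_k = wind_chill_k(300.0, 5.0, 5.0)  # too warm: returned unchanged
```

The cold case evaluates to roughly -19°F, in line with the NWS wind chill chart entry for 0°F and 15 mph.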
calc_diurnal_temperature_range_loca(ds)
Calculate diurnal (daily) temperature range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 'tasmax' (daily maximum temperature, K) and 'tasmin' (daily minimum temperature, K). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'diurnal_temperature_range_loca' variable added (K). |
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
calc_diurnal_temperature_range_wrf(ds)
Calculate diurnal (daily) temperature range from WRF variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2max' (daily maximum 2m temperature, K) and 't2min' (daily minimum 2m temperature, K). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'diurnal_temperature_range_wrf' variable added (K). |
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
calc_hdd_wrf(ds, threshold_k=None, threshold_c=None, threshold_f=None)
Calculate heating degree days from WRF temperature data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2max' (daily maximum 2m temperature, K) and 't2min' (daily minimum 2m temperature, K). | required |
| threshold_k | float | Threshold for heating degree days in Kelvin. | None |
| threshold_c | float | Threshold for heating degree days in Celsius. | None |
| threshold_f | float | Threshold for heating degree days in Fahrenheit. | None |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'HDD_wrf' variable added (K). |
Notes
Heating degree days (HDD) represent the energy demand for heating. Calculated as max(0, threshold - average_temp), where threshold defaults to 65°F (291.48 K) unless overridden and average_temp is (t2max + t2min) / 2.
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
calc_cdd_wrf(ds, threshold_k=None, threshold_c=None, threshold_f=None)
Calculate cooling degree days from WRF temperature data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2max' (daily maximum 2m temperature, K) and 't2min' (daily minimum 2m temperature, K). | required |
| threshold_k | float | Threshold for cooling degree days in Kelvin. | None |
| threshold_c | float | Threshold for cooling degree days in Celsius. | None |
| threshold_f | float | Threshold for cooling degree days in Fahrenheit. | None |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'CDD_wrf' variable added (K). |
Notes
Cooling degree days (CDD) represent the energy demand for cooling. Calculated as max(0, average_temp - threshold), where threshold defaults to 65°F (291.48 K) unless overridden and average_temp is (t2max + t2min) / 2.
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
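The HDD and CDD definitions above reduce to a few lines of arithmetic. The sketch below works on single daily values rather than xarray arrays; `heating_degree_days`, `cooling_degree_days`, and `BASE_K` are hypothetical names, with the base temperature taken as 65°F converted to Kelvin.

```python
# Base temperature: 65°F in Kelvin (about 291.48 K).
BASE_K = (65.0 - 32.0) * 5.0 / 9.0 + 273.15


def heating_degree_days(tmax_k, tmin_k, threshold_k=BASE_K):
    """Degrees by which the daily mean temperature falls below the threshold."""
    average = (tmax_k + tmin_k) / 2.0
    return max(0.0, threshold_k - average)


def cooling_degree_days(tmax_k, tmin_k, threshold_k=BASE_K):
    """Degrees by which the daily mean temperature exceeds the threshold."""
    average = (tmax_k + tmin_k) / 2.0
    return max(0.0, average - threshold_k)


# A warm day (mean 295 K) accrues only CDD; a cold day (mean 280 K) only HDD.
warm_cdd = cooling_degree_days(300.0, 290.0)
warm_hdd = heating_degree_days(300.0, 290.0)
cold_hdd = heating_degree_days(285.0, 275.0)
cold_cdd = cooling_degree_days(285.0, 275.0)
```

On any given day at most one of the two quantities is nonzero, because the daily mean lies on only one side of the threshold.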
calc_hdd_loca(ds, threshold_k=None, threshold_c=None, threshold_f=None)
Calculate heating degree days from LOCA2 temperature data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 'tasmax' (daily maximum temperature, K) and 'tasmin' (daily minimum temperature, K). | required |
| threshold_k | float | Threshold for heating degree days in Kelvin. | None |
| threshold_c | float | Threshold for heating degree days in Celsius. | None |
| threshold_f | float | Threshold for heating degree days in Fahrenheit. | None |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'HDD_loca' variable added (K). |
Notes
Heating degree days (HDD) represent the energy demand for heating. Calculated as max(0, threshold - average_temp), where threshold defaults to 65°F (291.48 K) unless overridden and average_temp is (tasmax + tasmin) / 2.
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
calc_cdd_loca(ds, threshold_k=None, threshold_c=None, threshold_f=None)
Calculate cooling degree days from LOCA2 temperature data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 'tasmax' (daily maximum temperature, K) and 'tasmin' (daily minimum temperature, K). | required |
| threshold_k | float | Threshold for cooling degree days in Kelvin. | None |
| threshold_c | float | Threshold for cooling degree days in Celsius. | None |
| threshold_f | float | Threshold for cooling degree days in Fahrenheit. | None |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'CDD_loca' variable added (K). |
Notes
Cooling degree days (CDD) represent the energy demand for cooling. Calculated as max(0, average_temp - threshold), where threshold defaults to 65°F (291.48 K) unless overridden and average_temp is (tasmax + tasmin) / 2.
Source code in climakitae/new_core/derived_variables/builtin/temperature.py
Built-in Wind Derivations
Wind-related derived variables.
This module provides derived variables for wind calculations including wind speed and wind direction from U and V wind components.
Derived Variables
- wind_speed_10m: Wind speed at 10m computed from u10 and v10 components.
- wind_direction_10m: Wind direction at 10m computed from u10 and v10 components.
calc_wind_speed_10m(ds)
Calculate wind speed at 10m from U and V components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 'u10' and 'v10' variables. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'wind_speed_10m' variable added. |
Notes
Wind speed is computed as: sqrt(u10² + v10²)
Source code in climakitae/new_core/derived_variables/builtin/wind.py
calc_wind_direction_10m(ds)
Calculate wind direction at 10m from U and V components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 'u10' and 'v10' variables. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'wind_direction_10m' variable added. |
Notes
Wind direction uses meteorological convention: the direction the wind is coming FROM, measured clockwise from north.
Direction = (270 - atan2(v10, u10) * 180/π) mod 360
Source code in climakitae/new_core/derived_variables/builtin/wind.py
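The speed and direction formulas above can be checked on scalar components with the standard library. `wind_speed` and `wind_direction_deg` are hypothetical names for this sketch; the library functions operate on xarray datasets instead.

```python
import math


def wind_speed(u10, v10):
    """Wind speed from eastward (u10) and northward (v10) components."""
    return math.sqrt(u10**2 + v10**2)


def wind_direction_deg(u10, v10):
    """Meteorological direction: where the wind comes FROM, clockwise from north."""
    return (270.0 - math.degrees(math.atan2(v10, u10))) % 360.0


speed = wind_speed(3.0, 4.0)
from_west = wind_direction_deg(10.0, 0.0)    # air moving eastward comes from the west
from_south = wind_direction_deg(0.0, 10.0)   # air moving northward comes from the south
from_east = wind_direction_deg(-10.0, 0.0)   # air moving westward comes from the east
```

A positive u10 (air moving toward the east) therefore maps to 270°, the compass bearing of west, which is the direction the wind originates from.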
Built-in Climate Indices
Fire weather and climate indices derived variables.
This module provides derived variables for fire weather and climate index calculations.
Derived Variables
fosberg_fire_weather_index Fosberg Fire Weather Index from temperature, humidity, and wind.
calc_fosberg_fire_weather_index(ds)
Calculate the Fosberg Fire Weather Index (FFWI).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ds | Dataset | Dataset containing 't2' (2m temperature, K), 'q2' (2m specific humidity, kg/kg), 'psfc' (surface pressure, Pa), 'u10' (U-component of wind at 10m, m/s), and 'v10' (V-component of wind at 10m, m/s). | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Dataset with 'fosberg_fire_weather_index' variable added (0-100 scale). |
References
- https://a.atmos.washington.edu/wrfrt/descript/definitions/fosbergindex.html
- https://www.spc.noaa.gov/exper/firecomp/INFO/fosbinfo.html