
Processor Reference

What each processor does, with parameters and edge cases. If you'd rather start from a goal ("clip to a county", "export to Zarr"), use the How-To Guides instead.

Processors transform climate data through a configurable pipeline. They execute in priority order, each handling a specific transformation task: temporal subsetting, spatial clipping, format conversion, aggregation, or metadata updates.

Processor Registry

All available processors, listed in execution order (lowest priority first).

The registry key is the string you pass to .processes({...}). It is not always identical to the source filename — for example, the concatenation processor is registered under the key "concat".
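
For example, a workflow that concatenates simulations and then exports would configure both keys. A minimal sketch, assuming ClimateData is importable from the package root; the parameter dicts shown are hypothetical (see each processor's page for its actual schema):

from climakitae import ClimateData  # import path assumed

# Keys are registry keys from the table below; values configure each processor.
cd = ClimateData()
result = cd.processes({
    "concat": {"dim": "sim"},       # hypothetical parameters
    "export": {"format": "zarr"},   # hypothetical parameters
})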

Registry key                   Priority  Page                           Purpose
filter_unadjusted_models              0  filter_unadjusted_models       Filter models by a-priori bias-adjustment status (default: enabled for WRF)
drop_leap_days                        1  drop_leap_days                 Remove Feb 29 for calendar alignment
convert_units                         5  convert_units                  Unit conversion for climate variables
warming_level                        10  warming_level                  Subset around global warming-level thresholds
concat                               50  concatenate                    Concatenate datasets along the sim (or named) dimension
bias_adjust_model_to_station         60  bias_adjust_model_to_station   Quantile-delta-mapping bias correction using HadISD station observations
clip                                 65  clip                           Spatial subsetting by boundary, point(s), bbox, station, or shapefile
convert_to_local_time                70  convert_to_local_time          Convert UTC to a local timezone
time_slice                          150  time_slice                     Temporal subsetting by date range and optional season
metric_calc                        7500  metric_calc                    Derived metrics, threshold exceedance, and 1-in-X return-period analysis
update_attributes                  9998  update_attributes              Update dataset/variable attributes
export                             9999  export                         Write to NetCDF, Zarr, CSV, or GeoTIFF

Execution Order

Processors execute in ascending priority order (lowest priority runs first). Within a workflow, only processors whose parameters you configured via .processes() actually run.

Phase 1: Catalog refinement (priority 0–10)
  ├─ filter_unadjusted_models (0)    — Drop models without a-priori bias adjustment
  ├─ drop_leap_days (1)              — Calendar normalization
  ├─ convert_units (5)               — Unit standardization
  └─ warming_level (10)              — Global-warming-level window selection

Phase 2: Aggregation, correction, spatial (priority 50–70)
  ├─ concat (50)                     — Concatenate along sim (or other) dimension
  ├─ bias_adjust_model_to_station (60) — Station-based bias correction (WRF only)
  ├─ clip (65)                       — Geographic filtering
  └─ convert_to_local_time (70)      — Timezone conversion

Phase 3: Temporal subsetting & metrics (priority 150–7500)
  ├─ time_slice (150)                — Calendar date / season filtering
  └─ metric_calc (7500)              — Derived metrics & extreme-value analysis

Phase 4: Finalization (priority 9998–9999)
  ├─ update_attributes (9998)        — Metadata finalization
  └─ export (9999)                   — File writing & archival
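
A sketch of the dispatch logic this ordering implies (illustrative only; REGISTRY and the priority attribute are hypothetical names for the registry internals):

# Only processors whose keys appear in the user's configuration run,
# sorted by ascending priority.
configured = {"clip": {"boundary": "CA"}, "export": {"format": "zarr"}}  # hypothetical

active = sorted(
    (key for key in REGISTRY if key in configured),
    key=lambda key: REGISTRY[key].priority,
)
for key in active:
    proc = REGISTRY[key](configured[key])  # __init__ parses and validates parameters
    data = proc.execute(data, context)     # apply the transformation
    proc.update_context(context)           # record metadata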

Why does spatial clipping run before time slicing?

clip (priority 65) runs before time_slice (priority 150) so that spatial subsetting reduces the data volume that subsequent steps must read. Reducing spatial extent first is usually the dominant performance win for gridded queries.
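
A practical consequence: execution order comes from priority, not from the order of keys in the configuration dict, so these two calls (with hypothetical parameters) build identical pipelines:

# clip (65) runs before time_slice (150) in both cases.
cd.processes({"time_slice": {"start": "2030-01-01"}, "clip": {"boundary": "CA"}})
cd.processes({"clip": {"boundary": "CA"}, "time_slice": {"start": "2030-01-01"}})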

Architecture

Each processor implements three core methods:

  • __init__(value: Dict) — Parse and validate parameters
  • execute(data: Dataset, context: Dict) → Dataset — Apply transformation, handle branching (lists, dicts, single datasets); see the sketch after this list
  • update_context(context: Dict) → None — Record operation in dataset attributes
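
The container branching mentioned for execute() might look like this (a sketch only; _transform is a hypothetical stand-in for the processor's actual per-dataset logic):

def execute(self, data, context):
    # Recurse into containers so every leaf Dataset gets transformed.
    if isinstance(data, dict):
        return {key: self.execute(value, context) for key, value in data.items()}
    if isinstance(data, list):
        return [self.execute(item, context) for item in data]
    return self._transform(data)  # hypothetical single-dataset helper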

The processor registry uses decorators:

@register_processor(key="processor_name", priority=N)  # N is an integer priority
class ProcessorName(DataProcessor):
    def __init__(self, value):
        # Parse and validate the parameters passed via .processes()
        self.value = value

    def execute(self, data, context):
        # Transform data (single datasets, lists, or dicts)
        return data

    def update_context(self, context):
        # Record the operation in dataset attributes
        pass
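
A minimal sketch of what such a decorator might do (illustrative; the actual registry implementation may differ):

REGISTRY = {}

def register_processor(key, priority):
    """Class decorator: store the class in REGISTRY and stamp its priority."""
    def wrap(cls):
        cls.priority = priority
        REGISTRY[key] = cls
        return cls
    return wrap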

How to Read These Diagrams

Each processor page includes:

  • Algorithm flowchart — Execution flow with decision points
  • Code links — Click boxes to jump to source code
  • Parameter table — Valid inputs and constraints
  • Usage example — How to invoke via ClimateData.processes()

Adding a New Processor

See Architecture → Extension Guide → Add a Processor for step-by-step instructions. Summary (a sketch of the result follows the list):

  1. Create climakitae/new_core/processors/my_processor.py with a class that inherits from DataProcessor
  2. Implement execute() and update_context()
  3. Decorate with @register_processor(key="my_processor", priority=N)
  4. Import in __init__.py to register at import time
  5. Add documentation page docs-mkdocs/climate-data-interface/processors/my_processor.md
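
Putting the steps together, a new processor file might look like this (the import path and parameter handling are assumptions, not the exact API):

# climakitae/new_core/processors/my_processor.py  (illustrative sketch)
from climakitae.new_core.processors import DataProcessor, register_processor  # path assumed

@register_processor(key="my_processor", priority=200)  # priority chosen for illustration
class MyProcessor(DataProcessor):
    def __init__(self, value):
        self.value = value  # parse and validate parameters

    def execute(self, data, context):
        # Apply the transformation; handle lists/dicts as described under Architecture
        return data

    def update_context(self, context):
        context["my_processor"] = self.value  # record the operation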