Processor: MetricCalc

Registry key: metric_calc | Priority: 7500 | Category: Analysis & Derived Variables

Reduce a dataset along one or more dimensions using a basic statistic (min / max / mean / median / sum), optionally compute percentiles alongside it, count threshold exceedances, or fit extreme-value distributions to estimate return periods.

Three calculation modes

__init__ reads the value dict and chooses a mode at execute time based on which sub-config is present:

Mode	Triggered by	Output
Basic / percentiles	neither `thresholds` nor `one_in_x`	reduced Dataset/DataArray; if both metric and percentiles are requested for a Dataset, variables are renamed `<var>_p<N>` and `<var>_<metric>`
Threshold exceedance	`thresholds={...}`	counts of qualifying timesteps per period
1-in-X extreme value	`one_in_x={...}`	`return_values` and `p_values` (KS goodness-of-fit)

The earlier doc used the key one_in_x_config. The actual key is one_in_x (__init__ calls value.get("one_in_x", UNSET)). Earlier doc also listed metrics like hdd_cdd, heat_index, effective_temp, noaa_heat_index — those live in climakitae.tools.indices / derived_variables, not in this processor. The supported metric values here are exactly min, max, mean (default), median, sum.

Algorithm

flowchart TD
    Init([__init__]) --> ParseValue{which sub-config?}
    ParseValue -->|one_in_x| SetupOIX[_setup_one_in_x_parameters]
    ParseValue -->|thresholds| SetupTh[_setup_threshold_parameters]
    ParseValue -->|neither| Basic[Basic metric/percentiles only]

    SetupOIX --> Exec
    SetupTh --> Exec
    Basic --> Exec

    Exec([execute]) --> PickFn{Which fn?}
    PickFn -->|thresholds set| ThFn[_calculate_threshold_single]
    PickFn -->|one_in_x set| OIXFn[_calculate_one_in_x_single]
    PickFn -->|else| BasicFn[_calculate_metrics_single]

    ThFn --> Dispatch[Dispatch over result:<br/>Dataset/DataArray, dict, list, tuple]
    OIXFn --> Dispatch
    BasicFn --> Dispatch

    Dispatch --> UpdateCtx[update_context]
    UpdateCtx --> End([Output])

    click Init "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L145" "__init__"
    click SetupOIX "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L191" "_setup_one_in_x_parameters"
    click SetupTh "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L262" "_setup_threshold_parameters"
    click Exec "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L305" "execute"
    click BasicFn "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L372" "_calculate_metrics_single"
    click ThFn "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L512" "_calculate_threshold_single"
    click OIXFn "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L595" "_calculate_one_in_x_single"
    click UpdateCtx "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L1387" "update_context"

Basic / percentiles flow

flowchart TD
    A([_calculate_metrics_single]) --> B[Resolve dim list;<br/>warn and skip if dim missing]
    B --> C{percentiles set?}
    C -->|Yes| D[Compute quantiles at percentiles/100;<br/>rename coord: quantile to percentile]
    C -->|No| E{percentiles_only?}
    D --> E
    E -->|Yes| F([Return percentile result])
    E -->|No| G[Reduce by self.metric:<br/>min / max / mean / median / sum]
    G --> H{Combine percentiles + metric?}
    H -->|Dataset + both| I[Rename vars: var_pN and var_metric]
    H -->|DataArray + both| J[concat along statistic dim]
    H -->|Single| K([Return as-is])

Threshold flow (`_calculate_threshold_single`)

flowchart TD
    Start([_calculate_threshold_single]) --> Compare[Mask timesteps above or below threshold_value]
    Compare --> DurCheck{duration set?}
    DurCheck -->|Yes| Consec[Filter for consecutive runs<br/>of at least duration timesteps]
    DurCheck -->|No| Resamp
    Consec --> Resamp[Resample by period year or month;<br/>count qualifying timesteps]
    Resamp --> End([Output: count per period])

    click Start "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L512" "_calculate_threshold_single"

thresholds config keys (validated in _setup_threshold_parameters):

Key	Type	Default	Notes
`threshold_value`	float	required	Comparison value (in data units).
`threshold_direction`	`"above"` / `"below"`	required
`period`	`(int, str)`	`(1, "year")`	Aggregation period; unit must be `"year"` or `"month"`.
`duration`	optional	`UNSET`	Minimum consecutive timesteps qualifying as an event.

Cannot be combined with metric, percentiles, or one_in_x.

1-in-X flow (`_calculate_one_in_x_single`)

flowchart TD
    Start([_calculate_one_in_x_single]) --> MultiVar{Dataset with\nmultiple variables?}
    MultiVar -->|Yes| Loop[Recurse once per variable;\nrename results with var prefix;\nxr.merge all]
    MultiVar -->|No| SimCheck{sim dim present?}
    Loop --> Done([Return merged Dataset])
    SimCheck -->|No| ErrSim[raise ValueError]
    SimCheck -->|Yes| MemCheck{Dask array size?}

    MemCheck -->|small| Load[Load into memory]
    MemCheck -->|medium or large| Keep[Keep as Dask]
    Load --> TimeCheck
    Keep --> TimeCheck

    TimeCheck{time dim present?} -->|No| DummyTime[add_dummy_time_to_wl\nset frequency attr]
    TimeCheck -->|Yes| FreqInfer[Infer frequency from\ntime delta if missing]
    DummyTime --> Preproc
    FreqInfer --> Preproc

    Preproc[_preprocess_variable_for_one_in_x] --> Vectorized

    Vectorized([_calculate_one_in_x_vectorized]) --> AdaptBatch[_calculate_adaptive_batch_size\nbased on array shape and memory]
    AdaptBatch --> SplitSims[Split sim indices into\nsequential batches]
    SplitSims --> BatchLoop[For each sim batch]

    BatchLoop --> SpatialCheck{many spatial\npoints?}
    SpatialCheck -->|Yes| SpatBatch[_fit_with_early_spatial_batching\n100 points at a time]
    SpatialCheck -->|No| BlockMax[_get_block_maxima_optimized\nextract block extrema]
    SpatBatch --> FitVec
    BlockMax --> FitVec[_fit_distributions_vectorized\nfit GEV / gamma / genpareto\nper pixel per sim]

    FitVec --> KSCheck{goodness_of_fit_test?}
    KSCheck -->|Yes| KS[KS test on residuals;\ncompute p_values]
    KSCheck -->|No| Assemble
    KS --> Assemble[Assemble return_values\nand p_values as DataArrays]
    Assemble --> GC[gc.collect after batch]
    GC --> BatchLoop

    BatchLoop -->|all batches done| Concat[xr.concat batch results\nalong sim dim]
    Concat --> End([Output: Dataset with\nreturn_values and p_values])

    click Start "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L595" "_calculate_one_in_x_single"
    click Preproc "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L1348" "_preprocess_variable_for_one_in_x"
    click Vectorized "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L850" "_calculate_one_in_x_vectorized"
    click AdaptBatch "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L929" "_calculate_adaptive_batch_size"
    click SpatBatch "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L1216" "_fit_with_early_spatial_batching"
    click BlockMax "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L1039" "_process_simulation_batch"
    click FitVec "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/metric_calc.py#L1167" "_fit_distributions_vectorized"

one_in_x config keys (validated in _setup_one_in_x_parameters):

Key	Type	Default	Notes
`return_periods`	array-like	one of these	Periods to compute (mutually exclusive with `return_values`).
`return_values`	array-like	one of these	Pre-set values to compute exceedance probabilities for.
`distribution`	str	`"gev"`	`"gev"`, `"gamma"`, `"genpareto"`.
`extremes_type`	`"max"` / `"min"`	`"max"`
`event_duration`	`(int, str)`	`(1, "day")`	Block duration.
`grouped_duration`	optional	`UNSET`
`block_size`	int	`1`
`goodness_of_fit_test`	bool	`True`	KS test on residuals.
`print_goodness_of_fit`	bool	`True`
`variable_preprocessing`	dict	`{}`	Preprocessing config.

Internally the workflow extracts block extrema, fits the chosen distribution, and produces return_values plus p_values (KS p-value) DataArrays. Heavy lifting is in the vectorized helpers listed under Code References.

Top-level parameters (basic mode)

Key	Type	Default	Description
`metric`	str	`"mean"`	`"min"`, `"max"`, `"mean"`, `"median"`, `"sum"`.
`percentiles`	list[float]	`UNSET`	Percentile values 0–100; output stored on coord `percentile`.
`percentiles_only`	bool	`False`	Skip the metric reduction.
`dim` (or `dims`)	str / list[str]	`"time"`	Dimensions to reduce over. Missing dims are filtered with a warning.
`keepdims`	bool	`False`	(Reserved; passed through where supported.)
`skipna`	bool	`True`	Pass to underlying xarray reductions.

Examples

Basic mean + percentiles

data = (ClimateData()
    .catalog("cadcat").activity_id("WRF").institution_id("UCLA")
    .variable("t2max").table_id("day").grid_label("d03")
    .processes({
        "time_slice": ("2015-01-01", "2015-12-31"),
        "metric_calc": {
            "metric": "mean",
            "percentiles": [10, 50, 90],
            "dim": "time",
        },
    })
    .get())
# Dataset variables: t2max_p10, t2max_p50, t2max_p90, t2max_mean

Threshold exceedance (annual count of days above 35 °C, 3-day events)

.processes({
    "metric_calc": {
        "thresholds": {
            "threshold_value": 35.0,
            "threshold_direction": "above",
            "period": (1, "year"),
            "duration": 3,
        }
    }
})

1-in-X return periods

.processes({
    "metric_calc": {
        "one_in_x": {
            "return_periods": [2, 5, 10, 20, 50, 100],
            "distribution": "gev",
            "extremes_type": "max",
            "event_duration": (1, "year"),
            "block_size": 1,
        }
    }
})

Code References

Method	Lines	Purpose
`__init__`	145–189	Parse top-level + dispatch sub-config setup
`_setup_one_in_x_parameters`	191–260	Validate periods/values, set distribution & defaults
`_setup_threshold_parameters`	262–303	Validate threshold dict, normalize period tuple
`execute`	305–370	Pick `process_fn`, dispatch over result type
`_calculate_metrics_single`	372–510	Quantile + min/max/mean/median/sum reductions
`_calculate_threshold_single`	512–593	Threshold mask, optional duration filter, resample-sum
`_calculate_one_in_x_single`	595–720	Drives 1-in-X analysis per simulation
`_fit_return_variable_1d`	722–848	Per-pixel distribution fit
`_calculate_one_in_x_vectorized`	850–927	Vectorized fit pathway
`_calculate_adaptive_batch_size`	929–1037	Pick spatial batch size from dataset shape
`_process_simulation_batch`	1039–1165	Run one batch through the fit pipeline
`_fit_distributions_vectorized`	1167–1214	NumPy-vectorized fitting
`_fit_with_early_spatial_batching`	1216–1346	Memory-aware spatial batching
`_preprocess_variable_for_one_in_x`	1348–1385	Apply user-supplied preprocessing
`update_context`	1387–	Tag `new_attrs` with computed metric metadata