Processor: Concat

Registry key: concat | Priority: 50 | Category: Data Assembly

Registry key vs filename

The processor's source file is concatenate.py and its class is Concat, but it is registered under the key "concat". Always use "concat" (not "concatenate") in .processes({...}).

Merge multiple climate datasets returned from a single catalog query by concatenating them along a new dimension (default: sim). This is the standard way to assemble a multi-model ensemble or to combine historical + ssp* time series into a single contiguous record.

The processor runs when a query produces multiple datasets (e.g. multiple source_id values) and "concat" is included in .processes({...}). It dispatches to _execute_gridded_concat or _execute_hdp_concat based on the catalog type detected in the processing context.

Algorithm

flowchart TD
    Start([Input: dict / list of Datasets]) --> CheckType{Single<br/>Dataset?}
    CheckType -->|Yes| Pass[Return unchanged]
    CheckType -->|No| Route{catalog == hdp?}
    Route -->|Yes| HDP[_execute_hdp_concat<br/>concat along station]
    Route -->|No| Gridded[_execute_gridded_concat<br/>match result: dict / iterable<br/>concat along sim with source_id labels]
    HDP --> Update[update_context]
    Gridded --> Update
    Update --> End([Output: single Dataset])

    click Start "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L65" "execute"
    click HDP "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L105" "_execute_hdp_concat"
    click Gridded "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L174" "_execute_gridded_concat (match result @228)"
    click Update "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L487" "update_context"
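
The routing in the flowchart can be sketched in plain Python. This is a minimal illustration, not the code in concatenate.py: concat_hdp and concat_gridded are hypothetical stand-ins for _execute_hdp_concat and _execute_gridded_concat, and the real methods operate on xarray objects rather than returning small dicts.

```python
# Hypothetical stand-ins for the two concat paths (assumptions, not the real methods).
def concat_hdp(items, dim):
    return {"path": "hdp", "dim": dim, "n": len(items)}


def concat_gridded(items, dim):
    return {"path": "gridded", "dim": dim, "n": len(items)}


def execute(result, context, dim="sim"):
    """Route a query result the way the flowchart describes."""
    # A single object (not a dict or sequence) is returned unchanged.
    if not isinstance(result, (dict, list, tuple)):
        return result

    # Normalize dict / iterable input to a list of members.
    items = list(result.values()) if isinstance(result, dict) else list(result)

    # Route on the catalog recorded in the processing context.
    if context.get("catalog") == "hdp":
        return concat_hdp(items, dim)
    return concat_gridded(items, dim)


single = object()
print(execute(single, {}) is single)
print(execute({"a": 1, "b": 2}, {"catalog": "hdp"}, dim="station"))
print(execute([1, 2, 3], {"catalog": "cadcat"}))
```

The sketch omits the update_context step, which in the real processor records the concat dimension and source_id list after either path completes.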

Code References

Method	Lines	Purpose
`__init__`	49–63	Store `value` (defaults to `"sim"` if non-string)
`execute`	65–103	Single-input passthrough; route by `context["catalog"]`
`_execute_hdp_concat`	105–172	HDP-station concat path
`_execute_gridded_concat`	174–460	Gridded concat with `match result` (line 228) over dict / iterable
`_align_time_dim`	462–485	Align time coordinates before concat
`update_context`	487–513	Record concat dim and source_id list in context
`set_data_accessor`	515–517	Receive `DataCatalog` reference

Parameter shape

The processor takes a single string: the name of the new dimension. The default is "sim", which is what almost every multi-model workflow wants.

.processes({"concat": "sim"})

Field	Type	Description
`value`	`str`	Name of the new dimension. Defaults to `"sim"` if a non-string is passed.
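
The fallback described in the table can be illustrated with a small stand-in class. This is only a sketch of the documented behavior: ConcatSketch and its dim_name attribute are hypothetical names, and the real Concat.__init__ may store the value differently.

```python
class ConcatSketch:
    """Hypothetical stand-in showing the documented default handling."""

    def __init__(self, value="sim"):
        # Any non-string value falls back to the default dimension name.
        self.dim_name = value if isinstance(value, str) else "sim"


print(ConcatSketch("station").dim_name)
print(ConcatSketch(None).dim_name)
print(ConcatSketch(["a", "b"]).dim_name)
```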

Examples

Multi-model ensemble (gridded catalog)

from climakitae.new_core.user_interface import ClimateData

ensemble = (ClimateData()
    .catalog("cadcat")
    .activity_id("LOCA2")
    .variable("tasmax")
    .experiment_id(["historical", "ssp370"])
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2000-01-01", "2050-12-31"),
        "clip": "Los Angeles",
        "concat": "sim",
    })
    .get())

# ensemble has a 'sim' dimension labeled by source_id
print(ensemble.sim.values)

HDP station catalog

For the hdp catalog, concat produces a station-dimension stack:

stations = (ClimateData()
    .catalog("hdp")
    .network_id("hadisd")
    .processes({"concat": "station"})
    .get())

Behavior notes

Input must be a collection (dict or iterable) of Datasets or DataArrays. A single Dataset or DataArray is returned unchanged.
For the gridded path, each input dataset is expected to carry a source_id attribute, which becomes its label along the new dimension.
The HDP path uses a simpler concatenation that does not perform per-source attribute extraction.
Runs at priority 50: after early refinement (filter, leap days, units, warming level) and before spatial clipping and time slicing, so the merged ensemble flows through the rest of the pipeline as a single object.
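
To make the gridded labeling concrete, the following sketch uses plain xarray to stack two synthetic datasets along a new sim dimension, labeled by each dataset's source_id attribute. It illustrates the idea only and is not the body of _execute_gridded_concat; the make_member helper, the dict keys, and the MODEL-A / MODEL-B labels are invented for the example.

```python
import numpy as np
import xarray as xr


def make_member(source_id, offset):
    """Build a tiny synthetic dataset carrying a source_id attribute."""
    return xr.Dataset(
        {"tasmax": ("time", np.arange(3, dtype=float) + offset)},
        coords={"time": np.arange(3)},
        attrs={"source_id": source_id},
    )


# Hypothetical query result: a dict of datasets, one per model.
members = {
    "key_a": make_member("MODEL-A", 0.0),
    "key_b": make_member("MODEL-B", 10.0),
}

# Use each dataset's source_id attribute as its label on the new dimension.
labels = [ds.attrs["source_id"] for ds in members.values()]
sim = xr.DataArray(labels, dims="sim", name="sim")

# Passing a named DataArray as `dim` creates the dimension and its coordinate.
ensemble = xr.concat(list(members.values()), dim=sim)

print(ensemble["tasmax"].dims)
print(ensemble["sim"].values)
```

Once the members share a sim dimension, ensemble statistics reduce to a single call such as ensemble.mean(dim="sim").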