Processor: Concat
Registry key: concat | Priority: 50 | Category: Data Assembly
Registry key vs filename
The processor's source file is concatenate.py and its class is Concat, but it
is registered under the key "concat". Always use "concat" (not
"concatenate") in .processes({...}).
Merge multiple climate datasets returned from a single catalog query by concatenating them along a new dimension (default: sim). This is the standard way to assemble a multi-model ensemble or to combine historical + ssp* time series into a single contiguous record.
The processor is invoked automatically when a query produces multiple datasets (e.g. multiple source_id values) and the user includes "concat" in .processes({...}). It dispatches to _execute_gridded_concat or _execute_hdp_concat based on the catalog type detected in the processing context.
Algorithm
flowchart TD
Start([Input: dict / list of Datasets]) --> CheckType{Single<br/>Dataset?}
CheckType -->|Yes| Pass[Return unchanged]
CheckType -->|No| Route{catalog == hdp?}
Route -->|Yes| HDP[_execute_hdp_concat<br/>concat along station]
Route -->|No| Gridded[_execute_gridded_concat<br/>match result: dict / iterable<br/>concat along sim with source_id labels]
HDP --> Update[update_context]
Gridded --> Update
Update --> End([Output: single Dataset])
click Start "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L65" "execute"
click HDP "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L105" "_execute_hdp_concat"
click Gridded "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L174" "_execute_gridded_concat (match result @228)"
click Update "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/concatenate.py#L487" "update_context"
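The routing in the flowchart can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in concatenate.py: route_concat is a hypothetical helper, and treating "anything that is not a dict, list, or tuple" as a single dataset is a simplification of whatever check the processor actually performs.

```python
def route_concat(result, context):
    """Return the name of the path a concat-style processor would take.

    Hypothetical helper for illustration only; the real dispatch lives in
    Concat.execute and may use different type checks.
    """
    # Assumption: a non-collection input stands in for a single Dataset.
    if not isinstance(result, (dict, list, tuple)):
        return "passthrough"
    # The processing context carries the catalog type.
    if context.get("catalog") == "hdp":
        return "hdp"  # would call _execute_hdp_concat (station dimension)
    return "gridded"  # would call _execute_gridded_concat (sim dimension)


print(route_concat(object(), {"catalog": "cadcat"}))
print(route_concat([object(), object()], {"catalog": "hdp"}))
print(route_concat({"a": object(), "b": object()}, {"catalog": "cadcat"}))
```

The three calls exercise the passthrough, HDP, and gridded branches in that order.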
Code References
| Method | Lines | Purpose |
|---|---|---|
| __init__ | 49–63 | Store value (defaults to "sim" if non-string) |
| execute | 65–103 | Single-input passthrough; route by context["catalog"] |
| _execute_hdp_concat | 105–172 | HDP-station concat path |
| _execute_gridded_concat | 174–460 | Gridded concat with match result (line 228) over dict / iterable |
| _align_time_dim | 462–485 | Align time coordinates before concat |
| update_context | 487–513 | Record concat dim and source_id list in context |
| set_data_accessor | 515–517 | Receive DataCatalog reference |
Parameter shape
The processor takes a single string: the name of the new dimension. The default is "sim", which is what almost every multi-model workflow wants.
| Field | Type | Description |
|---|---|---|
| value | str | Name of the new dimension. Defaults to "sim" if a non-string is passed. |
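The fallback rule in the table amounts to a one-line type check. The sketch below assumes that reading; resolve_dim_name is a hypothetical name used only for illustration and is not part of climakitae.

```python
def resolve_dim_name(value, default="sim"):
    """Return value if it is a string, otherwise the default dimension name.

    Hypothetical helper mirroring the documented behavior of the
    processor's constructor; the real code may differ in detail.
    """
    return value if isinstance(value, str) else default


print(resolve_dim_name("member"))  # a string is kept as the dimension name
print(resolve_dim_name(None))      # a non-string falls back to "sim"
```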
Examples
Multi-model ensemble (gridded catalog)
from climakitae.new_core.user_interface import ClimateData
ensemble = (
    ClimateData()
    .catalog("cadcat")
    .activity_id("LOCA2")
    .variable("tasmax")
    .experiment_id(["historical", "ssp370"])
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2000-01-01", "2050-12-31"),
        "clip": "Los Angeles",
        "concat": "sim",
    })
    .get()
)
# ensemble has a 'sim' dimension labeled by source_id
print(ensemble.sim.values)
HDP station catalog
For the hdp catalog, concat produces a station-dimension stack:
stations = (
    ClimateData()
    .catalog("hdp")
    .network_id("hadisd")
    .processes({"concat": "station"})
    .get()
)
Behavior notes
- Input must be a collection (dict or iterable) of datasets/dataarrays. A single Dataset/DataArray is returned unchanged.
- For the gridded path, each input dataset is expected to carry a source_id attribute, which becomes its label along the new dimension.
- The HDP path uses a simpler concatenation that does not perform per-source attribute extraction.
- Runs at priority 50 — after early refinement (filter, leap days, units, warming level) and before spatial clipping and time slicing, so the merged ensemble flows through the rest of the pipeline as a single object.
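The gridded-path labeling described in these notes can be reproduced with plain xarray. This is a minimal sketch on synthetic data, assuming only that each input carries a source_id attribute; make_member, the model names, and the dictionary keys are made up for illustration, and the processor's own implementation does more (for example, time alignment).

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_member(source_id, offset):
    """Build a tiny synthetic dataset tagged with a source_id attribute."""
    time = pd.date_range("2000-01-01", periods=3, freq="D")
    return xr.Dataset(
        {"tasmax": ("time", np.arange(3, dtype=float) + offset)},
        coords={"time": time},
        attrs={"source_id": source_id},
    )


# Stand-in for the dict of datasets a multi-model query might return.
members = {
    "key-a": make_member("MODEL-A", 0.0),
    "key-b": make_member("MODEL-B", 10.0),
}

# Use each dataset's source_id attribute as its label along the new dimension.
labels = [ds.attrs["source_id"] for ds in members.values()]
ensemble = xr.concat(list(members.values()), dim="sim").assign_coords(sim=labels)

print(ensemble.sim.values)
print(ensemble["tasmax"].sel(sim="MODEL-B").values)
```

Selecting by label, as in the last line, is what makes the source_id labeling useful downstream: each member stays addressable by model name after the merge.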