Processor: Export

Registry key: export  |  Priority: 9999  |  Category: I/O & Archival

Write climate data to disk in NetCDF, Zarr, or CSV. Handles gridded datasets, multi-point clip results (closest_cell dimension), and collections (lists/tuples/dicts of single-point datasets) with optional location-based filenames and S3 storage for Zarr.
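As a quick orientation, here is a minimal sketch of how the processor is typically configured inside a pipeline, following the examples later on this page. The catalog, variable, and clip values are illustrative placeholders; the "export" entry is what drives this processor.

```python
# Illustrative pipeline: clip to a county and write a single NetCDF file.
# Query fields (catalog, variable, etc.) are placeholders.
(ClimateData()
    .catalog("cadcat")
    .activity_id("WRF")
    .variable("t2max")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "clip": "Alameda",
        "export": {
            "filename": "alameda_t2max",
            "file_format": "NetCDF",
        },
    })
    .get())
# Writes alameda_t2max.nc; the data returned by .get() is unchanged.
```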

Algorithm

Export runs at priority 9999 (last). It dispatches by export_method first, then by data structure.

```mermaid
flowchart TD
    Init([__init__: parse value dict]) --> Validate["_validate_parameters<br/>(format ∈ {zarr, netcdf, csv}; mode ∈ {local, s3};<br/>s3 requires zarr; export_method valid)"]
    Validate --> Start([execute])
    Start --> MethodCheck{export_method}

    MethodCheck -->|none| ReturnNoop["print message,<br/>return data unchanged"]
    MethodCheck -->|raw / calculate / both + dict result| HandleDict["_handle_dict_result<br/>(raw_data / calc_data keys)"]
    MethodCheck -->|raw / calculate / both + non-dict| HandleSel["_handle_selective_export<br/>(uses _determine_data_type)"]
    MethodCheck -->|data / skip_existing| ExportData["_export_data"]

    ExportData --> MatchResult{match result}
    MatchResult -->|Dataset / DataArray| HasCC{closest_cell dim<br/>and separated?}
    HasCC -->|Yes| SplitCC["_split_and_export_closest_cells"]
    HasCC -->|No| ExportSingle1["export_single"]
    MatchResult -->|dict| LoopDict["For each value:<br/>list/tuple → _export_collection,<br/>else → export_single"]
    MatchResult -->|list / tuple| ExportColl["_export_collection"]
    MatchResult -->|other| RaiseType["raise TypeError"]

    SplitCC --> ExportSingleN["export_single per slice<br/>(filename via _generate_filename)"]
    ExportColl --> ExportSingleN
    LoopDict --> ExportSingleN
    ExportSingle1 --> WriteOne
    ExportSingleN --> WriteOne

    WriteOne["export_single:<br/>1. _generate_filename + extension<br/>2. skip_existing check / unique filename<br/>3. _clean_attrs_for_netcdf"]
    WriteOne --> FormatMatch{match req_format}
    FormatMatch -->|zarr| Zarr["_export_to_zarr<br/>(local or s3)"]
    FormatMatch -->|netcdf| NCDF["_export_to_netcdf"]
    FormatMatch -->|csv| CSV["_export_to_csv (gzip)"]

    Zarr --> Done
    NCDF --> Done
    CSV --> Done
    HandleDict --> Done
    HandleSel --> Done
    Done([Return original data unchanged — chainable])
    ReturnNoop --> Done

    click Init "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L197" "__init__"
    click Validate "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L238" "_validate_parameters"
    click Start "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L326" "execute"
    click ExportData "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L394" "_export_data dispatcher"
    click MatchResult "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L432" "match result"
    click SplitCC "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L820" "_split_and_export_closest_cells"
    click ExportColl "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L457" "_export_collection"
    click HandleDict "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L573" "_handle_dict_result"
    click HandleSel "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L590" "_handle_selective_export"
    click WriteOne "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L1064" "export_single"
    click FormatMatch "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L1103" "match req_format"
```

Data Handling Modes

Mode 1: Gridded Datasets ⚠️ Important

Single xr.Dataset/xr.DataArray with lat and lon coordinate dimensions containing multiple values (e.g., shape (time, lat, lon)).

Key Behavior:
- The separated and location_based_naming options are silently ignored
- Always exports to a single file (gridded data cannot be split by location)
- A single filename is used, with the extension based on the format

Example Input:

# Dataset from clip with single region or full state
ds.dims  # {'time': 8760, 'lat': 150, 'lon': 100, 'sim': 5}

Parameters Effective for This Mode:
- filename: Used for the single output file
- file_format: Determines output format
- export_method: Controls return value
- mode: For Zarr only

Parameters Ignored for This Mode:
- separated (always treated as False; gridded data cannot be split)
- location_based_naming (no point coordinates to include)

Mode 2: Multi-Point Clip Results

Single xr.Dataset/xr.DataArray with closest_cell dimension (output from clipping to multiple lat/lon points). The closest_cell dimension represents individual target points.

Key Behavior:
- When separated=True: data splits along the closest_cell dimension
- Each slice becomes a separate file with an index- or location-based name
- When separated=False: all points stay in a single file with the closest_cell dimension preserved

Example Input:

# Dataset from clip with multiple points
ds.dims  # {'time': 8760, 'closest_cell': 3, 'sim': 5}
# 3 points: LA, SF, SD

With separated=True + location_based_naming=True:

myfile_34-05N_118-25W.nc  # LA
myfile_37-77N_122-42W.nc  # SF
myfile_32-72N_117-16W.nc  # SD

With separated=True + location_based_naming=False:

myfile_0.nc  # LA
myfile_1.nc  # SF
myfile_2.nc  # SD

With separated=False:

myfile.nc  # All 3 points, closest_cell dimension preserved
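As a sketch, here is a configuration that would produce the separated, location-named files shown in the first listing above. How multiple points are passed to clip is an assumption here (a list of lat/lon tuples); the export entry is what drives the Mode 2 behavior.

```python
# Hypothetical multi-point export: one file per target point, named by lat/lon.
# The list-of-tuples form for "clip" is assumed for illustration.
(ClimateData()
    .catalog("cadcat")
    .activity_id("WRF")
    .variable("t2max")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "clip": [(34.05, -118.25), (37.77, -122.42), (32.72, -117.16)],  # LA, SF, SD
        "export": {
            "filename": "myfile",
            "file_format": "NetCDF",
            "separated": True,              # split along closest_cell
            "location_based_naming": True,  # lat/lon suffixes instead of _0, _1, _2
        },
    })
    .get())
# Expected files: myfile_34-05N_118-25W.nc, myfile_37-77N_122-42W.nc, myfile_32-72N_117-16W.nc
```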

Mode 3: Point Collections

A list or dict of xr.Dataset/xr.DataArray objects, where each item represents a single spatial point (scalar lat/lon coordinates with size 1).

Typical Sources:
- Output from batch_select() with return_data_and_metadata=False
- Lists returned by multi-point clipping with separated=True
- Custom collections of point time series

Example Input:

# List of 3 datasets, each a single point
data = [
    ds_la,  # shape (time: 8760, sim: 5)
    ds_sf,  # shape (time: 8760, sim: 5)
    ds_sd   # shape (time: 8760, sim: 5)
]

With separated=True + location_based_naming=True:
- Each dataset is exported separately
- Filenames use the lat/lon from each dataset's coordinates

myfile_34-05N_118-25W.nc  # LA
myfile_37-77N_122-42W.nc  # SF
myfile_32-72N_117-16W.nc  # SD

With separated=True + location_based_naming=False:

myfile_0.nc
myfile_1.nc
myfile_2.nc

With separated=False:
- Each dataset exports with the base filename
- Duplicate filenames get _1, _2, _3 suffixes
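For reference, each item in a Mode 3 collection is a dataset whose lat/lon coordinates are scalar (size 1). A synthetic example built directly with xarray, just to show the expected shape:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic single-point time series: scalar lat/lon coordinates, no spatial dimensions.
time = pd.date_range("2015-01-01", periods=365, freq="D")
ds_la = xr.Dataset(
    {"t2max": ("time", np.random.rand(365))},
    coords={"time": time, "lat": 34.05, "lon": -118.25},
)
assert ds_la["lat"].size == 1 and ds_la["lon"].size == 1  # treated as a single point

# A Mode 3 collection is then simply a list (or dict) of such datasets, e.g. [ds_la, ds_sf, ds_sd].
```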

Export Formats

| Format | Use Case | Compression | Local/S3 | Notes |
| --- | --- | --- | --- | --- |
| NetCDF | Climate data standard, archival | netCDF4 / zlib | Local only | Default; preserves CF metadata |
| Zarr | Cloud / chunked I/O | Internal | Both (S3 requires mode="s3") | Path: s3://bucket/...zarr |
| CSV | Tabular / spreadsheets | gzip (.csv.gz) | Local only | Slow for large datasets |

The Export processor does not support GeoTIFF. For GIS raster output, use rioxarray directly on the returned xr.Dataset after .get().
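A hedged sketch of that rioxarray workflow; the variable name, CRS, dimension names, and output path below are assumptions for illustration.

```python
import rioxarray  # noqa: F401  (registers the .rio accessor on xarray objects)

# Assumes `data` is the xr.Dataset returned by .get() and "t2max" is one of its variables.
da = data["t2max"].isel(time=0)                         # GeoTIFF needs a 2-D (y, x) slice
da = da.rio.set_spatial_dims(x_dim="lon", y_dim="lat")  # dimension names assumed
da = da.rio.write_crs("EPSG:4326")                      # CRS assumed; use the dataset's actual CRS
da.rio.to_raster("t2max_snapshot.tif")
```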

Format-Specific Notes

NetCDF (_export_to_netcdf)
- Attributes are sanitized via _clean_attrs_for_netcdf before writing.
- Best for archival; readable with ncdump.

Zarr (_export_to_zarr)
- mode="local" (default) writes to the current working directory.
- mode="s3" writes to S3; requires a bucket configured outside the processor, and only Zarr is allowed when mode="s3".
- Attributes are also sanitized prior to write, keeping parity with the NetCDF path.

CSV (_export_to_csv)
- gzip-compressed (.csv.gz).
- Practical only for small / point datasets.
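The gzip output reads straight back with pandas, which infers compression from the .csv.gz extension (the filename below is just the processor's default base name):

```python
import pandas as pd

# Compression is inferred from the .csv.gz suffix; no manual decompression needed.
df = pd.read_csv("dataexport.csv.gz")
```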

Parameters

| Parameter | Type | Mode(s) | Description | Constraints |
| --- | --- | --- | --- | --- |
| filename | str | All | Base filename (no extension) | Default: "dataexport" |
| file_format | str | All | Output format | "NetCDF", "Zarr", "CSV" (case-insensitive; aliases inferred) |
| mode | str | All | Storage destination | "local" or "s3"; "s3" requires file_format="Zarr" |
| separated | bool | 2, 3 | Export items to separate files | Ignored for gridded data without closest_cell |
| location_based_naming | bool | 2, 3 | Use lat/lon in filenames | Effective when items have scalar lat/lon |
| filename_template | str / None | All | Custom filename template | Optional |
| export_method | str | All | Return / handler routing | "data", "skip_existing", "raw", "calculate", "both", "none" |
| raw_filename | str / None | | Custom name for raw export | Used by _handle_dict_result / _handle_selective_export |
| calc_filename | str / None | | Custom name for calculated export | Used by _handle_dict_result / _handle_selective_export |
| fail_on_error | bool | All | Raise vs warn on write failure | Default: bool |
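Putting the table together, a hedged example of a fully specified export entry; every value here is illustrative, and most parameters have sensible defaults.

```python
# Hypothetical export configuration showing the full parameter set.
export_options = {
    "filename": "bay_area_tmax",        # base name, no extension
    "file_format": "NetCDF",            # "NetCDF", "Zarr", or "CSV"
    "mode": "local",                    # "s3" only valid with Zarr
    "separated": True,                  # Modes 2 and 3: one file per point
    "location_based_naming": True,      # lat/lon suffixes in filenames
    "filename_template": None,          # optional custom template
    "export_method": "data",            # or "skip_existing", "raw", "calculate", "both", "none"
    "raw_filename": None,
    "calc_filename": None,
    "fail_on_error": True,              # illustrative: raise instead of warn on write failure
}
# Passed as the "export" entry of .processes({...})
```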

Code References

| Method | Lines | Purpose |
| --- | --- | --- |
| `__init__` | 197–237 | Parse value dict, set defaults |
| `_validate_parameters` | 238–325 | Validate format / mode / export_method / types |
| `execute` | 326–393 | Route by export_method; returns input unchanged |
| `_export_data` | 394–455 | match result dispatcher (Dataset/DataArray, dict, list/tuple) |
| `_export_collection` | 457–511 | List/tuple handler honoring separated and location_based_naming |
| `_export_single_from_collection` | 513–571 | Per-item write inside _export_collection |
| `_handle_dict_result` | 573–588 | raw_data / calc_data dict (e.g., cava_data) |
| `_handle_selective_export` | 590–618 | Selective raw/calculate/both for non-dict inputs |
| `_determine_data_type` | 620–646 | Classify input as raw vs calculated |
| `_export_with_suffix` | 648–674 | Append suffix (_raw, _calc) and write |
| `update_context` | 676–696 | Record export metadata in context |
| `_clean_attrs_for_netcdf` | 702–761 | Sanitize attrs for NetCDF/Zarr serialization |
| `_is_single_point_data` | 763–785 | Detect scalar lat/lon |
| `_has_closest_cell_dimension` | 787–818 | Multi-point clip detection |
| `_split_and_export_closest_cells` | 820–902 | Split along closest_cell and write each slice |
| `_extract_point_coordinates` | 904–964 | Pull (lat, lon) for filename suffix |
| `_generate_filename` | 966–1030 | Build base filename (location_based or template) |
| `_get_unique_filename` | 1032–1062 | Add _1, _2, ... when target exists |
| `export_single` | 1064–1127 | match req_format and call format writer |
Example: Single-Point CSV Export

(ClimateData()
    .catalog("cadcat")
    .activity_id("WRF")
    .variable("t2max")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2015-01-01", "2015-12-31"),
        "clip": (37.77, -122.42),  # Single point
        "export": {
            "filename": "sf_daily_max_temp",
            "file_format": "CSV"
        }
    })
    .get())

Writes: sf_daily_max_temp.csv.gz (CSV output is gzip-compressed)

Columns: time, t2max (one row per day)

## Implementation Details

### Format-Specific Behavior

**NetCDF**: Uses xarray `.to_netcdf()` with CF conventions and compression

```python
data.to_netcdf(filename, engine="netcdf4", encoding={var: {"zlib": True} for var in data.data_vars})
```

**Zarr**: Chunked, cloud-optimized format (local or S3)

if mode == "s3":
    data.to_zarr(f"s3://bucket/{filename}.zarr")
else:
    data.to_zarr(f"./{filename}.zarr")

**CSV**: Flattens spatial dims; requires scalar or point data

data.to_dataframe().to_csv(f"{filename}.csv.gz", compression="gzip")  # Convert to a DataFrame first; works for time series at single points

Skip Existing

With export_method="skip_existing", the processor checks whether the file already exists before writing:

if os.path.exists(filename):
    return None  # Don't overwrite
else:
    export_and_return_path(data, filename)

Location-Based Naming

Coordinates are formatted as compass directions:

# (37.7749, -122.4194) → "37-77N_122-42W"
lat_str = f"{abs(lat):.2f}{'NS'[lat < 0]}".replace(".", "-")
lon_str = f"{abs(lon):.2f}{'EW'[lon < 0]}".replace(".", "-")
filename = f"{base}_{lat_str}_{lon_str}.{ext}"
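As a quick check, the three example points used earlier on this page format as follows (a small helper mirroring the snippet above, not the processor's actual function):

```python
def point_suffix(lat: float, lon: float) -> str:
    # Illustrative mirror of the naming rule sketched above.
    lat_str = f"{abs(lat):.2f}{'NS'[lat < 0]}".replace(".", "-")
    lon_str = f"{abs(lon):.2f}{'EW'[lon < 0]}".replace(".", "-")
    return f"{lat_str}_{lon_str}"

assert point_suffix(34.05, -118.25) == "34-05N_118-25W"  # LA
assert point_suffix(37.77, -122.42) == "37-77N_122-42W"  # SF
assert point_suffix(32.72, -117.16) == "32-72N_117-16W"  # SD
```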

Common Patterns

Multi-Format Export

data = (ClimateData()
    .catalog("cadcat")
    .activity_id("WRF")
    .variable("t2max")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2015-01-01", "2015-12-31"),
        "clip": "Alameda"
    })
    .get())

# Export to multiple formats
data.to_netcdf("alameda_2015.nc")
data.to_zarr("alameda_2015.zarr")
data.to_dataframe().to_csv("alameda_2015.csv")  # xarray objects have no to_csv; go through a DataFrame

Batch Export Loop

counties = ["Alameda", "Contra Costa", "Santa Clara"]
for county in counties:
    (ClimateData()
        .catalog("cadcat")
        .activity_id("WRF")
        .variable("t2max")
        .table_id("day")
        .grid_label("d03")
        .processes({
            "clip": county,
            "export": {
                "filename": f"{county.lower()}_2015",
                "file_format": "NetCDF"
            }
        })
        .get())

See Also