Processor: Export
Registry key: export | Priority: 9999 | Category: I/O & Archival
Write climate data to disk in NetCDF, Zarr, or CSV. Handles gridded datasets, multi-point clip results (closest_cell dimension), and collections (lists/tuples/dicts of single-point datasets) with optional location-based filenames and S3 storage for Zarr.
Algorithm
Export runs at priority 9999 (last). It dispatches by export_method first, then by data structure.
flowchart TD
Init([__init__: parse value dict]) --> Validate["_validate_parameters<br/>(format ∈ {zarr, netcdf, csv}; mode ∈ {local, s3};<br/>s3 requires zarr; export_method valid)"]
Validate --> Start([execute])
Start --> MethodCheck{export_method}
MethodCheck -->|none| ReturnNoop["print message,<br/>return data unchanged"]
MethodCheck -->|raw / calculate / both + dict result| HandleDict["_handle_dict_result<br/>(raw_data / calc_data keys)"]
MethodCheck -->|raw / calculate / both + non-dict| HandleSel["_handle_selective_export<br/>(uses _determine_data_type)"]
MethodCheck -->|data / skip_existing| ExportData["_export_data"]
ExportData --> MatchResult{match result}
MatchResult -->|Dataset / DataArray| HasCC{closest_cell dim<br/>and separated?}
HasCC -->|Yes| SplitCC["_split_and_export_closest_cells"]
HasCC -->|No| ExportSingle1["export_single"]
MatchResult -->|dict| LoopDict["For each value:<br/>list/tuple → _export_collection,<br/>else → export_single"]
MatchResult -->|list / tuple| ExportColl["_export_collection"]
MatchResult -->|other| RaiseType["raise TypeError"]
SplitCC --> ExportSingleN["export_single per slice<br/>(filename via _generate_filename)"]
ExportColl --> ExportSingleN
LoopDict --> ExportSingleN
ExportSingle1 --> WriteOne
ExportSingleN --> WriteOne
WriteOne["export_single:<br/>1. _generate_filename + extension<br/>2. skip_existing check / unique filename<br/>3. _clean_attrs_for_netcdf"]
WriteOne --> FormatMatch{match req_format}
FormatMatch -->|zarr| Zarr["_export_to_zarr<br/>(local or s3)"]
FormatMatch -->|netcdf| NCDF["_export_to_netcdf"]
FormatMatch -->|csv| CSV["_export_to_csv (gzip)"]
Zarr --> Done
NCDF --> Done
CSV --> Done
HandleDict --> Done
HandleSel --> Done
Done([Return original data unchanged — chainable])
ReturnNoop --> Done
click Init "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L197" "__init__"
click Validate "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L238" "_validate_parameters"
click Start "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L326" "execute"
click ExportData "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L394" "_export_data dispatcher"
click MatchResult "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L432" "match result"
click SplitCC "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L820" "_split_and_export_closest_cells"
click ExportColl "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L457" "_export_collection"
click HandleDict "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L573" "_handle_dict_result"
click HandleSel "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L590" "_handle_selective_export"
click WriteOne "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L1064" "export_single"
click FormatMatch "https://github.com/cal-adapt/climakitae/blob/main/climakitae/new_core/processors/export.py#L1103" "match req_format"
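The routing in the flowchart can be condensed into a few lines of Python. This is a minimal sketch, not the processor's code: `FakeDataset` is a hypothetical stand-in for xr.Dataset / xr.DataArray, `route` is an illustrative name, and the function only returns the name of the handler the flowchart selects. The flowchart indicates the processor uses a `match` statement; plain `isinstance` checks are used here instead.

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Hypothetical stand-in for xr.Dataset / xr.DataArray (only dims matter here)."""
    dims: tuple = ()


def route(result, export_method="data", separated=False):
    """Return the name of the handler the flowchart selects (illustrative only)."""
    if export_method == "none":
        return "noop"
    if export_method in ("raw", "calculate", "both"):
        # Selective export: dict and non-dict results use different handlers.
        if isinstance(result, dict):
            return "_handle_dict_result"
        return "_handle_selective_export"
    # "data" / "skip_existing": dispatch on the structure of the result.
    if isinstance(result, FakeDataset):
        if separated and "closest_cell" in result.dims:
            return "_split_and_export_closest_cells"
        return "export_single"
    if isinstance(result, dict):
        # Each value is routed on its own.
        return {
            key: "_export_collection" if isinstance(value, (list, tuple)) else "export_single"
            for key, value in result.items()
        }
    if isinstance(result, (list, tuple)):
        return "_export_collection"
    raise TypeError(f"Cannot export object of type {type(result).__name__}")
```

Note that `separated=True` only changes the route when the `closest_cell` dimension is present, which is why gridded data always ends up in `export_single`.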
Data Handling Modes
Mode 1: Gridded Datasets ⚠️ Important
Single xr.Dataset/xr.DataArray with lat and lon coordinate dimensions containing multiple values (e.g., shape (time, lat, lon)).
Key Behavior:
- The separated and location_based_naming options are silently ignored
- Always exports to a single file (gridded data cannot be split by location)
- A single filename is used, with the extension determined by the format
Example Input:
# Dataset from clip with single region or full state
ds.dims # {'time': 8760, 'lat': 150, 'lon': 100, 'sim': 5}
Parameters Effective for This Mode:
- filename: Used for single output file
- file_format: Determines output format
- export_method: Selects the export routing; the input data is returned unchanged
- mode: "local" or "s3"; "s3" is valid for Zarr only
Parameters Ignored for This Mode:
- separated (always False, cannot split gridded data)
- location_based_naming (no point coordinates to include)
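To make the "silently ignored" behavior concrete, the sketch below labels a dims mapping like the one in the example input and reports which option values can take effect. Both function names are hypothetical, and the rules are taken from the mode descriptions on this page rather than from the processor source.

```python
def classify(dims):
    """Label a single dataset by the handling mode described on this page.

    `dims` maps dimension name -> size, e.g. {"time": 8760, "lat": 150, "lon": 100}.
    """
    if "closest_cell" in dims:
        return "multi-point"   # Mode 2
    if dims.get("lat", 1) > 1 or dims.get("lon", 1) > 1:
        return "gridded"       # Mode 1
    return "single-point"      # one item of a Mode 3 collection


def effective_options(dims, separated, location_based_naming):
    """Return the option values that can actually take effect for this dataset."""
    if classify(dims) == "gridded":
        # Gridded data: both options are ignored and one file is written.
        return {"separated": False, "location_based_naming": False}
    return {"separated": separated, "location_based_naming": location_based_naming}


gridded = {"time": 8760, "lat": 150, "lon": 100, "sim": 5}
print(effective_options(gridded, separated=True, location_based_naming=True))
# {'separated': False, 'location_based_naming': False}
```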
Mode 2: Multi-Point Clip Results
Single xr.Dataset/xr.DataArray with closest_cell dimension (output from clipping to multiple lat/lon points). The closest_cell dimension represents individual target points.
Key Behavior:
- When separated=True: data is split along the closest_cell dimension
- Each slice becomes a separate file with an index-based or location-based name
- When separated=False: all points are written to a single file with the closest_cell dimension preserved
Example Input:
# Dataset from clip with multiple points
ds.dims # {'time': 8760, 'closest_cell': 3, 'sim': 5}
# 3 points: LA, SF, SD
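With separated=True + location_based_naming=True: three files, one per point, each named from that point's lat/lon.
With separated=True + location_based_naming=False: three files, one per point, distinguished by index.
With separated=False: one file containing all three points along closest_cell.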
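The three outcomes can be illustrated by listing the filenames each combination would produce. This is a sketch under stated assumptions: the coordinate format follows the pattern shown in the Location-Based Naming section (two decimals, `.` replaced by `-`, compass suffix), the `_0`, `_1`, ... index suffix is a guess at the index-based form, the point coordinates are approximate, and `format_coord` and `slice_filenames` are hypothetical helpers. The names actually produced by `_generate_filename` may differ.

```python
def format_coord(value, positive, negative):
    """Format 37.7749 as '37-77N' (two decimals, '.' replaced by '-')."""
    suffix = positive if value >= 0 else negative
    return f"{abs(value):.2f}".replace(".", "-") + suffix


def slice_filenames(base, points, ext, separated, location_based_naming):
    """List the files written for a closest_cell dataset (illustrative)."""
    if not separated:
        return [f"{base}.{ext}"]  # one file, closest_cell dimension preserved
    names = []
    for index, (lat, lon) in enumerate(points):
        if location_based_naming:
            tag = f"{format_coord(lat, 'N', 'S')}_{format_coord(lon, 'E', 'W')}"
        else:
            tag = str(index)  # assumed index-based suffix
        names.append(f"{base}_{tag}.{ext}")
    return names


points = [(34.05, -118.25), (37.77, -122.42), (32.72, -117.16)]  # LA, SF, SD (approximate)
print(slice_filenames("temps", points, "nc", True, True))
# ['temps_34-05N_118-25W.nc', 'temps_37-77N_122-42W.nc', 'temps_32-72N_117-16W.nc']
```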
Mode 3: Point Collections
A list or dict of xr.Dataset/xr.DataArray objects, where each item represents a single spatial point (scalar lat/lon coordinates with size 1).
Typical Sources:
- Output from batch_select() with return_data_and_metadata=False
- Lists returned by multi-point clipping with separated=True
- Custom collections of point time series
Example Input:
# List of 3 datasets, each a single point
data = [
ds_la, # shape (time: 8760, sim: 5)
ds_sf, # shape (time: 8760, sim: 5)
ds_sd # shape (time: 8760, sim: 5)
]
With separated=True + location_based_naming=True:
- Each dataset exported separately
- Filenames use lat/lon from dataset coordinates
With separated=True + location_based_naming=False:
With separated=False:
- Each dataset exports with base filename
- Duplicate filenames get _1, _2, _3 suffixes
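The suffixing rule for duplicate filenames can be sketched with the standard library. `unique_filename` is a hypothetical helper that mirrors the documented behavior of `_get_unique_filename` (append `_1`, `_2`, ... when the target exists); the real method's signature and details are not shown on this page, and `touch()` stands in for the actual write.

```python
import tempfile
from pathlib import Path


def unique_filename(directory, base, ext):
    """Return directory/base.ext, or base_1.ext, base_2.ext, ... if already taken."""
    candidate = Path(directory) / f"{base}.{ext}"
    counter = 0
    while candidate.exists():
        counter += 1
        candidate = Path(directory) / f"{base}_{counter}.{ext}"
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    written = []
    for _ in range(3):  # three point datasets sharing one base filename
        path = unique_filename(tmp, "dataexport", "nc")
        path.touch()  # stand-in for the real write
        written.append(path.name)

print(written)  # ['dataexport.nc', 'dataexport_1.nc', 'dataexport_2.nc']
```

Passing the extension separately avoids ambiguity with multi-part extensions such as `.csv.gz`.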
Export Formats
| Format | Use Case | Compression | Local/S3 | Notes |
|---|---|---|---|---|
| NetCDF | Climate data standard, archival | netCDF4 / zlib | Local only | Default; preserves CF metadata |
| Zarr | Cloud / chunked I/O | Internal | Both (S3 requires mode="s3") | Path: s3://bucket/...zarr |
| CSV | Tabular / spreadsheets | gzip (.csv.gz) | Local only | Slow for large datasets |
The Export processor does not support GeoTIFF. For GIS raster output, use rioxarray directly on the returned xr.Dataset after .get().
Format-Specific Notes
NetCDF (_export_to_netcdf)
- Attributes are sanitized via _clean_attrs_for_netcdf before writing.
- Best for archival, ncdump-readable.
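The kind of cleanup this implies can be sketched as follows. The rules below are assumptions chosen for illustration: values such as `None` or nested dicts generally cannot be stored as NetCDF attributes, so they are converted to strings or numbers. `sanitize_attrs` is a hypothetical function; `_clean_attrs_for_netcdf` may apply different rules.

```python
import json


def sanitize_attrs(attrs):
    """Return a copy of attrs with values NetCDF can store (assumed rules)."""
    cleaned = {}
    for key, value in attrs.items():
        if value is None:
            cleaned[key] = ""                  # NetCDF attributes have no null type
        elif isinstance(value, bool):
            cleaned[key] = int(value)          # booleans become 0 / 1
        elif isinstance(value, (str, int, float)):
            cleaned[key] = value               # already serializable
        elif isinstance(value, (list, tuple)) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in value
        ):
            cleaned[key] = list(value)         # numeric sequences are kept
        else:
            # Dicts, mixed lists, and other objects are stored as JSON text.
            cleaned[key] = json.dumps(value, default=str)
    return cleaned


print(sanitize_attrs({"units": "K", "valid": True, "source": {"model": "WRF"}}))
# {'units': 'K', 'valid': 1, 'source': '{"model": "WRF"}'}
```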
Zarr (_export_to_zarr)
- mode="local" (default) writes to current working directory.
- mode="s3" writes to S3 — requires bucket configured outside the processor; only Zarr is allowed when mode="s3".
- Attributes also sanitized prior to write to keep parity with the NetCDF path.
CSV (_export_to_csv)
- gzip-compressed (.csv.gz).
- Practical only for small / point datasets.
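Because the output is a `.csv.gz` file, downstream code has to read it through gzip. The sketch below uses only the standard library to write and read back a small gzip-compressed CSV. It does not call the processor; the column names mimic a single-point daily series and the values are made up for illustration.

```python
import csv
import gzip
import os
import tempfile

rows = [("2015-01-01", 287.4), ("2015-01-02", 288.1)]  # made-up values

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.gz")
    # Text mode ("wt") lets csv.writer write through the gzip stream.
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["time", "t2max"])
        writer.writerows(rows)
    # Reading back: open with gzip first, then parse as ordinary CSV.
    with gzip.open(path, "rt", newline="") as handle:
        read_back = list(csv.reader(handle))

print(read_back[0])  # ['time', 't2max']
```

`pandas.read_csv` also infers gzip compression from the `.gz` extension, so such a file can be loaded without decompressing it first.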
Parameters
| Parameter | Type | Mode(s) | Description | Constraints |
|---|---|---|---|---|
| `filename` | str | All | Base filename (no extension) | Default: "dataexport" |
| `file_format` | str | All | Output format | "NetCDF", "Zarr", "CSV" (case-insensitive; aliases inferred) |
| `mode` | str | All | Storage destination | "local" or "s3"; "s3" requires file_format="Zarr" |
| `separated` | bool | 2, 3 | Export items to separate files | Ignored for gridded data without closest_cell |
| `location_based_naming` | bool | 2, 3 | Use lat/lon in filenames | Effective when items have scalar lat/lon |
| `filename_template` | str / None | All | Custom filename template | Optional |
| `export_method` | str | All | Return / handler routing | "data", "skip_existing", "raw", "calculate", "both", "none" |
| `raw_filename` | str / None | — | Custom name for raw export | Used by _handle_dict_result / _handle_selective_export |
| `calc_filename` | str / None | — | Custom name for calculated export | Used by _handle_dict_result / _handle_selective_export |
| `fail_on_error` | bool | All | Raise vs warn on write failure | Default: bool |
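The constraints in the table (and in the `_validate_parameters` node of the flowchart) can be expressed as a small check. This is a sketch under stated assumptions: `validate_export_config` is a hypothetical function, the use of `ValueError` is a guess, the `"data"` default for `export_method` is assumed, and alias inference for `file_format` is not modeled.

```python
VALID_FORMATS = {"zarr", "netcdf", "csv"}
VALID_MODES = {"local", "s3"}
VALID_METHODS = {"data", "skip_existing", "raw", "calculate", "both", "none"}


def validate_export_config(config):
    """Check an export value dict against the documented constraints."""
    file_format = str(config.get("file_format", "NetCDF")).lower()  # case-insensitive
    mode = str(config.get("mode", "local")).lower()
    method = config.get("export_method", "data")  # assumed default
    if file_format not in VALID_FORMATS:
        raise ValueError(f"file_format must be one of {sorted(VALID_FORMATS)}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if mode == "s3" and file_format != "zarr":
        raise ValueError('mode="s3" requires file_format="Zarr"')
    if method not in VALID_METHODS:
        raise ValueError(f"export_method must be one of {sorted(VALID_METHODS)}")
    return {"file_format": file_format, "mode": mode, "export_method": method}


print(validate_export_config({"filename": "sf_daily_max_temp", "file_format": "CSV"}))
# {'file_format': 'csv', 'mode': 'local', 'export_method': 'data'}
```

Checking the `s3` / Zarr pairing at construction time means a misconfigured export fails before any data is loaded, which matches the flowchart's placement of validation ahead of `execute`.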
Code References
| Method | Lines | Purpose |
|---|---|---|
| `__init__` | 197–237 | Parse value dict, set defaults |
| `_validate_parameters` | 238–325 | Validate format / mode / export_method / types |
| `execute` | 326–393 | Route by export_method; returns input unchanged |
| `_export_data` | 394–455 | match result dispatcher (Dataset/DataArray, dict, list/tuple) |
| `_export_collection` | 457–511 | List/tuple handler honoring separated and location_based_naming |
| `_export_single_from_collection` | 513–571 | Per-item write inside _export_collection |
| `_handle_dict_result` | 573–588 | raw_data / calc_data dict (e.g., cava_data) |
| `_handle_selective_export` | 590–618 | Selective raw/calculate/both for non-dict inputs |
| `_determine_data_type` | 620–646 | Classify input as raw vs calculated |
| `_export_with_suffix` | 648–674 | Append suffix (_raw, _calc) and write |
| `update_context` | 676–696 | Record export metadata in context |
| `_clean_attrs_for_netcdf` | 702–761 | Sanitize attrs for NetCDF/Zarr serialization |
| `_is_single_point_data` | 763–785 | Detect scalar lat/lon |
| `_has_closest_cell_dimension` | 787–818 | Multi-point clip detection |
| `_split_and_export_closest_cells` | 820–902 | Split along closest_cell and write each slice |
| `_extract_point_coordinates` | 904–964 | Pull (lat, lon) for filename suffix |
| `_generate_filename` | 966–1030 | Build base filename (location_based or template) |
| `_get_unique_filename` | 1032–1062 | Add _1, _2, ... when target exists |
| `export_single` | 1064–1127 | match req_format and call format writer |
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2015-01-01", "2015-12-31"),
        "clip": (37.77, -122.42),  # Single point
        "export": {
            "filename": "sf_daily_max_temp",
            "file_format": "CSV"
        }
    })
    .get())
Writes: sf_daily_max_temp.csv.gz (CSV output is gzip-compressed)
Columns: time, t2max (one row per day)
## Implementation Details
### Format-Specific Behavior
**NetCDF**: Uses xarray `.to_netcdf()` with CF conventions and compression

```python
# zlib-compress every data variable
encoding = {var: {"zlib": True} for var in data.data_vars}
data.to_netcdf(filename, engine="netcdf4", encoding=encoding)
```

**Zarr**: Chunked cloud-optimized format (local or S3)

```python
if mode == "s3":
    data.to_zarr(f"s3://bucket/{filename}.zarr")  # "bucket" is a placeholder
else:
    data.to_zarr(f"./{filename}.zarr")
```

**CSV**: Flattens spatial dims; requires scalar or point data
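### Skip Existing

With `export_method="skip_existing"`, the processor checks whether the target file exists before writing. In simplified form (`write_unless_exists` and the `to_netcdf` call are illustrative stand-ins, not processor methods):

```python
import os

def write_unless_exists(data, filename):
    if os.path.exists(filename):
        return None  # Don't overwrite
    data.to_netcdf(filename)  # stand-in for the format-specific writer
    return filename
```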
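### Location-Based Naming

Coordinates are rounded to two decimals and given compass-direction suffixes (N/S, E/W). A sketch of the formatting, with example values filled in:

```python
# (37.7749, -122.4194) → "37-77N_122-42W"
lat, lon = 37.7749, -122.4194
base, ext = "dataexport", "nc"
# "." is replaced with "-" so the result matches the pattern above
lat_str = f"{abs(lat):.2f}".replace(".", "-") + "NS"[lat < 0]
lon_str = f"{abs(lon):.2f}".replace(".", "-") + "EW"[lon < 0]
filename = f"{base}_{lat_str}_{lon_str}.{ext}"
```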
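## Common Patterns

### Multi-Format Export

```python
from climakitae.new_core.user_interface import ClimateData

data = (ClimateData()
    .catalog("cadcat")
    .activity_id("WRF")
    .variable("t2max")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2015-01-01", "2015-12-31"),
        "clip": "Alameda"
    })
    .get())

# Export to multiple formats using xarray directly
data.to_netcdf("alameda_2015.nc")
data.to_zarr("alameda_2015.zarr")
# xarray objects have no .to_csv(); convert to a pandas DataFrame first
data.to_dataframe().to_csv("alameda_2015.csv")
```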
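### Batch Export Loop

```python
counties = ["Alameda", "Contra Costa", "Santa Clara"]
for county in counties:
    # Replace spaces so names like "Contra Costa" give clean filenames
    slug = county.lower().replace(" ", "_")
    (ClimateData()
        .catalog("cadcat")
        .activity_id("WRF")
        .variable("t2max")
        .table_id("day")
        .grid_label("d03")
        .processes({
            # Restrict to 2015 so the data matches the _2015 filename suffix
            "time_slice": ("2015-01-01", "2015-12-31"),
            "clip": county,
            "export": {
                "filename": f"{slug}_2015",
                "file_format": "NetCDF"
            }
        })
        .get())
```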
## See Also
- Processor Index
- How-To Guides → Export Data
- Architecture → Data Export
- xarray export docs: .to_netcdf()