Legacy Data Export

The climakitae.core.data_export module provides the legacy multi-format export helpers used to persist retrieved data to disk or cloud storage.

Warning

This page documents a legacy support module. It is kept for backward compatibility. New code should prefer the export processor in the modern interface.

What this module does

The legacy export layer writes an in-memory or lazily loaded xarray object to one of several formats:

Format Typical use
NetCDF Standard climate-data format with full metadata (CF conventions).
CSV Tabular export for time series or single-location data.
Zarr Cloud-optimized chunked storage for large datasets.
GeoTIFF Geographic raster for a single time slice, for use in GIS tools.

Public API

The single public entry point is export(). TMY/EPW export is reached through export() rather than a separate function.

Save xarray data as NetCDF, Zarr, or CSV in the current working directory. Zarr output can optionally be streamed to an AWS S3 scratch bucket instead, in which case a download URL is provided. NetCDF can be written to the HUB user partition only if the file fits. Zarr can be written either to the HUB user partition or to the S3 scratch bucket, selected with the mode option.

Parameters:

Name (type, default): description

data (DataArray | Dataset, required): Data to export, as output by e.g. DataParameters.retrieve().

filename (str, default 'dataexport'): Output file name without a file extension, e.g. "my_filename" rather than "my_filename.nc". Any text after the first "." is discarded before the extension for the chosen format is appended.

format (str, default 'NetCDF'): File format: "Zarr", "NetCDF", or "CSV", matched case-insensitively. CSV output is given a .csv.gz extension.

mode (str, default 'local'): Save location for Zarr output: "local" or "s3". The "s3" mode is accepted only together with format="Zarr".

Returns:

Type Description
None
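The calls described above can be sketched as follows. This is a minimal illustration, not part of climakitae: the helper `run_example_exports`, the list `EXAMPLE_CALLS`, and the file name "my_filename" are hypothetical, and the export function is passed in as an argument so the sketch runs without the library installed.

```python
from typing import Any, Callable

# Keyword arguments for three typical export() calls.
# "my_filename" is an illustrative name, not one the library requires.
EXAMPLE_CALLS = [
    {"filename": "my_filename", "format": "NetCDF"},              # save name my_filename.nc
    {"filename": "my_filename", "format": "CSV"},                 # save name my_filename.csv.gz
    {"filename": "my_filename", "format": "Zarr", "mode": "s3"},  # save name my_filename.zarr, S3 mode
]


def run_example_exports(data: Any, export_fn: Callable[..., None]) -> int:
    """Call export_fn(data, **kwargs) once per example and return the call count."""
    for kwargs in EXAMPLE_CALLS:
        export_fn(data, **kwargs)
    return len(EXAMPLE_CALLS)


# With climakitae installed, usage would look like this (assumed, not run here):
#     from climakitae.core.data_export import export
#     run_example_exports(my_dataset, export)
```

Passing `mode="s3"` with any format other than Zarr would raise an exception, so only the Zarr example sets it.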
Source code in climakitae/core/data_export.py
def export(
    data: xr.DataArray | xr.Dataset,
    filename: str = "dataexport",
    format: str = "NetCDF",
    mode: str = "local",
):
    """Save xarray data as NetCDF, Zarr, or CSV in the current working directory, or if Zarr optionally
    stream the export file to an AWS S3 scratch bucket and give download URL. NetCDF can only be written
    to the HUB user partition if it will fit. Zarr can either be written to the HUB user partition or to
    S3 scratch bucket using the mode option.

    Parameters
    ----------
    data : xr.DataArray | xr.Dataset
        Data to export, as output by e.g. `DataParameters.retrieve()`.
    filename : str, optional
        Output file name (without file extension, i.e. "my_filename" instead
        of "my_filename.nc"). The default is "dataexport".
    format : str, optional
        File format ("Zarr", "NetCDF", "CSV"). The default is "NetCDF".
    mode : str, optional
        Save location logic for Zarr file ("local", "s3"). The default is "local"

    Returns
    -------
    None

    """
    ftype = type(data)

    if ftype not in [xr.core.dataset.Dataset, xr.core.dataarray.DataArray]:
        raise Exception(
            "Cannot export object of type "
            + str(ftype).strip("<class >")
            + ". Please pass an Xarray Dataset or DataArray."
        )

    if type(filename) is not str:
        raise Exception(
            (
                "Please pass a string"
                " (any characters surrounded by quotation marks)"
                " for your file name."
            )
        )
    filename = filename.split(".")[0]

    req_format = format.lower()

    if req_format not in ["zarr", "netcdf", "csv"]:
        raise Exception('Please select "Zarr", "NetCDF" or "CSV" as the file format.')

    extension_dict = {"zarr": ".zarr", "netcdf": ".nc", "csv": ".csv.gz"}

    save_name = filename + extension_dict[req_format]

    if (mode == "s3") and (req_format != "zarr"):
        raise Exception('To export to AWS S3 you must use the format="Zarr" option.')

    # now here is where exporting actually begins
    # we will have different functions for each file type
    # to keep things clean-ish
    match req_format:
        case "zarr":
            _export_to_zarr(data, save_name, mode)
        case "netcdf":
            _export_to_netcdf(data, save_name)
        case "csv":
            _export_to_csv(data, save_name)
        case _:
            raise Exception(
                'Please select "Zarr", "NetCDF" or "CSV" as the file format.'
            )
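The argument checks and save-name construction in the listing above can be reproduced in isolation to see how a given call resolves. The sketch below is a standalone approximation: `resolve_save_name` and `EXTENSIONS` are hypothetical names, and it raises `TypeError` and `ValueError` where the listing raises a bare `Exception`.

```python
# Extension mapping copied from the listing above.
EXTENSIONS = {"zarr": ".zarr", "netcdf": ".nc", "csv": ".csv.gz"}


def resolve_save_name(filename: str, format: str = "NetCDF", mode: str = "local") -> str:
    """Return the save name that the checks in export() would produce.

    Hypothetical helper for illustration; it writes nothing to disk.
    """
    if not isinstance(filename, str):
        raise TypeError("Please pass a string for your file name.")

    # Formats are compared case-insensitively.
    req_format = format.lower()
    if req_format not in EXTENSIONS:
        raise ValueError('Please select "Zarr", "NetCDF" or "CSV" as the file format.')

    # S3 export is only allowed for Zarr.
    if mode == "s3" and req_format != "zarr":
        raise ValueError('To export to AWS S3 you must use the format="Zarr" option.')

    # Everything after the first "." is dropped before the extension is added.
    stem = filename.split(".")[0]
    return stem + EXTENSIONS[req_format]


print(resolve_save_name("my_filename.nc"))       # my_filename.nc
print(resolve_save_name("run.v2", "csv"))        # run.csv.gz
print(resolve_save_name("store", "ZARR", "s3"))  # store.zarr
```

Because the name is split on the first ".", a dotted name such as "run.v2" loses its suffix, so dotted version tags are best avoided in file names.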