Legacy Data Interface

The climakitae.core.data_interface module is the main compatibility layer for the original climakitae API. It exposes the legacy parameter object, the data retrieval entry points, and the discovery helpers that powered the old GUI.

Warning

This page documents the legacy climakitae.core.data_interface module. It is kept for backward compatibility. New code should use climakitae.new_core.user_interface.ClimateData.

What this module does

The legacy data interface is responsible for:

mapping human-readable GUI values to catalog values
validating combinations of resolution, timescale, scenario, and spatial subset
exposing data and subsetting option lookup helpers
loading the cached data catalogs, variable metadata, station metadata, and boundary catalogs used by DataParameters

Core concepts

Concept	Legacy symbol	Role
Variable metadata	`VariableDescriptions`	Loads `variable_descriptions.csv` once and keeps it available for option lookups
Shared data connections	`DataInterface`	Singleton cache for catalogs, stations, boundaries, and warming-level tables
Query state	`DataParameters`	Param-based configuration object used by the GUI and direct code paths
Data discovery	`get_data_options()`	Returns the valid query combinations in legacy GUI language
Spatial discovery	`get_subsetting_options()`	Returns valid boundaries and station geometry options
Data retrieval	`get_data()` / `DataParameters.retrieve()`	Executes a legacy query and returns lazily loaded xarray data

Query flow

DataParameters loads the singleton DataInterface and populates the available options.
Option observers keep fields like resolution, timescale, scenario_ssp, and cached_area in sync.
retrieve() or get_data() calls the catalog loader.
The loader returns an xarray.DataArray, xarray.Dataset, or a list of DataArray objects depending on the request.
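Because the return type varies, downstream code often has to branch on it. `retrieve()` does this itself with an `isinstance(..., list)` check before issuing size warnings. Here is a minimal sketch of the same normalization. The helper name `as_list` is hypothetical and not part of climakitae.

```python
from typing import Any, List


def as_list(result: Any) -> List[Any]:
    """Hypothetical helper: always hand back a list of data objects.

    Mirrors the isinstance(..., list) branch in DataParameters.retrieve(),
    so callers can loop over the result whatever its shape.
    """
    if result is None:
        # get_data() prints a message and returns None on bad input
        return []
    return result if isinstance(result, list) else [result]
```

With this helper, a single DataArray, a Dataset, and a list of DataArrays can all be handled by the same `for item in as_list(data):` loop.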

Legacy field names

The legacy module uses GUI-style names instead of catalog-native names. Common examples:

Legacy field	Meaning	Modern equivalent
`downscaling_method`	Dynamical, Statistical, or both	`activity_id`
`resolution`	3 km, 9 km, or 45 km	`grid_label`
`timescale`	hourly, daily, monthly	`table_id`
`scenario_ssp` / `scenario_historical`	Scenario selection buckets	`experiment_id`
`area_subset` / `cached_area`	Named boundary selection	`clip` processor
`time_slice`	Year range tuple	`time_slice` processor

See Core Concepts for the full mapping.
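The following minimal sketch renames the catalog-facing legacy fields using the table above. The helper `translate_field_names` and its list-valued output are hypothetical illustrations. They are not part of climakitae or of the `ClimateData` API, and the sketch translates only field names, not values.

```python
# Catalog-facing legacy fields and their modern names, from the table above.
# area_subset/cached_area and time_slice map to processors rather than
# catalog fields, so this sketch passes them through unchanged.
LEGACY_TO_MODERN = {
    "downscaling_method": "activity_id",
    "resolution": "grid_label",
    "timescale": "table_id",
    "scenario_ssp": "experiment_id",
    "scenario_historical": "experiment_id",
}


def translate_field_names(legacy_query: dict) -> dict:
    """Regroup legacy query values under their modern field names.

    Both scenario buckets collapse onto experiment_id, so every value is
    collected into a list. Values such as "9 km" keep their GUI-style form.
    """
    modern: dict = {}
    for field, value in legacy_query.items():
        key = LEGACY_TO_MODERN.get(field, field)
        values = value if isinstance(value, list) else [value]
        modern.setdefault(key, []).extend(values)
    return modern
```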

Examples

Direct query with `DataParameters`

from climakitae.core.data_interface import DataParameters

params = DataParameters()
params.variable = "Air Temperature at 2m"
params.downscaling_method = "Dynamical"
params.resolution = "9 km"
params.timescale = "hourly"
params.scenario_historical = ["Historical Climate"]
params.scenario_ssp = ["SSP 3-7.0"]
params.area_subset = "CA counties"
params.cached_area = ["Los Angeles County"]

data = params.retrieve()

Direct query with `get_data`

from climakitae.core.data_interface import get_data

data = get_data(
    variable="Air Temperature at 2m",
    resolution="9 km",
    timescale="hourly",
    downscaling_method="Dynamical",
    scenario=["Historical Climate", "SSP 3-7.0"],
    area_subset="CA counties",
    cached_area=["Los Angeles County"],
)

Public API

VariableDescriptions

Load Variable Descriptions CSV only once

This singleton class must be instantiated separately from DataInterface because ck.view uses the variable descriptions without DataInterface. ck.view is also imported when the package loads, so keeping the two classes separate avoids loading boundary data when it is not needed.

Attributes:

Name	Type	Description
`variable_descriptions`	`DataFrame`	pandas dataframe that stores available data variables usable with the package

Source code in climakitae/core/data_interface.py

def __init__(self):
    self.variable_descriptions = pd.DataFrame

`load()`

Read the variable descriptions csv into class variable.

Source code in climakitae/core/data_interface.py

def load(self):
    """Read the variable descriptions csv into class variable."""
    if self.variable_descriptions.empty:
        self.variable_descriptions = read_csv_file(VARIABLE_DESCRIPTIONS_CSV_PATH)
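The guard in `load()` treats an empty table as a "not yet loaded" sentinel. In the `__init__` shown, the attribute starts as the `pd.DataFrame` class itself rather than an empty instance. The guard still fires on first use because `.empty` looked up on the class returns a property object, which is truthy. Here is a minimal stdlib sketch of the load-once pattern. The `LazyTable` class and its loader callable are hypothetical stand-ins for `VariableDescriptions` and `read_csv_file`.

```python
from typing import Callable, List


class LazyTable:
    """Hypothetical stand-in for VariableDescriptions: read a table at most once."""

    def __init__(self, loader: Callable[[], List[dict]]):
        self._loader = loader
        # An empty list plays the role of an empty DataFrame sentinel.
        self.rows: List[dict] = []

    def load(self) -> None:
        # Only call the (expensive) loader while the table is still empty.
        if not self.rows:
            self.rows = self._loader()
```

As in the original, a loader that legitimately returns an empty table would run again on every `load()` call.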

DataInterface

Load data connections into memory once

This singleton class is used by the various Param classes to connect to the local data, the intake data catalog, and the parquet boundary catalog. Its attributes are exposed as read-only properties so that the data are not changed accidentally.

Attributes:

Name	Type	Description
`variable_descriptions`	`DataFrame`	variable descriptions pandas data frame
`stations`	`DataFrame`	station locations pandas data frame
`stations_gdf`	`GeoDataFrame`	station locations geopandas data frame
`data_catalog`	`ESMDataSource`	intake ESM data catalog
`boundary_catalog`	`Catalog`	parquet boundary catalog
`geographies`	`Boundaries`	boundary dictionaries class
`warming_level_times`	`DataFrame`	table of when each simulation/scenario reaches each warming level

Source code in climakitae/core/data_interface.py

def __init__(self):
    global _data_interface_initialized

    if _data_interface_initialized:
        return

    with _data_interface_init_lock:
        if _data_interface_initialized:
            return
        var_desc = VariableDescriptions()
        var_desc.load()
        self._variable_descriptions = var_desc.variable_descriptions
        self._stations = pd.read_csv(HADISD_STATIONS_URL)
        self._stations_gdf = gpd.GeoDataFrame(
            self.stations,
            crs="EPSG:4326",
            geometry=gpd.points_from_xy(self.stations.LON_X, self.stations.LAT_Y),
        )
        self._data_catalog = intake.open_esm_datastore(DATA_CATALOG_URL)
        self._warming_level_times = read_csv_file(
            GWL_1850_1900_FILE, index_col=[0, 1, 2]
        )

        # Get geography boundaries
        self._boundary_catalog = intake.open_catalog(BOUNDARY_CATALOG_URL)
        self._geographies = Boundaries(self.boundary_catalog)

        self._geographies.load()
        _data_interface_initialized = True
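`DataInterface.__init__` uses double-checked locking. It first tests the module-level `_data_interface_initialized` flag without the lock, then re-tests it while holding `_data_interface_init_lock`, so concurrent first calls load the catalogs only once. The sketch below reproduces that pattern in stdlib Python. The `__new__` override that returns one shared instance is an assumption, since this page does not show how DataInterface enforces its singleton behavior. The class and attribute names are hypothetical.

```python
import threading


class SharedConnections:
    """Hypothetical sketch of the DataInterface pattern: one shared instance,
    initialized at most once even if several threads construct it at once."""

    _instance = None
    _instance_lock = threading.Lock()
    _initialized = False
    _init_lock = threading.Lock()
    init_count = 0  # demonstration only: how many times the heavy work ran

    def __new__(cls):
        # Assumed singleton mechanism: always return the same object.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ runs on every SharedConnections() call, so guard the
        # expensive loading with a double-checked flag, as DataInterface does.
        if SharedConnections._initialized:
            return
        with SharedConnections._init_lock:
            if SharedConnections._initialized:
                return
            SharedConnections.init_count += 1
            self.catalog = {"loaded": True}  # stands in for catalogs, stations, ...
            SharedConnections._initialized = True
```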

`variable_descriptions` `property`

Get the variable descriptions dataframe

`stations` `property`

Get the stations dataframe

`stations_gdf` `property`

Get the stations geopandas dataframe

`data_catalog` `property`

Get the data catalog

`warming_level_times` `property`

Get the warming level times dataframe

`boundary_catalog` `property`

Get the boundary catalog

`geographies` `property`

Get the geographies object
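The properties above define getters only, which is what makes these attributes read-only: rebinding one raises `AttributeError`. Here is a minimal sketch of the pattern using a hypothetical class. A getter-only property blocks reassignment but still returns the underlying mutable object, so it does not prevent in-place changes to a DataFrame.

```python
class ReadOnlyConnections:
    """Hypothetical sketch: expose loaded objects through getter-only properties."""

    def __init__(self, stations):
        self._stations = stations  # "private" storage, set once at init

    @property
    def stations(self):
        """Get the stations table (no setter, so assignment fails)."""
        return self._stations
```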

DataParameters

Bases: Parameterized

Python param object that holds data parameters for use in the Panel GUI.

Call DataParameters when you want to select and retrieve data from the climakitae data catalog without using the ckg.Select GUI. ckg.Select uses this class to store selections and retrieve data.

DataParameters instantiates DataInterface, the singleton class that connects to the intake-esm data store in an S3 bucket.

Attributes:

Name	Type	Description
`unit_options_dict`	`dict`	options dictionary for converting unit to other units
`area_subset`	`str`	dataset to use from Boundaries for sub area selection
`cached_area`	`list of strs`	one or more features from area_subset datasets to use for selection
`latitude`	`tuple`	latitude range of selection box
`longitude`	`tuple`	longitude range of selection box
`variable_type`	`str`	toggle raw or derived variable selection
`default_variable`	`str`	initial variable to have selected in widget
`time_slice`	`tuple`	year range to select
`resolution`	`str`	resolution of data to select ("3 km", "9 km", "45 km")
`timescale`	`str`	frequency of dataset ("hourly", "daily", "monthly")
`scenario_historical`	`list of strs`	historical scenario selections
`area_average`	`str`	whether to compute an area average ("Yes", "No")
`downscaling_method`	`str`	whether to choose WRF or LOCA2 data or both ("Dynamical", "Statistical", "Dynamical+Statistical")
`data_type`	`str`	whether to choose gridded or station based data ("Gridded", "Stations")
`stations`	`list of strs`	list of stations, which can be filtered by cached_area
`_station_data_info`	`str`	informational statement when station data selected with data_type
`scenario_ssp`	`list of strs`	list of future climate scenarios selected (availability depends on other params)
`simulation`	`list of strs`	list of simulations (models) selected (availability depends on other params)
`variable`	`str`	variable long display name
`units`	`str`	unit abbreviation currently of the data (native or converted)
`enable_hidden_vars`	`boolean`	whether to allow selection of variables that are hidden from the GUI
`extended_description`	`str`	extended description of the data variable
`variable_id`	`list of strs`	list of variable ids that match the variable (WRF and LOCA2 can use different codes for the same type of variable)
`historical_climate_range_wrf`	`tuple`	time range of historical WRF data
`historical_climate_range_loca`	`tuple`	time range of historical LOCA2 data
`historical_climate_range_wrf_and_loca`	`tuple`	time range of historical WRF and LOCA2 data combined
`historical_reconstruction_range`	`tuple`	time range of historical reanalysis data
`ssp_range`	`tuple`	time range of future scenario SSP data
`_info_about_station_data`	`str`	warning message about station data
`_data_warning`	`str`	warning about selecting unavailable data combination
`data_interface`	`DataInterface`	data connection singleton class that provides data
`_data_catalog`	`ESMDataSource`	shorthand alias to DataInterface.data_catalog
`_variable_descriptions`	`DataFrame`	shorthand alias to DataInterface.variable_descriptions
`_stations_gdf`	`GeoDataFrame`	shorthand alias to DataInterface.stations_gdf
`_geographies`	`Boundaries`	shorthand alias to DataInterface.geographies
`_geography_choose`	`dict`	shorthand alias to Boundaries.boundary_dict()
`_warming_level_times`	`DataFrame`	shorthand alias to DataInterface.warming_level_times
`colormap`	`str`	default colormap to render the currently selected data
`scenario_options`	`list of strs`	list of available scenarios (historical and ssp) for selection
`variable_options_df`	`DataFrame`	filtered variable descriptions for the downscaling_method and timescale
`warming_level`	`array`	global warming level(s)
`warming_level_window`	`integer`	years around Global Warming Level (+/-) (e.g. 15 means a 30yr window)
`approach`	`str, "Warming Level" or "Time"`	how the data are retrieved: by global warming level or by time period
`warming_level_months`	`array`	months of the year to use when computing warming levels; defaults to the entire calendar year (1,2,3,4,5,6,7,8,9,10,11,12)
`all_touched`	`boolean`	spatial subset option for within or touching selection
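`warming_level_window` counts years on each side of the year in which a simulation reaches a warming level, so a window of 15 spans 30 years. The sketch below shows that arithmetic. It assumes the window runs from `crossing_year - window` to `crossing_year + window - 1` inclusive. climakitae's exact boundary convention is not documented on this page, and the function names are hypothetical.

```python
from typing import Tuple


def warming_level_years(crossing_year: int, window: int) -> Tuple[int, int]:
    """Hypothetical: inclusive start/end years of a +/- window around a crossing year.

    Assumes `window` years before the crossing year and `window - 1` years
    after it, so the span is exactly 2 * window years.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of years")
    return crossing_year - window, crossing_year + window - 1


def span_years(bounds: Tuple[int, int]) -> int:
    """Number of calendar years covered by inclusive (start, end) bounds."""
    start, end = bounds
    return end - start + 1
```

For a simulation that crosses a warming level in 2040, a window of 15 gives (2025, 2054), which covers 30 calendar years.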

Source code in climakitae/core/data_interface.py

def __init__(self, **params):
    # Set default values
    super().__init__(**params)

    self.data_interface = DataInterface()

    # Data Catalog
    self._data_catalog = self.data_interface.data_catalog

    # Warming Levels Table
    self._warming_level_times = self.data_interface.warming_level_times

    # variable descriptions
    self._variable_descriptions = self.data_interface.variable_descriptions

    # station data
    self._stations_gdf = self.data_interface.stations_gdf

    # Get geography boundaries and selection options
    self._geographies = self.data_interface.geographies
    self._geography_choose = self._geographies.boundary_dict()

    # Set location params
    self.area_subset = "none"
    self.param["area_subset"].objects = list(self._geography_choose.keys())
    self.param["cached_area"].objects = list(
        self._geography_choose[self.area_subset].keys()
    )

    self.all_touched = False

    # Set data params
    (
        self.scenario_options,
        self.simulation,
        unique_variable_ids,
    ) = _get_user_options(
        data_catalog=self._data_catalog,
        downscaling_method=self.downscaling_method,
        timescale=self.timescale,
        resolution=self.resolution,
    )
    self.variable_options_df = _get_variable_options_df(
        variable_descriptions=self._variable_descriptions,
        unique_variable_ids=unique_variable_ids,
        downscaling_method=self.downscaling_method,
        timescale=self.timescale,
        enable_hidden_vars=self.enable_hidden_vars,
    )

    # Show derived index option?
    indices = True
    if self.data_type == "Stations":
        indices = False
    if self.downscaling_method != "Dynamical":
        indices = False
    if self.timescale == "monthly":
        indices = False
    if not indices:
        self.param["variable_type"].objects = ["Variable"]
        self.variable_type = "Variable"
    else:
        self.param["variable_type"].objects = ["Variable", "Derived Index"]

    # Set scenario param
    scenario_ssp_options = [
        scenario_to_experiment_id(scen, reverse=True)
        for scen in self.scenario_options
        if "ssp" in scen
    ]
    for scenario_i in SSPS:
        if scenario_i in scenario_ssp_options:  # Reorder list
            scenario_ssp_options.remove(scenario_i)  # Remove item
            scenario_ssp_options.append(scenario_i)  # Add to back of list
    self.param["scenario_ssp"].objects = scenario_ssp_options
    self.scenario_ssp = []

    # Set variable param
    self.param["variable"].objects = (
        self.variable_options_df.display_name.values.tolist()
    )
    self.variable = self.default_variable

    # Set colormap, units, & extended description
    var_info = self.variable_options_df[
        self.variable_options_df["display_name"] == self.variable
    ]

    # Set params that are not selected by the user
    self.colormap = var_info.colormap.item()
    self.units = var_info.unit.item()
    self.extended_description = var_info.extended_description.item()
    self.variable_id = _get_var_ids(
        self._variable_descriptions,
        self.variable,
        self.downscaling_method,
        self.timescale,
        self.enable_hidden_vars,
    )
    self._data_warning = ""
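Two rules in `__init__` can be written as pure functions: when the "Derived Index" option is offered, and how SSP options are reordered. The sketch below restates both. The function names are hypothetical. The contents of the `SSPS` constant are not shown on this page, so the caller supplies the preferred order.

```python
from typing import List


def variable_type_options(data_type: str, downscaling_method: str, timescale: str) -> List[str]:
    """Return the variable_type choices offered for a selection.

    Mirrors the "Show derived index option?" block: derived indices are only
    offered for gridded, WRF-only ("Dynamical"), non-monthly data.
    """
    indices = (
        data_type != "Stations"
        and downscaling_method == "Dynamical"
        and timescale != "monthly"
    )
    return ["Variable", "Derived Index"] if indices else ["Variable"]


def reorder_ssps(options: List[str], preferred_order: List[str]) -> List[str]:
    """Move options found in preferred_order to the back, in that order.

    Same effect as the remove/append loop over SSPS in __init__.
    """
    result = list(options)
    for item in preferred_order:
        if item in result:
            result.remove(item)
            result.append(item)
    return result
```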

`retrieve(config=None, merge=True)`

Retrieve data from catalog

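By default, the current DataParameters selections determine which data are retrieved. The data are read from the AWS S3 bucket and returned as lazily loaded, dask-backed xarray objects. This user-facing method wraps read_catalog_from_select. It also prints warnings when the result is large or may contain NaN-filled simulation combinations.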

Returns:

Name	Type	Description
`data_return`	`DataArray | Dataset | List[DataArray]`	DataArray, Dataset, or list of DataArray objects

Source code in climakitae/core/data_interface.py

def retrieve(
    self, config: str = None, merge: bool = True
) -> Union[xr.DataArray, xr.Dataset, List[xr.DataArray]]:
    """Retrieve data from catalog

    By default, DataParameters determines the data retrieved.
    Grabs the data from the AWS S3 bucket, returns lazily loaded dask array.
    User-facing function that provides a wrapper for read_catalog_from_select.

    Returns
    -------
    data_return : xr.DataArray | xr.Dataset | List[xr.DataArray]
        DataArray or Dataset object

    """

    def _warn_of_large_file_size(da: xr.DataArray):
        """Warn user if the data array is large"""
        nbytes = da.nbytes
        match nbytes:
            case nbytes if nbytes >= int(1e9) and nbytes < int(5e9):
                print(
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                    "! Returned data array is large. Operations could take up to 5x longer than 1GB of data!\n"
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                )
            case nbytes if nbytes >= int(5e9) and nbytes < int(1e10):
                print(
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                    "!! Returned data array is very large. Operations could take up to 8x longer than 1GB of data !!\n"
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                )
            case nbytes if nbytes >= int(1e10):
                print(
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                    "!!! Returned data array is huge. Operations could take 10x to infinity longer than 1GB of data !!!\n"
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                )

    def _warn_of_empty_data(self):
        if self.approach == "Warming Level" and (len(self.warming_level) > 1):
            print(
                "WARNING FOR WARMING LEVELS APPROACH\n-----------------------------------\nThere may be NaNs in your data for certain simulation/warming level combinations if the warming level is not reached for that particular simulation before the year 2100. \n\nThis does not mean you have missing data, but rather a feature of how the data is combined in retrieval to return a single data object. \n\nIf you want to remove these empty simulations, it is recommended to first subset the data object by each individual warming level and then dropping NaN values."
            )
        elif (self.approach == "Time") and (len(self.scenario_ssp) > 1):
            print(
                "WARNING\n-------\nYou have retrieved data for more than one SSP, but not all ensemble members for each GCM are available for all SSPs.\n\nAs a result, some scenario and simulation combinations may contain NaN values.\n\nIf you want to remove these empty simulations, it is recommended to first subset the data object by each individual scenario and then dropping NaN values."
            )

    data_return = read_catalog_from_select(self)

    if isinstance(data_return, list):
        for l in data_return:
            _warn_of_large_file_size(l)
    else:
        _warn_of_large_file_size(data_return)

    # Warn about empty simulations for certain selections
    _warn_of_empty_data(self)

    return data_return
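The size warnings in `retrieve()` are based on `DataArray.nbytes`. For dask-backed arrays, xarray derives this value from shape and dtype, so checking it should not trigger a computation. The thresholds use decimal gigabytes. This sketch restates the three bands as a pure function. The function name and the returned labels are hypothetical.

```python
from typing import Optional


def size_warning_level(nbytes: int) -> Optional[str]:
    """Classify an array size using the thresholds in retrieve().

    Below 1 GB no warning is printed; [1 GB, 5 GB) is "large",
    [5 GB, 10 GB) is "very large", and 10 GB or more is "huge".
    """
    if nbytes >= int(1e10):
        return "huge"
    if nbytes >= int(5e9):
        return "very large"
    if nbytes >= int(1e9):
        return "large"
    return None
```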

get_data_options

Get data options, in the same format as the Select GUI, for a given set of inputs. This lets the user query the data using the GUI's language, bypassing the sometimes unintuitive naming in the catalog. If no inputs are provided, the function returns the entire AE catalog that is available via the Select GUI. variable, downscaling_method, resolution, and timescale must each be a single string; scenario may be a string or a list.

Parameters:

Name	Type	Description	Default
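`variable`	`str`	Variable display name to filter on. Defaults to None (no filter)	`None`
`downscaling_method`	`str`	Downscaling method to filter on. Defaults to None (no filter)	`None`
`resolution`	`str`	Resolution to filter on, e.g. "9 km". Defaults to None (no filter)	`None`
`timescale`	`str`	Timescale to filter on, e.g. "hourly". Defaults to None (no filter)	`None`
`scenario`	`str or list`	One or more scenarios to filter on. Defaults to None (no filter)	`None`
`tidy`	`boolean`	If True, index the returned DataFrame by a MultiIndex of downscaling_method, scenario, and timescale, which makes the options easier to read. Defaults to True	`True`
`enable_hidden_vars`	`boolean`	If True, also return variables whose "show" flag is False. Defaults to False	`False`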

Returns:

Name	Type	Description
`cat_subset`	`DataFrame`	Catalog options matching the inputs, or None (after printing a message) if an input is not a single string or no catalog entries match

Source code in climakitae/core/data_interface.py

def get_data_options(
    variable: str = None,
    downscaling_method: str = None,
    resolution: str = None,
    timescale: str = None,
    scenario: Union[str, list[str]] = None,
    tidy: bool = True,
    enable_hidden_vars: bool = False,
) -> pd.DataFrame:
    """Get data options, in the same format as the Select GUI, given a set of possible inputs.
    Allows the user to access the data using the same language as the GUI, bypassing the sometimes unintuitive naming in the catalog.
    If no function inputs are provided, the function returns the entire AE catalog that is available via the Select GUI

    Parameters
    ----------
    variable : str, optional
        Default to None
    downscaling_method : str, optional
        Default to None
    resolution : str, optional
        Default to None
    timescale : str, optional
        Default to None
    scenario : str or list, optional
        Default to None
    tidy : boolean, optional
        Format the pandas dataframe? This creates a DataFrame with a MultiIndex that makes it easier to parse the options.
        Default to True
    enable_hidden_vars : boolean, optional
        Return all variables, including the ones in which "show" is set to False?
        Default to False

    Returns
    -------
    cat_subset : pd.DataFrame
        Catalog options for user-provided inputs

    """
    # Get intake catalog and variable descriptions from DataInterface object
    data_interface = DataInterface()
    var_df = data_interface.variable_descriptions
    catalog = data_interface.data_catalog
    cat_df = _get_user_friendly_catalog(
        intake_catalog=catalog,
        variable_descriptions=var_df,
        enable_hidden_vars=enable_hidden_vars,
    )

    # Raise error for bad input from user
    for user_input in [variable, downscaling_method, resolution, timescale]:
        if (user_input is not None) and (type(user_input) != str):
            print(
                _format_error_print_message(
                    "Function arguments require a single string value for your inputs"
                )
            )
            return None

    def _list(x: Union[str, list]) -> list:
        """Convert x to a list if its not a list"""
        return x if isinstance(x, list) else [x]

    d = {
        "variable": _list(variable),
        "timescale": _list(timescale),
        "downscaling_method": _list(downscaling_method),
        "scenario": _list(scenario),
        "resolution": _list(resolution),
    }

    d = _check_if_good_input(d, cat_df)

    # Subset the catalog with the user's inputs
    cat_subset = cat_df[
        (cat_df["variable"].isin(d["variable"]))
        & (cat_df["downscaling_method"].isin(d["downscaling_method"]))
        & (cat_df["resolution"].isin(d["resolution"]))
        & (cat_df["timescale"].isin(d["timescale"]))
        & (cat_df["scenario"].isin(d["scenario"]))
    ].reset_index(drop=True)
    if len(cat_subset) == 0:
        print(
            _format_error_print_message(
                "No data found for your input values. Please modify your data request."
            )
        )
        return None

    if tidy:
        cat_subset = cat_subset.set_index(
            ["downscaling_method", "scenario", "timescale"]
        )
    return cat_subset
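The subsetting step above can be restated in stdlib Python to show how None acts as "no filter" and how a single string is wrapped like the `_list` helper. In this sketch, fields left as None are treated as unconstrained. The real function delegates that step to `_check_if_good_input`, which is not shown here, and the sketch also omits the single-string type check. The function name and catalog rows are hypothetical.

```python
from typing import Dict, List, Optional, Union

FIELDS = ["variable", "downscaling_method", "resolution", "timescale", "scenario"]


def filter_options(
    catalog: List[Dict[str, str]],
    **inputs: Optional[Union[str, List[str]]],
) -> Optional[List[Dict[str, str]]]:
    """Hypothetical stdlib analogue of the subsetting step in get_data_options.

    None means "no filter" for that field; a string is wrapped in a list;
    an empty result prints a message and returns None.
    """
    wanted = {}
    for field in FIELDS:
        value = inputs.get(field)
        if value is not None:
            wanted[field] = value if isinstance(value, list) else [value]
    subset = [
        row for row in catalog
        if all(row[field] in allowed for field, allowed in wanted.items())
    ]
    if not subset:
        print("No data found for your input values. Please modify your data request.")
        return None
    return subset
```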

get_subsetting_options

Get all geometry options for spatial subsetting. The options match those in the selections GUI. An invalid area_subset prints the valid choices and raises ValueError.

Parameters:

Name	Type	Description	Default
`area_subset`	`str`	One of "all", "states", "CA counties", "CA Electricity Demand Forecast Zones", "CA watersheds", "CA Electric Balancing Authority Areas", "CA Electric Load Serving Entities (IOU & POU)", or "Stations". Defaults to "all", which returns every geometry option indexed by a MultiIndex of area_subset and cached_area	`'all'`

Returns:

Name	Type	Description
`geom_df`	`DataFrame`	Geometry options. If area_subset is anything other than "all", only that category's options are returned, indexed by cached_area; for example, area_subset = "states" returns only the state options

Source code in climakitae/core/data_interface.py

def get_subsetting_options(area_subset: str = "all") -> pd.DataFrame:
    """Get all geometry options for spatial subsetting.
    Options match those in selections GUI

    Parameters
    ----------
    area_subset : str
        One of "all", "states", "CA counties", "CA Electricity Demand Forecast Zones", "CA watersheds", "CA Electric Balancing Authority Areas", "CA Electric Load Serving Entities (IOU & POU)", "Stations"
        Defaults to "all", which shows all the geometry options with area_subset as a multiindex

    Returns
    -------
    geom_df : pd.DataFrame
        Geometry options
        Shows only options for one area_subset if input is provided that is not "all"
        i.e. if area_subset = "states", only the options for states will be returned

    """
    # Get geographies from DataInterface object
    data_interface = DataInterface()
    geographies = data_interface._geographies
    boundary_dict = geographies.boundary_dict()

    # Get geometries and labels from Boundaries object
    df_dict = {
        "states": geographies._us_states[["abbrevs", "geometry"]].rename(
            columns={"abbrevs": "NAME"}
        ),
        "CA counties": geographies._ca_counties[["NAME", "geometry"]],
        "CA Electricity Demand Forecast Zones": geographies._ca_forecast_zones.rename(
            columns={"FZ_Name": "NAME"}
        )[["NAME", "geometry"]],
        "CA watersheds": geographies._ca_watersheds.rename(columns={"Name": "NAME"})[
            ["NAME", "geometry"]
        ],
        "CA Electric Balancing Authority Areas": geographies._ca_electric_balancing_areas[
            ["NAME", "geometry"]
        ],
        "CA Electric Load Serving Entities (IOU & POU)": geographies._ca_utilities.rename(
            columns={"Utility": "NAME"}
        )[
            ["NAME", "geometry"]
        ],
        "Stations": data_interface._stations_gdf.sort_values("station").rename(
            columns={"station": "NAME"}
        )[["NAME", "geometry"]],
    }

    # Confirm that input for argument "area_subset" is valid
    # Raise error and print helpful statements if bad input
    valid_inputs = list(df_dict.keys()) + ["all"]
    if area_subset not in valid_inputs:
        print(
            "'"
            + str(area_subset)
            + "' is not a valid option for function argument 'area_subset'.\nChoose one of the following: "
            + ", ".join(valid_inputs)
        )
        print("Default argument 'all' will show all valid geometry options.")
        raise ValueError("Bad input for argument 'area_subset'")

    # Some of the geometry options are limited further by the selections.show() GUI
    # i.e. not all US states are an option in the GUI, even though the parquet file provided by geographies._us_states contains all US states
    # Here, we limit the output to return the same options as the GUI
    for name, df in df_dict.items():
        df["area_subset"] = [name] * len(
            df
        )  # Add area subset as a column. Used to create multiindex if area_subset = "all"
        if name == "Stations":  # This logic doesn't apply to weather stations
            pass  # do nothing
        else:  # Limit options
            df = df[df["NAME"].isin(list(boundary_dict[name].keys()))]
        df_dict[name] = df  # Replace the dictionary with the new, reduced dictionary

    if area_subset != "all":
        # Only return the desired area subset
        geoms_df = (
            df_dict[area_subset]
            .drop(columns="area_subset")
            .rename(columns={"NAME": "cached_area"})
            .set_index("cached_area")
        )
    else:
        geoms_df = pd.concat(list(df_dict.values())).rename(
            columns={"NAME": "cached_area"}
        )
        geoms_df = geoms_df.set_index(
            ["area_subset", "cached_area"]
        )  # Create multiindex

    return geoms_df
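Without geometries, the result of `get_subsetting_options` is essentially a set of (area_subset, cached_area) pairs plus input validation. This stdlib sketch reproduces those two parts. For simplicity, it returns pairs even for a single category, whereas the real function indexes a single category by cached_area alone. The function name and boundary names are hypothetical.

```python
from typing import Dict, List, Tuple


def subsetting_keys(
    options: Dict[str, List[str]], area_subset: str = "all"
) -> List[Tuple[str, str]]:
    """Hypothetical analogue of get_subsetting_options without geometries.

    Returns (area_subset, cached_area) pairs, the two levels of the
    MultiIndex used when area_subset == "all".
    """
    valid_inputs = list(options) + ["all"]
    if area_subset not in valid_inputs:
        raise ValueError(
            f"'{area_subset}' is not a valid option for 'area_subset'. "
            f"Choose one of: {', '.join(valid_inputs)}"
        )
    chosen = options if area_subset == "all" else {area_subset: options[area_subset]}
    return [(category, name) for category, areas in chosen.items() for name in areas]
```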

get_data

Retrieve formatted data from the Analytics Engine data catalog.

Contrasts with DataParameters().retrieve(), which retrieves data from the user inputs in climakitaegui's selections GUI.

Parameters:

Name	Type	Description	Default
`variable`	`str`	String name of climate variable	required
`resolution`	`str, one of ["3 km", "9 km", "45 km"]`	Resolution of data in kilometers	required
`timescale`	`str, one of ["hourly", "daily", "monthly"]`	Temporal frequency of dataset	required
`downscaling_method`	`str, one of ["Dynamical", "Statistical", "Dynamical+Statistical"]`	Downscaling method of the data: WRF ("Dynamical"), LOCA2 ("Statistical"), or both ("Dynamical+Statistical"). Defaults to "Dynamical"	`'Dynamical'`
`data_type`	`str, one of ["Gridded", "Stations"]`	Whether to retrieve gridded data or weather station data. Defaults to "Gridded"	`'Gridded'`
`approach`	`one of ["Time", "Warming Level"]`	Whether to retrieve data by time period or by global warming level. Defaults to "Time"	`'Time'`
`scenario`	`str or list of str`	SSP scenario(s) ("SSP 3-7.0", "SSP 2-4.5", "SSP 5-8.5") and/or historical data ("Historical Climate", "Historical Reconstruction"). Required when approach = "Time"; ignored when approach = "Warming Level"	`None`
`units`	`str`	Variable units. Defaults to the native units of the data	`None`
`area_subset`	`str`	Area category, e.g. "CA counties". Defaults to the entire domain ("none")	`'none'`
`cached_area`	`list`	Area(s) within area_subset, e.g. "Alameda county". Defaults to the entire domain (["entire domain"])	`None`
`area_average`	`one of ["Yes","No"]`	Whether to average over the spatial domain. Defaults to "No"	`None`
`latitude`	`None or tuple of float`	Tuple of valid latitude bounds. Defaults to the entire domain	`None`
`longitude`	`None or tuple of float`	Tuple of valid longitude bounds. Defaults to the entire domain	`None`
`time_slice`	`tuple`	Time range for retrieved data. Only valid for approach = "Time"	`None`
`stations`	`list of str`	Weather stations to retrieve data for. Only valid for data_type = "Stations". Defaults to all stations	`None`
`warming_level`	`list of float`	Must be one of the warming levels available in `climakitae.core.constants`. Only valid for approach = "Warming Level" and data_type = "Stations"	`None`
`warming_level_window`	`int in range(5, 25)`	Years around the Global Warming Level (+/-); e.g. 15 means a 30-year window	`None`
`warming_level_months`	`list of int`	Months of the year for which to perform the warming level computation. Defaults to all months: [1,2,3,4,5,6,7,8,9,10,11,12]. For example, set warming_level_months=[12,1,2] to analyze the winter season. Only valid for approach = "Warming Level" and data_type = "Stations"	`None`
`all_touched`	`boolean`	Spatial subsetting option: if True, include grid cells that touch the selected area; if False, include only cells within it	`False`
`enable_hidden_vars`	`boolean`	If True, also allow variables whose "show" flag is False. Defaults to False	`False`
`kwargs`	`dict`	Additional keyword arguments to pass to DataParameters()	`{}`
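Because `get_data` prints a message instead of raising, it can help to check arguments against the table above before calling it. This is a partial, hypothetical pre-flight check built only from the allowed values listed in the table. It reads `int in range(5, 25)` literally as Python's `range`, so 25 is rejected. Whether the library itself accepts 25 is not stated here.

```python
# Allowed values as listed in the get_data parameter table.
ALLOWED = {
    "resolution": ["3 km", "9 km", "45 km"],
    "timescale": ["hourly", "daily", "monthly"],
    "downscaling_method": ["Dynamical", "Statistical", "Dynamical+Statistical"],
    "data_type": ["Gridded", "Stations"],
    "approach": ["Time", "Warming Level"],
}


def check_get_data_args(**kwargs) -> list:
    """Hypothetical pre-flight check for get_data arguments.

    Returns a list of human-readable problems; an empty list means the
    arguments passed these (partial) checks.
    """
    problems = []
    for name, allowed in ALLOWED.items():
        value = kwargs.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}={value!r} not in {allowed}")
    # Taken literally as Python's range(5, 25): 5..24 inclusive.
    window = kwargs.get("warming_level_window")
    if window is not None and window not in range(5, 25):
        problems.append(f"warming_level_window={window!r} not in range(5, 25)")
    # approach defaults to "Time", which requires a scenario.
    if kwargs.get("approach", "Time") == "Time" and not kwargs.get("scenario"):
        problems.append("scenario is required when approach='Time'")
    return problems
```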

Returns:

Type	Description
`DataArray`	The requested climate data, or None if an error occurred.

Notes

The function does not raise errors. Instead, it prints an informative message and returns None. This is because the AE Jupyter Hub raises an obscure "Pieces Mismatch" error for some bad inputs; that error is suppressed and replaced with a more informative message.
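This convention means callers should test the result for None rather than wrap calls in try/except. The decorator below is one hypothetical way to produce the same behavior; it is not climakitae's implementation, which this page does not show in full. It assumes that bad input surfaces as `ValueError`, and `fake_get_data` is a made-up stand-in.

```python
import functools
from typing import Any, Callable, Optional


def print_errors_return_none(func: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Hypothetical decorator: print an informative message and return None
    instead of raising."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ValueError as err:  # assumption: bad input raises ValueError
            print(f"ERROR: {err}")
            return None

    return wrapper


@print_errors_return_none
def fake_get_data(resolution: str) -> str:
    """Made-up stand-in for a retrieval function."""
    if resolution not in ("3 km", "9 km", "45 km"):
        raise ValueError(f"Invalid resolution {resolution!r}")
    return f"data at {resolution}"
```

A caller would then write `data = fake_get_data("12 km")` followed by `if data is None:` to handle the failure.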

Source code in climakitae/core/data_interface.py

def get_data(
    variable: str,
    resolution: str,
    timescale: str,
    downscaling_method: str = "Dynamical",
    data_type: str = "Gridded",
    approach: str = "Time",
    scenario: Union[str, list[str]] = None,
    units: str = None,
    warming_level: list[float] = None,
    area_subset: str = "none",
    latitude: tuple[float, float] = None,
    longitude: tuple[float, float] = None,
    cached_area: list[str] = None,
    area_average: str = None,
    time_slice: tuple = None,
    stations: list[str] = None,
    warming_level_window: int = None,
    warming_level_months: list[int] = None,
    all_touched=False,
    enable_hidden_vars: bool = False,
    **kwargs,
) -> xr.DataArray:
    """Retrieve formatted data from the Analytics Engine data catalog.

    Contrasts with DataParameters().retrieve(), which retrieves data from
    the user inputs in climakitaegui's selections GUI.

    Parameters
    ----------
    variable : str
        String name of climate variable
    resolution : str, one of ["3 km", "9 km", "45 km"]
        Resolution of data in kilometers
    timescale : str, one of ["hourly", "daily", "monthly"]
        Temporal frequency of dataset
    downscaling_method : str, one of ["Dynamical", "Statistical", "Dynamical+Statistical"], optional
        Downscaling method of the data:
        WRF ("Dynamical"), LOCA2 ("Statistical"), or both ("Dynamical+Statistical")
        Defaults to "Dynamical"
    data_type : str, one of ["Gridded", "Stations"], optional
        Whether to retrieve gridded data or weather station data
        Defaults to "Gridded"
    approach : str, one of ["Time", "Warming Level"], optional
        Defaults to "Time"
    scenario : str or list of str, optional
        SSP scenario ["SSP 3-7.0", "SSP 2-4.5", "SSP 5-8.5"] and/or historical data selection ["Historical Climate", "Historical Reconstruction"]
        If approach = "Time", you need to set a valid option
        If approach = "Warming Level", scenario is ignored
    units : str, optional
        Variable units
        Defaults to the native units of the data
    area_subset : str, optional
        Area category, e.g. "CA counties"
        Defaults to the entire domain ("none")
    cached_area : list, optional
        Area, e.g. "Alameda county"
        Defaults to the entire domain (["entire domain"])
    area_average : str, one of ["Yes", "No"], optional
        Whether to average over the spatial domain
        Defaults to "No"
    latitude : None or tuple of float, optional
        Tuple of valid latitude bounds
        Defaults to the entire domain
    longitude : None or tuple of float, optional
        Tuple of valid longitude bounds
        Defaults to the entire domain
    time_slice : tuple, optional
        Time range (start year, end year) for retrieved data
        Only valid for approach = "Time"
    stations : list of str, optional
        Which weather stations to retrieve data for
        Only valid for data_type = "Stations"
        Defaults to all stations
    warming_level : list of float, optional
        Must be one of the warming levels available in `climakitae.core.constants`
        Only valid for approach = "Warming Level" and data_type = "Stations"
    warming_level_window : int in range (5, 25), optional
        Years around the Global Warming Level (+/-) (e.g. 15 means a 30-year window)
    warming_level_months : list of int, optional
        Months of the year for which to perform the warming level computation
        Defaults to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]
        For example, set warming_level_months=[12,1,2] to perform the analysis for the winter season.
        Only valid for approach = "Warming Level" and data_type = "Stations"
    all_touched : bool, optional
        Spatial subset option: if True, include grid cells that touch the selected area; if False, include only cells within it
        Defaults to False
    enable_hidden_vars : bool, optional
        Whether to return all variables, including those whose "show" flag is False
        Defaults to False
    kwargs : dict
        Additional keyword arguments to pass to DataParameters()

    Returns
    -------
    xr.DataArray
        The requested climate data, or None if an error occurred.

    Notes
    -----
    Errors are not raised by the function. Instead, an informative message
    is printed and the function returns None. This is because the AE Jupyter
    Hub raises a confusing "Pieces Mismatch" error for some bad inputs; that
    error is suppressed and a more informative message is printed in its
    place.

    """

    def _check_valid_input_station(
        stations: list[str], station_options_all: list[str]
    ) -> list[str]:
        """Check that the user input a valid value for station.
        If the input is invalid, the function "guesses" a close-ish station using difflib;
        see the _get_closest_options function for more info.
        If the input is invalid and no guesses are found, the function prints an informative
        error message and raises a ValueError.

        Parameters
        ----------
        stations : list[str]
        station_options_all : list of string
            All the possible station options
            Can be retrieved from DataParameters()._stations_gdf.station.values

        Returns
        -------
        stations : list[str]

        """
        station_options_all = sorted(
            station_options_all
        )  # sorted() puts the list in alphabetical order

        # Track whether a warning has already been printed for an earlier station
        # If more than one station prints errors to the console, print a blank line between them
        printed_warning = False

        for i, station_i in enumerate(stations):  # Go through all the stations
            # If the station is a valid option, don't do anything
            if station_i in station_options_all:
                continue

            if printed_warning:
                print(
                    "\n", end=""
                )  # Add a blank line between stations for better readability

            # If the station isn't a valid option...
            print("Input station='" + station_i + "' is not a valid option.")
            closest_options = _get_closest_options(
                station_i, station_options_all
            )  # See if there are any similar options

            # No close matches found: print all valid options and raise an error
            match closest_options:
                case None:
                    print("Valid options: \n- ", end="")
                    print("\n- ".join(station_options_all))
                    raise ValueError("Bad input")

                # Just one option in the list
                case closest_options if len(closest_options) == 1:
                    print("Closest option: '" + closest_options[0] + "'")

                case closest_options if len(closest_options) > 1:
                    print("Closest options: \n- " + "\n- ".join(closest_options))

            print("Outputting data for station='" + closest_options[0] + "'")
            stations[i] = closest_options[
                0
            ]  # Replace that value in the list with the best option :)

            printed_warning = True

        return stations

    # Internal functions
    def _error_handling_warming_level_inputs(
        wl: Union[list[float], list[int]],
        argument_name: str,
        downscaling_method: str,
        resolution: str,
    ):
        """Error handling for arguments: warming_level and warming_level_months
        Both require a float/int or a list of floats/ints
        argument_name is either "warming_level" or "warming_level_months" and is used to
        print an appropriate error message for bad input

        """
        # Find the WL bounds for LOCA and WRF
        loca, wrf = create_ae_warming_trajectories(resolution)
        loca_max = round(loca.max().max(), 2)
        wrf_max = round(wrf.max().max(), 2)

        match downscaling_method:
            case "Statistical":
                max_val = loca_max
            case "Dynamical":
                max_val = wrf_max
            case "Dynamical+Statistical":
                max_val = min(loca_max, wrf_max)
            case _:
                raise ValueError(
                    "Downscaling method must be 'Statistical', 'Dynamical', or 'Dynamical+Statistical'"
                )

        if (wl is not None) and not isinstance(wl, list):
            if isinstance(wl, (float, int)):  # Convert float to a singleton list
                wl = [wl]
            if not isinstance(wl, list):
                raise ValueError(
                    f"""Function argument {argument_name} requires a float/int or list 
                    of floats/ints input. Your input: {type(wl)}"""
                )
        if isinstance(wl, list):
            for x in wl:
                if not isinstance(x, (float, int)):
                    raise ValueError(
                        f"Each item in '{argument_name}' must be a float or int. Got: {type(x)}"
                    )
                if argument_name == "warming_level":
                    if x < 0 or x > max_val:
                        raise ValueError(
                            f"Invalid {argument_name} value: {x}. "
                            f"Allowed range for {downscaling_method}-downscaled data at {resolution} resolution is 0 to {max_val:.2f}."
                        )
        return wl

    def _error_handling_approach_inputs(
        approach: str, scenario: str, warming_level: list[float], time_slice: tuple
    ) -> tuple[str, str, list[float], tuple]:
        """Error handling for approach and scenario inputs"""
        _valid_options_approach = ["Time", "Warming Level"]
        if approach not in _valid_options_approach:
            # Maybe the user just capitalized it wrong
            # If so, fix it for them-- don't raise an error
            if approach.lower().title() in _valid_options_approach:
                approach = approach.lower().title()
            else:
                # An error will be raised later when you try to set selections
                pass

        # Print a warning if scenario is set but approach is Warming Level
        if approach == "Warming Level" and scenario not in [None, ["n/a"], "n/a"]:
            print(
                'WARNING: "scenario" argument will be ignored for warming levels approach'
            )
            scenario = None
        if approach == "Warming Level" and time_slice is not None:
            print(
                'WARNING: "time_slice" argument will be ignored for warming levels approach'
            )
            time_slice = None

        if approach == "Time":
            warming_level = ["n/a"]

        return approach, scenario, warming_level, time_slice

    def _error_handling_location_settings(
        area_subset: list[str], cached_area: list[str]
    ) -> list[str]:
        """Maybe the user put an input for cached area but not for area subset
        We need to have the matching/correct area subset in order for selections.retrieve() to actually subset the data
        Here, we load in the geometry options to set area_subset to the correct value
        This also raises an appropriate error if the user has a bad input

        """
        if area_subset == "none" and cached_area != ["entire domain"]:
            geom_df = get_subsetting_options(area_subset="all").reset_index()
            area_subset_vals = geom_df[geom_df["cached_area"] == cached_area[0]][
                "area_subset"
            ].values
            if len(area_subset_vals) == 0:
                raise ValueError("Invalid input for argument 'cached_area'")
            else:
                area_subset = area_subset_vals[0]
        return area_subset

    def _get_scenario_ssp_scenario_historical(
        approach: str, scenario: str
    ) -> tuple[str, str]:
        """Get scenario_ssp, scenario_historical depending on user inputs"""
        match approach:
            case "Warming Level":
                scenario_ssp = ["n/a"]
                scenario_historical = ["n/a"]
            case "Time":
                if (
                    "Historical Reconstruction" in scenario
                ):  # Handling for Historical Reconstruction option
                    scenario_historical = [x for x in scenario if "Historical" in x]
                    scenario_ssp = []
                    if (
                        len(scenario) != 1
                    ):  # No SSP options for Historical Reconstruction data
                        print(
                            "WARNING: Historical Reconstruction data cannot be retrieved in the same data object as SSP scenario options. SSP data will not be retrieved."
                        )
                else:
                    scenario_ssp = [
                        x for x in scenario if "Historical" not in x
                    ]  # Add non-historical SSPs to scenario_ssp key
                    if "Historical Climate" in scenario:
                        scenario_historical = ["Historical Climate"]
                    else:
                        scenario_historical = []
            case _:
                scenario_ssp, scenario_historical = None, None
        return scenario_ssp, scenario_historical

    # Mutable defaults such as lists are shared between calls, so the signature
    # uses None and the intended default value is set here
    if cached_area is None:
        cached_area = ["entire domain"]
    # Get intake catalog and variable descriptions from DataInterface object
    data_interface = DataInterface()
    var_df = data_interface.variable_descriptions.rename(
        columns={"variable": "display_name"}
    )  # Rename column so that it can be merged with cat_df

    # Filter variable descriptions based on enable_hidden_vars
    if not enable_hidden_vars:
        var_df = var_df[var_df["show"] == True]

    ## --------- ERROR HANDLING ----------
    # Deal with bad or missing user inputs

    # Station data error handling
    if data_type == "Stations":
        # dictionary with { argument name : [valid option, user input]}
        d = {
            "downscaling_method": ["Dynamical", downscaling_method],
            "timescale": ["hourly", timescale],
            "variable": ["Air Temperature at 2m", variable],
        }
        # Go through the users inputs
        # See if they match the required value for that argument
        # If not, print a warning to the user.
        for key, vals in d.items():
            if vals[0] != vals[1]:
                print(
                    "Weather station data can only be retrieved for {0}={1} \nYour input: {2} \nRetrieving data for {0}={1}".format(
                        key, vals[0], vals[1]
                    )
                )

        downscaling_method = "Dynamical"
        timescale = "hourly"
        variable = "Air Temperature at 2m"

        # Deal with scenario and time_slice arguments
        # Handle various use-cases of user inputs/errors
        if scenario is None:
            if time_slice is None:
                # Default
                scenario = ["Historical Climate"]
            else:
                scenario = []

        if resolution == "3 km":
            # Neither SSP 2-4.5 nor SSP 5-8.5 is a valid station scenario at 3 km: print an error and return None
            for bad_scenario_choice in ["SSP 2-4.5", "SSP 5-8.5"]:
                if bad_scenario_choice in scenario:
                    error_message = f"{bad_scenario_choice} is not a valid scenario input for resolution = {resolution}"
                    print(_format_error_print_message(error_message))
                    return None
        if time_slice is not None:
            # Make sure time_slice and scenario match each other
            # If time_slice is not assigned by the user, it will be auto-set by the DataInterface object
            if any(value < 2015 for value in time_slice) and (
                ("Historical Climate") not in scenario
            ):
                # Add Historical Climate to scenario if the time slice includes the historical period
                scenario.append("Historical Climate")
            if any(value >= 2015 for value in time_slice) and not any(
                "SSP" in item for item in scenario
            ):
                # If the time slice includes the future period and no SSP data is selected, add SSP 3-7.0
                scenario.append("SSP 3-7.0")

        if stations is None:
            # Print a warning if the user wants to retrieve station data but they don't input a value for station
            # The function will return all the stations by default
            print(
                "WARNING: You haven't set a particular station/s to retrieve data for; the function will default to retrieving all available stations in the domain"
            )
        if (stations is not None) and isinstance(stations, str):
            # Catch an easy user mistake without raising an error: inputting a string instead of a list of strings
            # I imagine this could happen if you just wanted to retrieve data for a single station
            stations = [stations]

    # If lat/lon input, change cached_area and area_subset
    if (latitude is not None) and (longitude is not None):
        area_subset = "lat/lon"
        cached_area = ["coordinate selection"]

    # Check warming level inputs
    try:
        warming_level = _error_handling_warming_level_inputs(
            warming_level, "warming_level", downscaling_method, resolution
        )
        warming_level_months = _error_handling_warming_level_inputs(
            warming_level_months, "warming_level_months", downscaling_method, resolution
        )
    except ValueError as error_message:
        print(_format_error_print_message(error_message))
        return None

    # Make sure the inputs are a valid type (no floats, ints, dictionaries, etc)
    for user_input in [
        variable,
        downscaling_method,
        resolution,
        timescale,
        area_subset,
        area_average,
        approach,
        scenario,
    ]:
        if (user_input is not None) and (type(user_input) not in [str, list]):
            error_message = (
                "Function arguments require a single string value for your inputs"
            )
            print(_format_error_print_message(error_message))
            return None

    # Maybe area average was capitalized wrong
    # Fix it instead of raising an error
    if area_average is not None:
        if area_average.lower().title() in ["Yes", "No"]:
            area_average = area_average.lower().title()

    # Cached area should be a list even if it's just a single string value (i.e. [str])
    cached_area = cached_area if isinstance(cached_area, list) else [cached_area]

    # If all_touched is None set to False
    if all_touched is None:
        all_touched = False

    # Check if all_touched boolean
    if all_touched not in [True, False]:
        raise ValueError("all_touched must be a boolean")

    # Make sure approach matches the scenario setting
    # See function documentation for more details
    approach, scenario, warming_level, time_slice = _error_handling_approach_inputs(
        approach, scenario, warming_level, time_slice
    )

    # Make sure the area subset is set to a valid input
    # See function documentation for more details
    try:
        area_subset = _error_handling_location_settings(area_subset, cached_area)
    except ValueError as error_message:
        print(_format_error_print_message(error_message))
        return None

    ## --------- ADD ARGUMENTS TO A DICTIONARY ----------
    # A dictionary is used for all the inputs in selections because it enables better error handling and cleaner code when we set selections.thing = thing
    # It also makes parsing through the arguments easier
    # The inputs here need to be a list so that they can be parsed easier by the _check_if_good_input function when comparing with the valid catalog options to confirm the user input is valid
    scenario_user_input = scenario  # What the user originally input for scenario

    check_input_df = get_data_options(
        variable=variable,
        downscaling_method=downscaling_method,
        resolution=resolution,
        timescale=timescale,
        scenario=scenario,
        tidy=False,
        enable_hidden_vars=enable_hidden_vars,
    )

    if check_input_df is None:
        # Does this print an informative error message? I think so but I'm not sure.
        return None

    # Merge with variable dataframe to get all the info about the data in one place
    check_input_df = check_input_df.merge(var_df, how="left")

    # Convert to a dictionary so it can be easily parsed by the function
    cat_dict = check_input_df.to_dict(orient="list")
    for key, values in cat_dict.items():
        # Remove non-unique values
        # This happens because we converted a pandas dataframe to a dictionary
        cat_dict[key] = list(np.unique(values))

    # _check_if_good_input will default fill the scenario options with EVERY possible option
    # It will in most cases give a list of all the available SSPs and the two historical data options (Historical Climate AND Historical Reconstruction)
    # I'd like the function to just default to Historical Climate + SSPs
    # So, if the user input None for scenario, I just remove Historical Reconstruction from the list
    if scenario_user_input is None:
        if "Historical Reconstruction" in cat_dict["scenario"]:
            cat_dict["scenario"] = [
                item
                for item in cat_dict["scenario"]
                if item != "Historical Reconstruction"
            ]

    # Check if it's an index
    # Use proper variable_id lookup that considers downscaling method and timescale
    variable_ids = _get_var_ids(
        data_interface.variable_descriptions,
        cat_dict["variable"][0],
        cat_dict["downscaling_method"][0],
        cat_dict["timescale"][0],
        enable_hidden_vars=enable_hidden_vars,
    )
    variable_id = variable_ids[0] if variable_ids else ""
    variable_type = "Derived Index" if "_index" in variable_id else "Variable"

    # Settings for selections
    selections_dict = {
        "variable": cat_dict["variable"][0],
        "timescale": cat_dict["timescale"][0],
        "downscaling_method": cat_dict["downscaling_method"][0],
        "resolution": cat_dict["resolution"][0],
        "data_type": data_type,
        "scenario": cat_dict["scenario"],
        "area_average": area_average,
        "area_subset": area_subset,
        "cached_area": cached_area,
        "approach": approach,
        "warming_level": warming_level,
        "warming_level_window": warming_level_window,
        "warming_level_months": warming_level_months,
        "variable_type": variable_type,
        "time_slice": time_slice,
        "latitude": latitude,
        "longitude": longitude,
        "stations": stations,
        "all_touched": all_touched,
    }

    scenario_ssp, scenario_historical = _get_scenario_ssp_scenario_historical(
        selections_dict["approach"], selections_dict["scenario"]
    )
    selections_dict["scenario_ssp"] = scenario_ssp
    selections_dict["scenario_historical"] = scenario_historical

    ## ----- SET THE UNITS ------

    # Query the table based on input values
    # Timescale in table needs to be handled differently
    # This is because the monthly variables are derived from daily variables, so they are listed in the table as "daily, monthly"
    # Hourly variables may be different
    # Querying the data needs special handling due to the layout of the csv file
    var_df_query = var_df[
        (var_df["display_name"] == selections_dict["variable"])
        & (var_df["downscaling_method"] == selections_dict["downscaling_method"])
    ]
    var_df_query = var_df_query[
        var_df_query["timescale"].str.contains(selections_dict["timescale"])
    ]

    selections_dict["units"] = (
        units if units is not None else var_df_query["unit"].item()
    )  # Set units if user doesn't set them manually

    ## ------ CREATE SELECTIONS OBJECT --------
    selections = DataParameters(enable_hidden_vars=enable_hidden_vars)

    # Error handling for stations
    # If the user input a value for the station argument, check that it exists
    # If it doesn't exist, see if you can find something close... if not, throw an error
    # Need to do the error handling here since it requires the selections object
    if data_type == "Stations" and stations is not None:
        stations = _check_valid_input_station(
            stations, selections._stations_gdf.station.values
        )

    ## ------- SET EACH ATTRIBUTE -------

    try:
        selections.data_type = selections_dict["data_type"]
        selections.approach = selections_dict["approach"]
        selections.scenario_ssp = selections_dict["scenario_ssp"]
        selections.scenario_historical = selections_dict["scenario_historical"]
        selections.area_subset = selections_dict["area_subset"]
        selections.cached_area = selections_dict["cached_area"]
        selections.downscaling_method = selections_dict["downscaling_method"]
        selections.resolution = selections_dict["resolution"]
        selections.timescale = selections_dict["timescale"]
        selections.variable_type = selections_dict["variable_type"]
        selections.variable = selections_dict["variable"]
        selections.units = selections_dict["units"]
        selections.all_touched = selections_dict["all_touched"]

        # Setting the values like this enables us to take advantage of the default settings in DataParameters without having to manually set defaults in this function
        if selections_dict["warming_level"] is not None:
            selections.warming_level = selections_dict["warming_level"]
        if selections_dict["warming_level_window"] is not None:
            selections.warming_level_window = selections_dict["warming_level_window"]
        if selections_dict["area_average"] is not None:
            selections.area_average = selections_dict["area_average"]
        if selections_dict["time_slice"] is not None:
            selections.time_slice = selections_dict["time_slice"]
        if selections_dict["warming_level_months"] is not None:
            selections.warming_level_months = selections_dict["warming_level_months"]
        if selections_dict["latitude"] is not None:
            selections.latitude = selections_dict["latitude"]
        if selections_dict["longitude"] is not None:
            selections.longitude = selections_dict["longitude"]
        if selections_dict["stations"] is not None:
            selections.stations = selections_dict["stations"]

        for key in kwargs:
            if getattr(selections, key, None) is not None:
                setattr(selections, key, kwargs[key])

        # Force update variable_id after all attributes are set
        # This ensures hidden variables work correctly
        selections.variable_id = _get_var_ids(
            data_interface.variable_descriptions,
            selections.variable,
            selections.downscaling_method,
            selections.timescale,
            enable_hidden_vars=enable_hidden_vars,
        )

    except ValueError as error_message:
        # The error message is really long
        # And sometimes has a confusing Attribute Error: Pieces mismatch that is hard to interpret
        # Here we just print the error message and return None instead of allowing the long error to be raised by default
        print(_format_error_print_message(error_message))
        return None

    # Retrieve data
    data = selections.retrieve()
    return data
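
The station-name correction in `_check_valid_input_station` relies on `_get_closest_options`, which its docstring says uses difflib. A minimal sketch of the same idea follows, assuming that `difflib.get_close_matches` with its default cutoff of 0.6 approximates the real helper (whose body is not shown on this page). `closest_options` and `check_stations` are hypothetical names, and the station list is illustrative only.

```python
import difflib
from typing import Optional


def closest_options(value: str, options: list[str], cutoff: float = 0.6) -> Optional[list[str]]:
    """Hypothetical stand-in for _get_closest_options: best matches first, or None."""
    matches = difflib.get_close_matches(value, options, n=3, cutoff=cutoff)
    return matches or None


def check_stations(stations: list[str], valid_options: list[str]) -> list[str]:
    """Swap each unknown station name for its closest valid match."""
    valid_options = sorted(valid_options)
    checked = []
    for name in stations:
        if name in valid_options:
            checked.append(name)
            continue
        matches = closest_options(name, valid_options)
        if matches is None:
            # As in get_data(): no guess is available, so the input is rejected.
            raise ValueError(f"Input station={name!r} is not a valid option")
        print(f"Input station={name!r} is not a valid option; using {matches[0]!r}")
        checked.append(matches[0])
    return checked


# Illustrative names only; real names come from DataParameters()._stations_gdf.station.
options = ["Fresno Airport", "Los Angeles Airport", "Sacramento Airport"]
print(check_stations(["Fresno Airprt", "Sacramento Airport"], options))
```

Unlike the original, which overwrites entries of the caller's `stations` list in place, the sketch builds and returns a new list.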

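`_error_handling_warming_level_inputs` accepts either a single number or a list, bounds each warming level by the maximum reached in the LOCA2 or WRF trajectories, and uses the smaller of the two maxima for "Dynamical+Statistical". The standalone sketch below separates those two steps. `max_warming_level` and `normalize_warming_levels` are hypothetical names, and the maxima in the usage line are placeholder values rather than output of `create_ae_warming_trajectories`. In the original, the range check applies only to `warming_level`; `warming_level_months` is only type-checked.

```python
from typing import Optional, Union

Number = Union[int, float]


def max_warming_level(downscaling_method: str, loca_max: float, wrf_max: float) -> float:
    """Upper bound for warming levels, following the match block in get_data()."""
    if downscaling_method == "Statistical":
        return loca_max
    if downscaling_method == "Dynamical":
        return wrf_max
    if downscaling_method == "Dynamical+Statistical":
        # Use the smaller maximum so the level is valid for both methods.
        return min(loca_max, wrf_max)
    raise ValueError(
        "Downscaling method must be 'Statistical', 'Dynamical', or 'Dynamical+Statistical'"
    )


def normalize_warming_levels(
    wl: Union[Number, list[Number], None], max_val: float
) -> Optional[list[Number]]:
    """Wrap a scalar in a list, then check each value is a number in [0, max_val]."""
    if wl is None:
        return None
    if isinstance(wl, (int, float)):
        wl = [wl]
    if not isinstance(wl, list):
        raise ValueError(f"warming_level requires a number or a list of numbers, got {type(wl)}")
    for x in wl:
        if not isinstance(x, (int, float)):
            raise ValueError(f"Each warming level must be a float or int, got {type(x)}")
        if x < 0 or x > max_val:
            raise ValueError(f"Invalid warming_level value: {x}. Allowed range is 0 to {max_val:.2f}.")
    return wl


# Placeholder maxima; get_data() derives them from create_ae_warming_trajectories(resolution).
limit = max_warming_level("Dynamical+Statistical", loca_max=4.5, wrf_max=4.0)
print(normalize_warming_levels(2.0, limit))
```
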
Notes on behavior

DataParameters.retrieve() is the closest analogue to the old GUI workflow.
get_data_options() and get_subsetting_options() are useful when you need to discover valid values programmatically.
The module does not raise on every bad input. In several cases it prints a diagnostic message and returns None to match the original notebook behavior.
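get_data() is a function-style wrapper around the same machinery: it normalizes and validates its arguments, builds a DataParameters object, and then calls retrieve(). It accepts the same legacy naming conventions as the GUI.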
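
The legacy `scenario` argument mixes SSP and historical options, while `DataParameters` stores them in separate `scenario_ssp` and `scenario_historical` fields. The split that `get_data()` performs through its internal `_get_scenario_ssp_scenario_historical` helper can be sketched as a standalone function. `split_scenarios` is a hypothetical name, and the warning the original prints when SSPs are dropped is omitted.

```python
from typing import Optional


def split_scenarios(
    approach: str, scenario: list[str]
) -> tuple[Optional[list[str]], Optional[list[str]]]:
    """Split a legacy scenario list into (scenario_ssp, scenario_historical)."""
    if approach == "Warming Level":
        # Scenarios are not used for warming-level queries.
        return ["n/a"], ["n/a"]
    if approach != "Time":
        return None, None
    if "Historical Reconstruction" in scenario:
        # Reconstruction data cannot share a data object with SSPs, so SSPs are dropped.
        return [], [s for s in scenario if "Historical" in s]
    ssp = [s for s in scenario if "Historical" not in s]
    historical = ["Historical Climate"] if "Historical Climate" in scenario else []
    return ssp, historical


print(split_scenarios("Time", ["Historical Climate", "SSP 3-7.0", "SSP 2-4.5"]))
# (['SSP 3-7.0', 'SSP 2-4.5'], ['Historical Climate'])
```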

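For station requests (data_type = "Stations"), `get_data()` also reconciles `scenario` with `time_slice`, using 2015 as the boundary between the historical and future periods. A standalone sketch of that logic follows. `reconcile_station_scenarios` is a hypothetical name, and unlike the original it copies the list instead of appending to the caller's list in place.

```python
from typing import Optional


def reconcile_station_scenarios(
    scenario: Optional[list[str]], time_slice: Optional[tuple[int, int]]
) -> list[str]:
    """Add the scenarios needed to cover a station request's time slice."""
    if scenario is None:
        # No scenario and no time slice: default to the historical period.
        scenario = ["Historical Climate"] if time_slice is None else []
    else:
        scenario = list(scenario)  # copy so the caller's list is untouched
    if time_slice is not None:
        if any(year < 2015 for year in time_slice) and "Historical Climate" not in scenario:
            scenario.append("Historical Climate")
        if any(year >= 2015 for year in time_slice) and not any("SSP" in s for s in scenario):
            scenario.append("SSP 3-7.0")
    return scenario


print(reconcile_station_scenarios(None, (2000, 2030)))
# ['Historical Climate', 'SSP 3-7.0']
```
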