Core Data Interface (Detailed)

The legacy data interface module providing a function-based API for climate data access.

Overview

climakitae.core.data_interface is the main entry point for the legacy interface. It provides:

- DataParameters class — a configuration object for data queries
- get_data() function — executes data queries with validation

Warning

This is the legacy interface. For new code, use climakitae.new_core.user_interface.ClimateData instead.

DataParameters Class

Bases: Parameterized

Python param object to hold data parameters for use in panel GUI.

Use DataParameters when you want to select and retrieve data from the climakitae data catalog without the ckg.Select GUI. ckg.Select itself uses this class to store selections and retrieve data.

DataParameters calls DataInterface, a singleton class that manages the connection to the intake-esm data store in an S3 bucket.
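Because DataInterface is a singleton, repeated instantiations share one catalog connection rather than reopening it. A minimal self-contained sketch of that pattern (illustrative only, not the climakitae implementation):

```python
class CatalogConnection:
    """Illustrative singleton: every instantiation returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            # Expensive setup (e.g. opening the intake-esm catalog) runs once
            cls._instance.opened = True
        return cls._instance


a = CatalogConnection()
b = CatalogConnection()
print(a is b)  # True: both names point at the single shared connection
```

This is why constructing multiple DataParameters objects is cheap: they all reuse the same underlying catalog handle.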

Attributes

unit_options_dict : dict
    Options dictionary for converting units to other units
area_subset : str
    Dataset to use from Boundaries for subarea selection
cached_area : list of strs
    One or more features from the area_subset datasets to use for selection
latitude : tuple
    Latitude range of selection box
longitude : tuple
    Longitude range of selection box
variable_type : str
    Toggle raw or derived variable selection
default_variable : str
    Initial variable to have selected in the widget
time_slice : tuple
    Year range to select
resolution : str
    Resolution of data to select ("3 km", "9 km", "45 km")
timescale : str
    Frequency of dataset ("hourly", "daily", "monthly")
scenario_historical : list of strs
    Historical scenario selections
area_average : str
    Whether to compute an area average ("Yes", "No")
downscaling_method : str
    Whether to choose WRF or LOCA2 data or both ("Dynamical", "Statistical", "Dynamical+Statistical")
data_type : str
    Whether to choose gridded or station-based data ("Gridded", "Stations")
stations : list of strs
    List of stations that can be filtered by cached_area
_station_data_info : str
    Informational statement shown when station data is selected with data_type
scenario_ssp : list of strs
    List of future climate scenarios selected (availability depends on other params)
simulation : list of strs
    List of simulations (models) selected (availability depends on other params)
variable : str
    Variable long display name
units : str
    Unit abbreviation of the data as currently set (native or converted)
enable_hidden_vars : boolean
    Enable selection of variables that are hidden from the GUI?
extended_description : str
    Extended description of the data variable
variable_id : list of strs
    List of variable ids that match the variable (WRF and LOCA2 can have different codes for the same type of variable)
historical_climate_range_wrf : tuple
    Time range of historical WRF data
historical_climate_range_loca : tuple
    Time range of historical LOCA2 data
historical_climate_range_wrf_and_loca : tuple
    Time range of historical WRF and LOCA2 data combined
historical_reconstruction_range : tuple
    Time range of historical reanalysis data
ssp_range : tuple
    Time range of future-scenario SSP data
_info_about_station_data : str
    Warning message about station data
_data_warning : str
    Warning about selecting an unavailable data combination
data_interface : DataInterface
    Data connection singleton class that provides the data
_data_catalog : intake_esm.source.ESMDataSource
    Shorthand alias to DataInterface.data_catalog
_variable_descriptions : pd.DataFrame
    Shorthand alias to DataInterface.variable_descriptions
_stations_gdf : gpd.GeoDataFrame
    Shorthand alias to DataInterface.stations_gdf
_geographies : Boundaries
    Shorthand alias to DataInterface.geographies
_geography_choose : dict
    Shorthand alias to Boundaries.boundary_dict()
_warming_level_times : pd.DataFrame
    Shorthand alias to DataInterface.warming_level_times
colormap : str
    Default colormap to render the currently selected data
scenario_options : list of strs
    List of available scenarios (historical and ssp) for selection
variable_options_df : pd.DataFrame
    Filtered variable descriptions for the downscaling_method and timescale
warming_level : array
    Global warming level(s)
warming_level_window : integer
    Years around the global warming level (+/-) (e.g. 15 means a 30-year window)
approach : str, "Warming Level" or "Time"
    How you want the data to be retrieved
warming_level_months : array
    Months of the year to use for computing warming levels; defaults to the entire calendar year: 1,2,3,4,5,6,7,8,9,10,11,12
all_touched : boolean
    Spatial subset option for "within" vs. "touching" selection
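Several of these attributes are restricted to fixed option sets. A hypothetical validator mirroring the option sets documented above (the real DataParameters enforces these through the param library, not this helper):

```python
# Option sets taken from the attribute documentation above
VALID_OPTIONS = {
    "resolution": {"3 km", "9 km", "45 km"},
    "timescale": {"hourly", "daily", "monthly"},
    "downscaling_method": {"Dynamical", "Statistical", "Dynamical+Statistical"},
    "data_type": {"Gridded", "Stations"},
}


def validate_selection(**selections):
    """Raise ValueError for any selection outside its documented option set."""
    for name, value in selections.items():
        allowed = VALID_OPTIONS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r}; valid options: {sorted(allowed)}")
    return selections


sel = validate_selection(resolution="9 km", timescale="daily")
```

Passing an unlisted value (say, resolution="10 km") raises immediately, which is the same fail-fast behavior the param-backed class gives you.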

Source code in climakitae/core/data_interface.py
def __init__(self, **params):
    # Set default values
    super().__init__(**params)

    self.data_interface = DataInterface()

    # Data Catalog
    self._data_catalog = self.data_interface.data_catalog

    # Warming Levels Table
    self._warming_level_times = self.data_interface.warming_level_times

    # variable descriptions
    self._variable_descriptions = self.data_interface.variable_descriptions

    # station data
    self._stations_gdf = self.data_interface.stations_gdf

    # Get geography boundaries and selection options
    self._geographies = self.data_interface.geographies
    self._geography_choose = self._geographies.boundary_dict()

    # Set location params
    self.area_subset = "none"
    self.param["area_subset"].objects = list(self._geography_choose.keys())
    self.param["cached_area"].objects = list(
        self._geography_choose[self.area_subset].keys()
    )

    self.all_touched = False

    # Set data params
    (
        self.scenario_options,
        self.simulation,
        unique_variable_ids,
    ) = _get_user_options(
        data_catalog=self._data_catalog,
        downscaling_method=self.downscaling_method,
        timescale=self.timescale,
        resolution=self.resolution,
    )
    self.variable_options_df = _get_variable_options_df(
        variable_descriptions=self._variable_descriptions,
        unique_variable_ids=unique_variable_ids,
        downscaling_method=self.downscaling_method,
        timescale=self.timescale,
        enable_hidden_vars=self.enable_hidden_vars,
    )

    # Show derived index option?
    indices = True
    if self.data_type == "Stations":
        indices = False
    if self.downscaling_method != "Dynamical":
        indices = False
    if self.timescale == "monthly":
        indices = False
    if not indices:
        self.param["variable_type"].objects = ["Variable"]
        self.variable_type = "Variable"
    else:
        self.param["variable_type"].objects = ["Variable", "Derived Index"]

    # Set scenario param
    scenario_ssp_options = [
        scenario_to_experiment_id(scen, reverse=True)
        for scen in self.scenario_options
        if "ssp" in scen
    ]
    for scenario_i in SSPS:
        if scenario_i in scenario_ssp_options:  # Reorder list
            scenario_ssp_options.remove(scenario_i)  # Remove item
            scenario_ssp_options.append(scenario_i)  # Add to back of list
    self.param["scenario_ssp"].objects = scenario_ssp_options
    self.scenario_ssp = []

    # Set variable param
    self.param["variable"].objects = (
        self.variable_options_df.display_name.values.tolist()
    )
    self.variable = self.default_variable

    # Set colormap, units, & extended description
    var_info = self.variable_options_df[
        self.variable_options_df["display_name"] == self.variable
    ]

    # Set params that are not selected by the user
    self.colormap = var_info.colormap.item()
    self.units = var_info.unit.item()
    self.extended_description = var_info.extended_description.item()
    self.variable_id = _get_var_ids(
        self._variable_descriptions,
        self.variable,
        self.downscaling_method,
        self.timescale,
        self.enable_hidden_vars,
    )
    self._data_warning = ""

retrieve(config=None, merge=True)

Retrieve data from the catalog

By default, the DataParameters selections determine the data retrieved. The data is read from the AWS S3 bucket and returned as a lazily loaded dask-backed array. This user-facing method wraps read_catalog_from_select.

Returns:

data_return : xr.DataArray | xr.Dataset | List[xr.DataArray]
    DataArray or Dataset object
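retrieve() also prints a size warning based on the returned array's nbytes. The warning tiers reduce to three thresholds; a hypothetical helper summarizing them (names are illustrative, the thresholds come from the source):

```python
def size_warning_level(nbytes: int):
    """Classify an array size into the warning tiers used by retrieve():
    >= 1 GB -> 'large', >= 5 GB -> 'very large', >= 10 GB -> 'huge'."""
    if nbytes >= int(1e10):
        return "huge"
    if nbytes >= int(5e9):
        return "very large"
    if nbytes >= int(1e9):
        return "large"
    return None  # below 1 GB, no warning is printed


print(size_warning_level(int(2e9)))  # large
```

Since the returned object is lazy, these warnings are about the cost of later computations, not of the retrieval itself.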

Source code in climakitae/core/data_interface.py
def retrieve(
    self, config: str = None, merge: bool = True
) -> Union[xr.DataArray, xr.Dataset, List[xr.DataArray]]:
    """Retrieve data from catalog

    By default, DataParameters determines the data retrieved.
    Grabs the data from the AWS S3 bucket, returns lazily loaded dask array.
    User-facing function that provides a wrapper for read_catalog_from_select.

    Returns
    -------
    data_return : xr.DataArray | xr.Dataset | List[xr.DataArray]
        DataArray or Dataset object

    """

    def _warn_of_large_file_size(da: xr.DataArray):
        """Warn user if the data array is large"""
        nbytes = da.nbytes
        match nbytes:
            case nbytes if nbytes >= int(1e9) and nbytes < int(5e9):
                print(
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                    "! Returned data array is large. Operations could take up to 5x longer than 1GB of data!\n"
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                )
            case nbytes if nbytes >= int(5e9) and nbytes < int(1e10):
                print(
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                    "!! Returned data array is very large. Operations could take up to 8x longer than 1GB of data !!\n"
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                )
            case nbytes if nbytes >= int(1e10):
                print(
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                    "!!! Returned data array is huge. Operations could take 10x to infinity longer than 1GB of data !!!\n"
                    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
                )

    def _warn_of_empty_data(self):
        if self.approach == "Warming Level" and (len(self.warming_level) > 1):
            print(
                "WARNING FOR WARMING LEVELS APPROACH\n-----------------------------------\nThere may be NaNs in your data for certain simulation/warming level combinations if the warming level is not reached for that particular simulation before the year 2100. \n\nThis does not mean you have missing data, but rather a feature of how the data is combined in retrieval to return a single data object. \n\nIf you want to remove these empty simulations, it is recommended to first subset the data object by each individual warming level and then dropping NaN values."
            )
        elif (self.approach == "Time") and (len(self.scenario_ssp) > 1):
            print(
                "WARNING\n-------\nYou have retrieved data for more than one SSP, but not all ensemble members for each GCM are available for all SSPs.\n\nAs a result, some scenario and simulation combinations may contain NaN values.\n\nIf you want to remove these empty simulations, it is recommended to first subset the data object by each individual scenario and then dropping NaN values."
            )

    data_return = read_catalog_from_select(self)

    if isinstance(data_return, list):
        for da in data_return:
            _warn_of_large_file_size(da)
    else:
        _warn_of_large_file_size(data_return)

    # Warn about empty simulations for certain selections
    _warn_of_empty_data(self)

    return data_return

Get Data Function

Retrieve formatted data from the Analytics Engine data catalog.

Contrasts with DataParameters().retrieve(), which retrieves data from the user inputs in climakitaegui's selections GUI.

Parameters

variable : str
    String name of climate variable
resolution : str, one of ["3 km", "9 km", "45 km"]
    Resolution of data in kilometers
timescale : str, one of ["hourly", "daily", "monthly"]
    Temporal frequency of dataset
downscaling_method : str, one of ["Dynamical", "Statistical", "Dynamical+Statistical"], optional
    Downscaling method of the data: WRF ("Dynamical"), LOCA2 ("Statistical"), or both ("Dynamical+Statistical")
    Defaults to "Dynamical"
data_type : str, one of ["Gridded", "Stations"], optional
    Whether to choose gridded data or weather station data
    Defaults to "Gridded"
approach : one of ["Time", "Warming Level"], optional
    Defaults to "Time"
scenario : str or list of str, optional
    SSP scenario ["SSP 3-7.0", "SSP 2-4.5", "SSP 5-8.5"] and/or historical data selection ["Historical Climate", "Historical Reconstruction"]
    If approach = "Time", you need to set a valid option
    If approach = "Warming Level", scenario is ignored
units : str, optional
    Variable units. Defaults to the native units of the data
area_subset : str, optional
    Area category, e.g. "CA counties"
    Defaults to the entire domain ("none")
cached_area : list, optional
    Area, e.g. "Alameda county"
    Defaults to the entire domain (["entire domain"])
area_average : one of ["Yes", "No"], optional
    Take an average over the spatial domain?
    Defaults to "No"
latitude : None or tuple of float, optional
    Tuple of valid latitude bounds
    Defaults to the entire domain
longitude : None or tuple of float, optional
    Tuple of valid longitude bounds
    Defaults to the entire domain
time_slice : tuple, optional
    Time range for retrieved data
    Only valid for approach = "Time"
stations : list of str, optional
    Which weather stations to retrieve data for
    Only valid for data_type = "Stations"
    Defaults to all stations
warming_level : list of float, optional
    Must be one of the warming levels available in climakitae.core.constants
    Only valid for approach = "Warming Level" and data_type = "Stations"
warming_level_window : int in range (5, 25), optional
    Years around the global warming level (+/-) (e.g. 15 means a 30-year window)
warming_level_months : list of int, optional
    Months of the year for which to perform the warming level computation
    Defaults to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]
    For example, set warming_level_months=[12,1,2] to perform the analysis for the winter season
    Only valid for approach = "Warming Level" and data_type = "Stations"
all_touched : boolean, optional
    Spatial subset option for "within" vs. "touching" selection
enable_hidden_vars : boolean, optional
    Return all variables, including the ones in which "show" is set to False?
    Defaults to False
kwargs : dict
    Additional keyword arguments to pass to DataParameters()
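The warming_level_months behavior described above (default to the full calendar year, accept only valid month numbers) can be sketched with a hypothetical helper; the name is illustrative and the real validation lives inside get_data():

```python
def normalize_warming_level_months(months=None):
    """Default to all 12 months and reject out-of-range values,
    mirroring the documented behavior of warming_level_months."""
    if months is None:
        return list(range(1, 13))  # entire calendar year
    bad = [m for m in months if not (isinstance(m, int) and 1 <= m <= 12)]
    if bad:
        raise ValueError(f"Months must be integers in 1..12; got {bad}")
    return list(months)


winter = normalize_warming_level_months([12, 1, 2])  # DJF season
```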

Returns

xr.DataArray
    The requested climate data, or None if an error occurred.

Notes

The function does not raise errors. Instead, an informative message is printed and the function returns None. This is because the AE Jupyter Hub raises an opaque Pieces Mismatch Error for some bad inputs; that error is suppressed and a more informative message is printed in its place.
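This print-and-return-None behavior can be sketched as a wrapper (illustrative only; get_data implements it inline, and the function names below are hypothetical):

```python
import functools


def print_errors_return_none(func):
    """Convert raised exceptions into a printed message plus a None return,
    mimicking the documented error behavior of get_data()."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as err:
            print(f"Error retrieving data: {err}")
            return None

    return wrapper


@print_errors_return_none
def fetch(variable):
    # Stand-in for a catalog query that may fail on bad input
    if variable != "Air Temperature at 2m":
        raise ValueError("unknown variable")
    return "dataset"
```

The practical consequence for callers: always check the return value for None rather than wrapping get_data() in try/except.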

Source code in climakitae/core/data_interface.py
def get_data(
    variable: str,
    resolution: str,
    timescale: str,
    downscaling_method: str = "Dynamical",
    data_type: str = "Gridded",
    approach: str = "Time",
    scenario: Union[str, list[str]] = None,
    units: str = None,
    warming_level: list[float] = None,
    area_subset: str = "none",
    latitude: tuple[float, float] = None,
    longitude: tuple[float, float] = None,
    cached_area: list[str] = None,
    area_average: str = None,
    time_slice: tuple = None,
    stations: list[str] = None,
    warming_level_window: int = None,
    warming_level_months: list[int] = None,
    all_touched=False,
    enable_hidden_vars: bool = False,
    **kwargs,
) -> xr.DataArray:
    """Retrieve formatted data from the Analytics Engine data catalog.

    Contrasts with DataParameters().retrieve(), which retrieves data from
    the user inputs in climakitaegui's selections GUI.

    Parameters
    ----------
    variable : str
        String name of climate variable
    resolution : str, one of ["3 km", "9 km", "45 km"]
        Resolution of data in kilometers
    timescale : str, one of ["hourly", "daily", "monthly"]
        Temporal frequency of dataset
    downscaling_method : str, one of ["Dynamical", "Statistical", "Dynamical+Statistical"], optional
        Downscaling method of the data:
        WRF ("Dynamical"), LOCA2 ("Statistical"), or both "Dynamical+Statistical"
        Default to "Dynamical"
    data_type : str, one of ["Gridded", "Stations"], optional
        Whether to choose gridded data or weather station data
        Default to "Gridded"
    approach : one of ["Time", "Warming Level"], optional
        Default to "Time"
    scenario : str or list of str, optional
        SSP scenario ["SSP 3-7.0", "SSP 2-4.5","SSP 5-8.5"] and/or historical data selection ["Historical Climate", "Historical Reconstruction"]
        If approach = "Time", you need to set a valid option
        If approach = "Warming Level", scenario is ignored
    units : str, optional
        Variable units.
        Defaults to native units of data
    area_subset : str, optional
        Area category: i.e "CA counties"
        Defaults to entire domain ("none")
    cached_area : list, optional
        Area: i.e "Alameda county"
        Defaults to entire domain (["entire domain"])
    area_average : one of ["Yes","No"], optional
        Take an average over spatial domain?
        Default to "No".
    latitude : None or tuple of float, optional
        Tuple of valid latitude bounds
        Default to entire domain
    longitude : None or tuple of float, optional
        Tuple of valid longitude bounds
        Default to entire domain
    time_slice : tuple, optional
        Time range for retrieved data
        Only valid for approach = "Time"
    stations : list of str, optional
        Which weather stations to retrieve data for
        Only valid for data_type = "Stations"
        Default to all stations
    warming_level : list of float, optional
        Must be one of the warming levels available in `climakitae.core.constants`
        Only valid for approach = "Warming Level" and data_type = "Stations"
    warming_level_window : int in range (5,25), optional
        Years around Global Warming Level (+/-) \n (e.g. 15 means a 30yr window)
    warming_level_months : list of int, optional
        Months of year for which to perform warming level computation
        Default to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]
        For example, you may want to set warming_level_months=[12,1,2] to perform the analysis for the winter season.
        Only valid for approach = "Warming Level" and data_type = "Stations"
    all_touched : boolean
        spatial subset option for within or touching selection
    enable_hidden_vars : boolean, optional
        Return all variables, including the ones in which "show" is set to False?
        Default to False
    kwargs : dict
        Additional keyword arguments to pass to DataParameters()

    Returns
    -------
    xr.DataArray
        The requested climate data, or None if an error occurred.

    Notes
    -----
    Errors aren't raised by the function. Rather, an appropriate informative
    message is printed, and the function returns None. This is due to the fact
    that the AE Jupyter Hub raises a strange Pieces Mismatch Error for some bad
    inputs; instead, that error is ignored and a more informative error message
    is printed instead.

    """

    def _check_valid_input_station(
        stations: list[str], station_options_all: list[str]
    ) -> list[str]:
        """Check that the user input a valid value for station
        If invalid input, the function will "guess" a close-ish station using difflib
        See _get_closest_option function for more info
        If invalid input and no guesses found, the function will print an informative
        error message and raise a ValueError

        Parameters
        ----------
        stations : list[str]
        station_options_all : list of string
            All the possible station options
            Can be retrieved from DataParameters()._stations_gdf.station.values

        Returns
        -------
        stations : list[str]

        """
        station_options_all = sorted(
            station_options_all
        )  # sorted() puts the list in alphabetical order

        # Keep track of if error was raised and message was printed to user
        # If more than one station prints errors to the console, print a space between each station
        printed_warning = False

        for i, station_i in enumerate(stations):  # Go through all the stations
            # If the station is a valid option, don't do anything
            if station_i in station_options_all:
                continue

            if printed_warning:
                print(
                    "\n", end=""
                )  # Add a space between stations for better readability

            # If the station isn't a valid option...
            print("Input station='" + station_i + "' is not a valid option.")
            closest_options = _get_closest_options(
                station_i, station_options_all
            )  # See if theres any similar options

            # Sad! No closest options found. Just set the key to all valid options
            match closest_options:
                case None:
                    print("Valid options: \n- ", end="")
                    print("\n- ".join(station_options_all))
                    raise ValueError("Bad input")

                # Just one option in the list
                case closest_options if len(closest_options) == 1:
                    print("Closest option: '" + closest_options[0] + "'")

                case closest_options if len(closest_options) > 1:
                    print("Closest options: \n- " + "\n- ".join(closest_options))

            print("Outputting data for station='" + closest_options[0] + "'")
            stations[i] = closest_options[
                0
            ]  # Replace that value in the list with the best option :)

            printed_warning = True

        return stations

    # Internal functions
    def _error_handling_warming_level_inputs(
        wl: Union[list[float], list[int]],
        argument_name: str,
        downscaling_method: str,
        resolution: str,
    ):
        """Error handling for arguments: warming_level and warming_level_month
        Both require a list of either floats or ints
        argument_name is either "warming_level" or "warming_level_months" and is used to
        print an appropriate error message for bad input

        """
        # Find the WL bounds for LOCA and WRF
        loca, wrf = create_ae_warming_trajectories(resolution)
        loca_max = round(loca.max().max(), 2)
        wrf_max = round(wrf.max().max(), 2)

        match downscaling_method:
            case "Statistical":
                max_val = loca_max
            case "Dynamical":
                max_val = wrf_max
            case "Dynamical+Statistical":
                max_val = min(loca_max, wrf_max)
            case _:
                raise ValueError(
                    "Downscaling method must be 'Statistical', 'Dynamical', or 'Dynamical+Statistical'"
                )

        if (wl is not None) and not isinstance(wl, list):
            if isinstance(wl, (float, int)):  # Convert float to a singleton list
                wl = [wl]
            if not isinstance(wl, list):
                raise ValueError(
                    f"""Function argument {argument_name} requires a float/int or list 
                    of floats/ints input. Your input: {type(wl)}"""
                )
        if isinstance(wl, list):
            for x in wl:
                if not isinstance(x, (float, int)):
                    raise ValueError(
                        f"Each item in '{argument_name}' must be a float or int. Got: {type(x)}"
                    )
                if argument_name == "warming_level":
                    if x < 0 or x > max_val:
                        raise ValueError(
                            f"{argument_name} value {x}. "
                            f"Allowed range for {downscaling_method}-downscaled data at {resolution} resolution is 0 to {max_val:.2f}."
                        )
        return wl

    def _error_handling_approach_inputs(
        approach: str, scenario: str, warming_level: list[float], time_slice: tuple
    ) -> tuple[str, str, list[float], tuple]:
        """Error handling for approach and scenario inputs"""
        _valid_options_approach = ["Time", "Warming Level"]
        if approach not in _valid_options_approach:
            # Maybe the user just capitalized it wrong
            # If so, fix it for them-- don't raise an error
            if approach.lower().title() in _valid_options_approach:
                approach = approach.lower().title()
            else:
                # An error will be raised later when you try to set selections
                pass

        # Print a warning if scenario is set but approach is Warming Level
        if approach == "Warming Level" and scenario not in [None, ["n/a"], "n/a"]:
            print(
                'WARNING: "scenario" argument will be ignored for warming levels approach'
            )
            scenario = None
        if approach == "Warming Level" and time_slice is not None:
            print(
                'WARNING: "time_slice" argument will be ignored for warming levels approach'
            )
            time_slice = None

        if approach == "Time":
            warming_level = ["n/a"]

        return approach, scenario, warming_level, time_slice

    def _error_handling_location_settings(
        area_subset: list[str], cached_area: list[str]
    ) -> list[str]:
        """Maybe the user put an input for cached area but not for area subset
        We need to have the matching/correct area subset in order for selections.retrieve() to actually subset the data
        Here, we load in the geometry options to set area_subset to the correct value
        This also raises an appropriate error if the user has a bad input

        """
        if area_subset == "none" and cached_area != ["entire domain"]:
            geom_df = get_subsetting_options(area_subset="all").reset_index()
            area_subset_vals = geom_df[geom_df["cached_area"] == cached_area[0]][
                "area_subset"
            ].values
            if len(area_subset_vals) == 0:
                raise ValueError("Invalid input for argument 'cached_area'")
            else:
                area_subset = area_subset_vals[0]
        return area_subset

    def _get_scenario_ssp_scenario_historical(
        approach: str, scenario: str
    ) -> tuple[str, str]:
        """Get scenario_ssp, scenario_historical depending on user inputs"""
        match approach:
            case "Warming Level":
                scenario_ssp = ["n/a"]
                scenario_historical = ["n/a"]
            case "Time":
                if (
                    "Historical Reconstruction" in scenario
                ):  # Handling for Historical Reconstruction option
                    scenario_historical = [x for x in scenario if "Historical" in x]
                    scenario_ssp = []
                    if (
                        len(scenario) != 1
                    ):  # No SSP options for Historical Reconstruction data
                        print(
                            "WARNING: Historical Reconstruction data cannot be retrieved in the same data object as SSP scenario options. SSP data will not be retrieved."
                        )
                else:
                    scenario_ssp = [
                        x for x in scenario if "Historical" not in x
                    ]  # Add non-historical SSPs to scenario_ssp key
                    if "Historical Climate" in scenario:
                        scenario_historical = ["Historical Climate"]
                    else:
                        scenario_historical = []
            case _:
                scenario_ssp, scenario_historical = None, None
        return scenario_ssp, scenario_historical

    # default values set as lists are dangerous, so set them to None and then set to
    # default value later
    if cached_area is None:
        cached_area = ["entire domain"]
    # Get intake catalog and variable descriptions from DataInterface object
    data_interface = DataInterface()
    var_df = data_interface.variable_descriptions.rename(
        columns={"variable": "display_name"}
    )  # Rename column so that it can be merged with cat_df

    # Filter variable descriptions based on enable_hidden_vars
    if not enable_hidden_vars:
        var_df = var_df[var_df["show"] == True]

    ## --------- ERROR HANDLING ----------
    # Deal with bad or missing user inputs

    # Station data error handling
    if data_type == "Stations":
        # dictionary with { argument name : [valid option, user input]}
        d = {
            "downscaling_method": ["Dynamical", downscaling_method],
            "timescale": ["hourly", timescale],
            "variable": ["Air Temperature at 2m", variable],
        }
        # Go through the users inputs
        # See if they match the required value for that argument
        # If not, print a warning to the user.
        for key, vals in d.items():
            if vals[0] != vals[1]:
                print(
                    "Weather station data can only be retrieved for {0}={1} \nYour input: {2} \nRetrieving data for {0}={1}".format(
                        key, vals[0], vals[1]
                    )
                )

        downscaling_method = "Dynamical"
        timescale = "hourly"
        variable = "Air Temperature at 2m"

        # Deal with scenario and time_slice arguments
        # Handle various use-cases of user inputs/errors
        if scenario is None:
            if time_slice is None:
                # Default
                scenario = ["Historical Climate"]
            else:
                scenario = []

        if resolution == "3 km":
            # Neither SSP 2-4.5 nor SSP 5-8.5 are valid options for scenario... need to remove
            for bad_scenario_choice in ["SSP 2-4.5", "SSP 5-8.5"]:
                if bad_scenario_choice in scenario:
                    error_message = f"{bad_scenario_choice} is not a valid scenario input for resolution = {resolution}"
                    print(_format_error_print_message(error_message))
                    return None
        if time_slice is not None:
            # Make sure time_slice and scenario match each other
            # If time_slice is not assigned by the user, it will be auto-set by the DataInterface object
            if any(value < 2015 for value in time_slice) and (
                ("Historical Climate") not in scenario
            ):
                # Add Historical Climate to scenario if the time scale includes historical period
                scenario.append("Historical Climate")
            if any(value >= 2015 for value in time_slice) and not any(
                "SSP" in item for item in scenario
            ):
                # If the time scale includes the future period and no SSP data is selected, add SSP 3-7.0
                scenario.append("SSP 3-7.0")

        if stations is None:
            # Warn the user that no station was specified;
            # the function defaults to retrieving all available stations
            print(
                "WARNING: You haven't set a particular station/s to retrieve data for; the function will default to retrieving all available stations in the domain"
            )
        if (stations is not None) and isinstance(stations, str):
            # Catch an easy user mistake without raising an error: a bare string
            # instead of a list, e.g. when retrieving data for a single station
            stations = [stations]

    # If lat/lon input, change cached_area and area_subset
    if (latitude is not None) and (longitude is not None):
        area_subset = "lat/lon"
        cached_area = ["coordinate selection"]

    # Check warming level inputs
    try:
        warming_level = _error_handling_warming_level_inputs(
            warming_level, "warming_level", downscaling_method, resolution
        )
        warming_level_months = _error_handling_warming_level_inputs(
            warming_level_months, "warming_level_months", downscaling_method, resolution
        )
    except ValueError as error_message:
        print(_format_error_print_message(error_message))
        return None

    # Make sure the inputs are a valid type (no floats, ints, dictionaries, etc)
    for user_input in [
        variable,
        downscaling_method,
        resolution,
        timescale,
        area_subset,
        area_average,
        approach,
        scenario,
    ]:
        if (user_input is not None) and not isinstance(user_input, (str, list)):
            error_message = (
                "Function arguments require a single string value for your inputs"
            )
            print(_format_error_print_message(error_message))
            return None

    # Maybe area average was capitalized wrong
    # Fix it instead of raising an error
    if area_average is not None:
        if area_average.lower().title() in ["Yes", "No"]:
            area_average = area_average.lower().title()

    # Cached area should be a list even if it's just a single string value (i.e. [str])
    cached_area = [cached_area] if not isinstance(cached_area, list) else cached_area

    # If all_touched is None set to False
    if all_touched is None:
        all_touched = False

    # Check that all_touched is a boolean
    if not isinstance(all_touched, bool):
        raise ValueError("all_touched must be a boolean")

    # Make sure approach matches the scenario setting
    # See function documentation for more details
    approach, scenario, warming_level, time_slice = _error_handling_approach_inputs(
        approach, scenario, warming_level, time_slice
    )

    # Make sure the area subset is set to a valid input
    # See function documentation for more details
    try:
        area_subset = _error_handling_location_settings(area_subset, cached_area)
    except ValueError as error_message:
        print(_format_error_print_message(error_message))
        return None

    ## --------- ADD ARGUMENTS TO A DICTIONARY ----------
    # A dictionary of inputs enables better error handling and cleaner code when
    # we later set selections.<attr> = <value>, and it makes parsing the
    # arguments easier. The inputs must be lists so that _check_if_good_input
    # can compare them against the valid catalog options.
    scenario_user_input = scenario  # What the user originally input for scenario

    check_input_df = get_data_options(
        variable=variable,
        downscaling_method=downscaling_method,
        resolution=resolution,
        timescale=timescale,
        scenario=scenario,
        tidy=False,
        enable_hidden_vars=enable_hidden_vars,
    )

    if check_input_df is None:
        # get_data_options prints its own error message for invalid inputs
        return None

    # Merge with variable dataframe to get all the info about the data in one place
    check_input_df = check_input_df.merge(var_df, how="left")

    # Convert to a dictionary so it can be easily parsed by the function
    cat_dict = check_input_df.to_dict(orient="list")
    for key, values in cat_dict.items():
        # Remove non-unique values
        # This happens because we converted a pandas dataframe to a dictionary
        cat_dict[key] = list(np.unique(values))

    # _check_if_good_input default-fills the scenario options with EVERY possible option:
    # usually all available SSPs plus both historical options (Historical Climate AND Historical Reconstruction)
    # The desired default is Historical Climate + SSPs
    # So, if the user input None for scenario, remove Historical Reconstruction from the list
    if scenario_user_input is None:
        if "Historical Reconstruction" in cat_dict["scenario"]:
            cat_dict["scenario"] = [
                item
                for item in cat_dict["scenario"]
                if item != "Historical Reconstruction"
            ]

    # Check if it's an index
    # Use proper variable_id lookup that considers downscaling method and timescale
    variable_ids = _get_var_ids(
        data_interface.variable_descriptions,
        cat_dict["variable"][0],
        cat_dict["downscaling_method"][0],
        cat_dict["timescale"][0],
        enable_hidden_vars=enable_hidden_vars,
    )
    variable_id = variable_ids[0] if variable_ids else ""
    variable_type = "Derived Index" if "_index" in variable_id else "Variable"

    # Settings for selections
    selections_dict = {
        "variable": cat_dict["variable"][0],
        "timescale": cat_dict["timescale"][0],
        "downscaling_method": cat_dict["downscaling_method"][0],
        "resolution": cat_dict["resolution"][0],
        "data_type": data_type,
        "scenario": cat_dict["scenario"],
        "area_average": area_average,
        "area_subset": area_subset,
        "cached_area": cached_area,
        "approach": approach,
        "warming_level": warming_level,
        "warming_level_window": warming_level_window,
        "warming_level_months": warming_level_months,
        "variable_type": variable_type,
        "time_slice": time_slice,
        "latitude": latitude,
        "longitude": longitude,
        "stations": stations,
        "all_touched": all_touched,
    }

    scenario_ssp, scenario_historical = _get_scenario_ssp_scenario_historical(
        selections_dict["approach"], selections_dict["scenario"]
    )
    selections_dict["scenario_ssp"] = scenario_ssp
    selections_dict["scenario_historical"] = scenario_historical

    ## ----- SET THE UNITS ------

    # Query the table based on input values
    # Timescale needs special handling due to the layout of the csv file:
    # monthly variables are derived from daily variables, so the table lists
    # them as "daily, monthly", hence the substring match on timescale below
    var_df_query = var_df[
        (var_df["display_name"] == selections_dict["variable"])
        & (var_df["downscaling_method"] == selections_dict["downscaling_method"])
    ]
    var_df_query = var_df_query[
        var_df_query["timescale"].str.contains(selections_dict["timescale"])
    ]

    selections_dict["units"] = (
        units if units is not None else var_df_query["unit"].item()
    )  # Set units if user doesn't set them manually

    ## ------ CREATE SELECTIONS OBJECT --------
    selections = DataParameters(enable_hidden_vars=enable_hidden_vars)

    # Error handling for stations
    # If the user input a value for the station argument, check that it exists
    # If it doesn't exist, see if you can find something close... if not, throw an error
    # Need to do the error handling here since it requires the selections object
    if data_type == "Stations" and stations is not None:
        stations = _check_valid_input_station(
            stations, selections._stations_gdf.station.values
        )

    ## ------- SET EACH ATTRIBUTE -------

    try:
        selections.data_type = selections_dict["data_type"]
        selections.approach = selections_dict["approach"]
        selections.scenario_ssp = selections_dict["scenario_ssp"]
        selections.scenario_historical = selections_dict["scenario_historical"]
        selections.area_subset = selections_dict["area_subset"]
        selections.cached_area = selections_dict["cached_area"]
        selections.downscaling_method = selections_dict["downscaling_method"]
        selections.resolution = selections_dict["resolution"]
        selections.timescale = selections_dict["timescale"]
        selections.variable_type = selections_dict["variable_type"]
        selections.variable = selections_dict["variable"]
        selections.units = selections_dict["units"]
        selections.all_touched = selections_dict["all_touched"]

        # Setting the values like this enables us to take advantage of the default settings in DataParameters without having to manually set defaults in this function
        if selections_dict["warming_level"] is not None:
            selections.warming_level = selections_dict["warming_level"]
        if selections_dict["warming_level_window"] is not None:
            selections.warming_level_window = selections_dict["warming_level_window"]
        if selections_dict["area_average"] is not None:
            selections.area_average = selections_dict["area_average"]
        if selections_dict["time_slice"] is not None:
            selections.time_slice = selections_dict["time_slice"]
        if selections_dict["warming_level_months"] is not None:
            selections.warming_level_months = selections_dict["warming_level_months"]
        if selections_dict["latitude"] is not None:
            selections.latitude = selections_dict["latitude"]
        if selections_dict["longitude"] is not None:
            selections.longitude = selections_dict["longitude"]
        if selections_dict["stations"] is not None:
            selections.stations = selections_dict["stations"]

        for key in kwargs:
            if getattr(selections, key, None) is not None:
                setattr(selections, key, kwargs[key])

        # Force update variable_id after all attributes are set
        # This ensures hidden variables work correctly
        selections.variable_id = _get_var_ids(
            data_interface.variable_descriptions,
            selections.variable,
            selections.downscaling_method,
            selections.timescale,
            enable_hidden_vars=enable_hidden_vars,
        )

    except ValueError as error_message:
        # The default error message is very long and sometimes includes a
        # confusing "AttributeError: Pieces mismatch" that is hard to interpret
        # Print a concise message and return None instead of raising
        print(_format_error_print_message(error_message))
        return None

    # Retrieve data
    data = selections.retrieve()
    return data
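The nested helper at the top of the listing splits the user's scenario list into SSP and historical components. A minimal standalone sketch of that logic (the function name `split_scenarios` is ours, for illustration only; it is not part of climakitae):

```python
def split_scenarios(approach, scenario):
    """Split a scenario list into (scenario_ssp, scenario_historical),
    mirroring get_data's nested _get_scenario_ssp_scenario_historical helper."""
    if approach == "Warming Level":
        # Warming-level queries ignore the scenario selection entirely
        return ["n/a"], ["n/a"]
    if approach == "Time":
        if "Historical Reconstruction" in scenario:
            # Reconstruction data cannot be mixed with SSP scenarios
            return [], [x for x in scenario if "Historical" in x]
        ssp = [x for x in scenario if "Historical" not in x]
        hist = ["Historical Climate"] if "Historical Climate" in scenario else []
        return ssp, hist
    return None, None  # unknown approach

print(split_scenarios("Time", ["Historical Climate", "SSP 3-7.0"]))
# -> (['SSP 3-7.0'], ['Historical Climate'])
```

Note that a "Warming Level" approach short-circuits the split entirely, which is why `get_data` can ignore `scenario` when a warming level is supplied.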

Migration Note

For new code, use the modern climakitae.new_core interface. See the migration guide for detailed upgrade instructions.

Quick Example

Legacy (old):

from climakitae.core.data_interface import get_data, DataParameters

# Option 1: pass keyword arguments directly to get_data()
data = get_data(
    variable="Maximum air temperature at 2m",
    time_slice=(2015, 2045),            # year-range tuple
    downscaling_method="Statistical",   # ≈ LOCA2
    resolution="3 km",                  # ≈ grid_label d03
    timescale="monthly",                # ≈ table_id "mon"
)

# Option 2: configure a DataParameters object, then retrieve
params = DataParameters()
params.variable = "Maximum air temperature at 2m"
params.time_slice = (2015, 2045)
params.downscaling_method = "Statistical"
params.resolution = "3 km"
params.timescale = "monthly"
data = params.retrieve()

Modern (new):

from climakitae.new_core.user_interface import ClimateData

data = (ClimateData()
    .variable("tasmax")
    .processes({"time_slice": (2015, 2045)})
    .get())
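As the `get_data` source above shows, the legacy interface also reconciles `time_slice` with `scenario`: any year before 2015 pulls in "Historical Climate", and any year from 2015 onward pulls in an SSP (defaulting to "SSP 3-7.0"). A standalone sketch of that reconciliation (the helper name `reconcile_scenarios` is hypothetical, not part of climakitae):

```python
def reconcile_scenarios(scenario, time_slice):
    """Mirror get_data's station-data logic: a pre-2015 year implies
    Historical Climate, a 2015+ year implies at least one SSP."""
    scenario = list(scenario)  # avoid mutating the caller's list
    if time_slice is None:
        return scenario
    if any(year < 2015 for year in time_slice) and "Historical Climate" not in scenario:
        scenario.append("Historical Climate")
    if any(year >= 2015 for year in time_slice) and not any("SSP" in s for s in scenario):
        scenario.append("SSP 3-7.0")
    return scenario

print(reconcile_scenarios([], (2000, 2050)))
# -> ['Historical Climate', 'SSP 3-7.0']
```

This is why a query spanning 2000–2050 silently retrieves both historical and SSP data even when the caller supplies no scenario at all.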