Dataset Factory
Query builder and dataset construction.
Factory for creating Dataset objects with appropriate catalogs, validators, and processors.
This factory translates UI queries from the ClimateData interface into fully configured Dataset objects with the correct combination of data catalogs for accessing climate data, parameter validators for query validation, and processing steps for data transformation.
The factory uses registries to maintain extensible collections of components and automatically determines the appropriate combination based on query parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| _catalog | DataCatalog or None | Reference to the DataCatalog singleton instance. |
| _catalog_df | DataFrame | DataFrame containing catalog metadata loaded from CSV. |
| _validator_registry | dict | Registry mapping validator keys to ParameterValidator classes. |
| _processing_step_registry | dict | Registry mapping processing step names to DataProcessor classes. |
Methods:
| Name | Description |
|---|---|
| register_catalog | Register a data catalog with the factory. |
| register_validator | Register a parameter validator with the factory. |
| register_processing_step | Register a processing step with the factory. |
| create_validator | Create a parameter validator based on registry key. |
| create_dataset | Create a Dataset based on a UI query from ClimateData. |
| get_catalog_options | Get available options for a specific catalog. |
| get_validators | Get a list of available validators. |
| get_processors | Get a list of available processors. |
Examples:
Creating a basic dataset:
>>> factory = DatasetFactory()
>>> query = {'data_type': 'gridded', 'variable': 'precipitation'}
>>> dataset = factory.create_dataset(query)
Registering custom components:
>>> factory = DatasetFactory()
>>> factory.register_validator('custom_type', CustomValidator)
>>> factory.register_processing_step('custom_process', CustomProcessor)
Notes
The factory automatically handles the selection of appropriate processing steps based on the query parameters. Some processing steps are mandatory and will be added automatically even if not explicitly requested.
See Also
Dataset : The main dataset container class
DataCatalog : Data access abstraction
ParameterValidator : Base class for parameter validation
DataProcessor : Base class for data processing steps
Initialize the DatasetFactory.
Sets up the factory with access to the data catalog singleton, validator registry, and processor registry for creating fully configured Dataset objects.
Source code in climakitae/new_core/dataset_factory.py
create_dataset(ui_query)
Create a Dataset based on a UI query from ClimateData.
This method orchestrates the creation of a complete Dataset by:
1. Determining the appropriate catalog based on query parameters
2. Creating and configuring the parameter validator
3. Adding the necessary processing steps in the correct order
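The three steps above can be sketched with a small stand-in class. This is a hedged illustration, not the climakitae implementation: `SketchFactory`, the `'processes'` query key, and the plain dict returned in place of a Dataset are all assumptions made for this example.

```python
from typing import Any, Callable, Dict


class SketchFactory:
    """Hypothetical stand-in for DatasetFactory (not the climakitae code)."""

    def __init__(self) -> None:
        # Both registries are keyed by data_type in this sketch (an assumption).
        self.catalogs: Dict[str, str] = {}
        self.validators: Dict[str, Callable[[Dict[str, Any]], bool]] = {}

    def create_dataset(self, ui_query: Dict[str, Any]) -> Dict[str, Any]:
        data_type = ui_query.get("data_type")
        if not data_type:
            raise ValueError("ui_query must include 'data_type'")

        # 1. Determine the catalog from the query parameters.
        if data_type not in self.catalogs:
            raise ValueError(f"no catalog registered for {data_type!r}")
        catalog = self.catalogs[data_type]

        # 2. Look up the validator for this query (None if not registered).
        validator = self.validators.get(data_type)

        # 3. Collect the requested processing steps in the order given.
        #    'processes' is a hypothetical key name.
        steps = list(ui_query.get("processes", []))

        # A plain dict stands in for the configured Dataset object.
        return {"catalog": catalog, "validator": validator, "steps": steps}
```

A caller would register at least one catalog, then pass a query such as `{'data_type': 'gridded'}`; a query without `'data_type'` raises ValueError, mirroring the documented behavior.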
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ui_query | dict | Query dictionary from the ClimateData UI. Must contain at minimum 'data_type' (str, type of climate data); additional keys depend on the specific data type and analysis. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | Properly configured Dataset instance ready for data retrieval and processing. |
Raises:
| Type | Description |
|---|---|
| ValueError | If required query parameters are missing or invalid, or if no appropriate catalog can be determined. |
| RuntimeError | If dataset creation fails due to internal errors. |
Notes
The method automatically adds mandatory processing steps such as concatenation and attribute updates even if not specified in the query.
Processing steps are applied in priority order, with preprocessing steps (like bias correction) applied before postprocessing steps.
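One way to picture the mandatory-step and priority behavior described in these notes is the sketch below. The step names, the numeric priorities, and the `plan_steps` helper are hypothetical; the source does not state how priorities are actually encoded.

```python
from typing import Dict, List

# Hypothetical priorities: lower numbers run first. Names and values are assumed.
PRIORITY: Dict[str, int] = {
    "bias_correction": 10,    # preprocessing step
    "concat": 50,             # mandatory step
    "update_attributes": 90,  # mandatory step, runs late
}
MANDATORY = ("concat", "update_attributes")


def plan_steps(requested: List[str]) -> List[str]:
    """Add any missing mandatory steps, then order all steps by priority."""
    steps = list(dict.fromkeys(requested))  # drop duplicates, keep first-seen order
    for name in MANDATORY:
        if name not in steps:
            steps.append(name)
    # Steps without a known priority get 50, an arbitrary choice for this sketch.
    return sorted(steps, key=lambda name: PRIORITY.get(name, 50))
```

Because `sorted` is stable, steps that share a priority keep the order in which they were requested.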
See Also
Dataset : The returned dataset class
create_validator : Method for creating parameter validators
register_catalog(key, catalog_url)
Register a data catalog with the factory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Identifier for the catalog. Should correspond to data_type, installation, or other distinguishing characteristics. | required |
| catalog_url | str | URL or path to the catalog to register for the given key. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If key is empty or None. |
Examples:
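A minimal sketch of the documented contract, using a stand-in class rather than the real factory. `CatalogRegistry`, its `_catalogs` attribute, and the catalog path are hypothetical names chosen for illustration; only the signature and the ValueError on an empty or None key come from the description above.

```python
from typing import Dict


class CatalogRegistry:
    """Hypothetical stand-in for the catalog side of DatasetFactory."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, str] = {}

    def register_catalog(self, key: str, catalog_url: str) -> None:
        # Reject empty or None keys, matching the documented ValueError.
        if not key:
            raise ValueError("key must be a non-empty string")
        self._catalogs[key] = catalog_url


registry = CatalogRegistry()
# The path below is a placeholder, not a real catalog location.
registry.register_catalog("custom", "path/to/custom_catalog.json")
```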
See Also
DataCatalog : Base catalog class
register_validator(key, validator_class)
Register a parameter validator with the factory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Identifier for the validator (approach, data_type combination) | required |
| validator_class | Type[ParameterValidator] | Validator class to register | required |
register_processing_step(step_type, step_class)
Register a processing step with the factory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step_type | str | Identifier for the processing step | required |
| step_class | Type[DataProcessor] | Processing step class to register | required |
create_validator(val_reg_key)
Create a parameter validator from a registry key that combines data_type and approach.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| val_reg_key | str | Key for the validator (data_type_approach) | required |
Returns:
| Type | Description |
|---|---|
| ParameterValidator or None | An appropriate parameter validator, or None if not found. |
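The "validator or None" return contract can be sketched as a plain dictionary lookup. Everything here is a stand-in: the placeholder `ParameterValidator` base class, the `GriddedValidator` subclass, the key `'gridded_time'`, and the no-argument constructor call are assumptions for illustration, not the library's behavior.

```python
from typing import Dict, Optional, Type


class ParameterValidator:
    """Placeholder standing in for the real ParameterValidator base class."""


class GriddedValidator(ParameterValidator):
    """Hypothetical validator used only in this sketch."""


_validator_registry: Dict[str, Type[ParameterValidator]] = {}


def register_validator(key: str, validator_class: Type[ParameterValidator]) -> None:
    _validator_registry[key] = validator_class


def create_validator(val_reg_key: str) -> Optional[ParameterValidator]:
    # dict.get returns None for unknown keys, which gives the documented fallback.
    validator_class = _validator_registry.get(val_reg_key)
    # Assumes a no-argument constructor; the real class may need arguments.
    return validator_class() if validator_class is not None else None


register_validator("gridded_time", GriddedValidator)
```

Callers should check for None before using the result, since an unregistered key does not raise in this sketch.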
get_catalog_options(key, query=None)
Get available options for a specific catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Key of the catalog to query. | required |
| query | dict | A dictionary to filter the catalog options. The keys of the dictionary should correspond to columns in the catalog, and the values are the values to filter by. | None |
Returns:
| Type | Description |
|---|---|
| List[str] | List of available options for the specified catalog. |
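The column-based filtering described for `query` can be illustrated with plain Python rows. This sketch rests on two assumptions: that `key` names a catalog column whose distinct values are returned, and that the catalog can be modeled as a list of dicts (the factory itself holds a DataFrame). The rows, column names, and the `list_options` helper are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical catalog rows; the real metadata lives in a DataFrame.
ROWS: List[Dict[str, str]] = [
    {"variable": "precipitation", "resolution": "3 km", "timescale": "hourly"},
    {"variable": "precipitation", "resolution": "9 km", "timescale": "daily"},
    {"variable": "temperature", "resolution": "3 km", "timescale": "daily"},
]


def list_options(key: str, query: Optional[Dict[str, str]] = None) -> List[str]:
    """Return sorted distinct values of column `key` among rows matching `query`."""
    filters = query or {}
    matches = [
        row
        for row in ROWS
        # Keep a row only if every filter column holds the requested value.
        if all(row.get(column) == value for column, value in filters.items())
    ]
    return sorted({row[key] for row in matches if key in row})
```

With no `query`, every row is considered; each added filter narrows the rows before the distinct values are collected.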
get_validators()
Get a list of available validators.
Returns:
| Type | Description |
|---|---|
| List[str] | List of available validators. |
get_valid_processors(catalog_key)
Get a list of valid processors for a specific catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| catalog_key | str | The catalog key to filter processors by (required). | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of processors valid for the specified catalog. |
get_stations()
Get a list of available station datasets.
Returns:
| Type | Description |
|---|---|
| List[str] | List of available station datasets. |
get_boundaries(boundary_type)
Get a list of available boundary datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| boundary_type | str | The type of boundary datasets to retrieve. If the type is not found in the cache, returns all available boundary types. | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of available boundary datasets for the specified type, or all available boundary types if the specified type is not found. |
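The fallback behavior (unknown type returns the list of available types) can be sketched as follows. The cache contents and its dict-of-lists shape are assumptions for illustration; the source only says that a cache is consulted.

```python
from typing import Dict, List

# Hypothetical cache mapping boundary types to dataset names.
BOUNDARY_CACHE: Dict[str, List[str]] = {
    "states": ["CA", "NV", "OR"],
    "counties": ["Alameda", "Los Angeles"],
}


def get_boundaries(boundary_type: str) -> List[str]:
    """Return datasets for a known type, otherwise the available type names."""
    if boundary_type in BOUNDARY_CACHE:
        # Copy so callers cannot mutate the cached list.
        return list(BOUNDARY_CACHE[boundary_type])
    # Unknown type: fall back to listing the boundary types themselves.
    return list(BOUNDARY_CACHE)
```

Since both branches return a list of strings, a caller cannot tell from the type alone whether it received datasets or type names, so comparing the result against the requested type is a sensible precaution.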
reset()
Reset the factory state, clearing all registered catalogs, validators, and processors.
This method is useful for reinitializing the factory without creating a new instance.