Dataset Pipeline
Data processing pipeline execution.
A pipeline-based data processing class for climate data workflows.
The Dataset class serves as a central orchestrator that coordinates data access, parameter validation, and sequential processing steps. It implements a fluent interface pattern allowing method chaining for building complex data workflows.
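Method chaining works because each `with_*` method configures the instance and then returns it. The sketch below illustrates the pattern with a hypothetical `Pipeline` stand-in, not climakitae's actual `Dataset`. The string values passed in are placeholders.

```python
class Pipeline:
    """Hypothetical stand-in used only to illustrate method chaining."""

    def __init__(self):
        self.catalog = None
        self.steps = []

    def with_catalog(self, catalog):
        self.catalog = catalog
        return self  # returning the instance is what makes chaining possible

    def with_processing_step(self, step):
        self.steps.append(step)
        return self


# Each call returns the same object, so the configuration reads as one expression.
pipeline = (
    Pipeline()
    .with_catalog("example-catalog")
    .with_processing_step("subset")
    .with_processing_step("convert_units")
)
```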
Attributes:
| Name | Type | Description |
|---|---|---|
| data_access | DataCatalog or UNSET | The data catalog instance used for retrieving raw data from various sources. |
| parameter_validator | ParameterValidator or UNSET | The parameter validator instance used for validating query parameters. |
| processing_pipeline | list of DataProcessor or UNSET | A list of processing steps to be executed sequentially on the data. |
Methods:
| Name | Description |
|---|---|
| execute | Execute the complete data processing pipeline and return the result. |
| with_param_validator | Set the parameter validator for the dataset (method chaining). |
| with_catalog | Set the data catalog for the dataset (method chaining). |
| with_processing_step | Add a processing step to the pipeline (method chaining). |
Raises:
| Type | Description |
|---|---|
| TypeError | If provided components don't match expected types. |
| AttributeError | If provided components lack required methods. |
| ValueError | If required components are missing during execution. |
| RuntimeError | If the processing pipeline encounters execution errors. |
Notes
- Processing steps are executed in the order they are added to the pipeline
- The context dictionary is passed through all processing steps and may be modified
- Steps that require data access can set `needs_catalog = True` to receive the data accessor
- Validation failures return an empty `xarray.Dataset` rather than raising exceptions
- All components (validator, catalog, processors) must implement their respective interfaces
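The execution rules in these notes (ordered steps, a shared context dictionary, accessor injection for `needs_catalog` steps, and an empty result on validation failure) can be sketched as a small loop. Everything here is hypothetical: `run_pipeline`, `DictCatalog`, and the two step classes are illustrations, the step signatures `execute(data, context)` and `update_context(context)` are assumed, and plain dicts stand in for xarray objects so the sketch has no dependencies. Whether the real class calls `update_context` before or after `execute` is also an assumption.

```python
class DictCatalog:
    """Hypothetical catalog returning plain dicts in place of xarray objects."""

    def get_data(self, query):
        if query.get("extra"):
            return {"pr": [0.1]}
        return {"tas": [1, 2, 3]}


class UppercaseKeys:
    """Hypothetical step that does not need catalog access."""

    needs_catalog = False

    def update_context(self, context):
        context.setdefault("history", []).append("uppercase")

    def execute(self, data, context):
        return {key.upper(): value for key, value in data.items()}


class FetchExtra:
    """Hypothetical step that opts in to receiving the data accessor."""

    needs_catalog = True

    def __init__(self):
        self.accessor = None

    def set_data_accessor(self, accessor):
        self.accessor = accessor

    def update_context(self, context):
        context.setdefault("history", []).append("fetch_extra")

    def execute(self, data, context):
        # Uses the injected accessor to pull additional data.
        return {**data, **self.accessor.get_data({"extra": True})}


def run_pipeline(catalog, steps, parameters, is_valid, context=None):
    context = {} if context is None else context
    if not is_valid(parameters):
        # Stand-in for returning an empty xarray.Dataset instead of raising.
        return {}
    data = catalog.get_data(parameters)
    for step in steps:  # steps run in the order they were added
        if getattr(step, "needs_catalog", False):
            step.set_data_accessor(catalog)
        step.update_context(context)  # the shared context may be modified
        data = step.execute(data, context)
    return data


ctx = {}
result = run_pipeline(
    DictCatalog(),
    [UppercaseKeys(), FetchExtra()],
    {"variable": "tas"},
    is_valid=lambda params: "variable" in params,
    context=ctx,
)
```

Passing the same `context` dict to every step is what lets later steps see what earlier ones recorded.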
See Also
- DataCatalog : Interface for data access components
- ParameterValidator : Interface for parameter validation components
- DataProcessor : Interface for data processing components
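The three interfaces can be pictured as abstract base classes. In this sketch, `get_data`, `execute`, `update_context`, `set_data_accessor`, and `needs_catalog` come from this page. The validator method name `is_valid_query`, all argument lists, and the default `set_data_accessor` body are assumptions, and `PassThrough` is a hypothetical concrete step.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class DataCatalog(ABC):
    @abstractmethod
    def get_data(self, query: Dict[str, Any]) -> Any:
        """Retrieve raw data matching the query."""


class ParameterValidator(ABC):
    @abstractmethod
    def is_valid_query(self, query: Dict[str, Any]) -> bool:
        """Hypothetical method name: return True if the query is acceptable."""


class DataProcessor(ABC):
    # Steps that need raw-data access set this to True.
    needs_catalog: bool = False

    @abstractmethod
    def execute(self, result: Any, context: Dict[str, Any]) -> Any:
        """Transform the result of the previous step."""

    @abstractmethod
    def update_context(self, context: Dict[str, Any]) -> None:
        """Record information about this step in the shared context."""

    def set_data_accessor(self, catalog: DataCatalog) -> None:
        """Assumed default: store the accessor for later use."""
        self.data_accessor = catalog


class PassThrough(DataProcessor):
    """Hypothetical step that returns its input unchanged."""

    def execute(self, result, context):
        return result

    def update_context(self, context):
        context["pass_through"] = True
```

Declaring the methods abstract means a subclass that forgets `execute` or `update_context` fails at instantiation rather than mid-pipeline.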
Initialize the Dataset class.
Source code in climakitae/new_core/dataset.py
execute(parameters=UNSET)
Execute the dataset processing pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parameters | Dict[str, Any] | Parameters to pass to the processing pipeline. | UNSET |
Returns:
| Type | Description |
|---|---|
| Dataset | Result of the processing pipeline. |
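The `parameters=UNSET` default and the ValueError listed for missing components suggest a sentinel-based check at execution time. A minimal sketch, assuming UNSET is a module-level sentinel object and that only the catalog counts as a required component. Both points are assumptions, and `MiniDataset` and `EchoCatalog` are hypothetical.

```python
UNSET = object()  # hypothetical stand-in for the library's UNSET sentinel


class EchoCatalog:
    """Hypothetical catalog that returns the query it receives."""

    def get_data(self, query):
        return query


class MiniDataset:
    def __init__(self):
        self.data_access = UNSET

    def execute(self, parameters=UNSET):
        # Required components must be configured before execution.
        if self.data_access is UNSET:
            raise ValueError("data_access is not set; call with_catalog() first")
        # Assumed behavior: omitted parameters become an empty query.
        query = {} if parameters is UNSET else dict(parameters)
        return self.data_access.get_data(query)
```

A dedicated sentinel, rather than `None`, lets the class distinguish "never configured" from a value that was explicitly set to `None`.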
Source code in climakitae/new_core/dataset.py
with_param_validator(parameter_validator)
Set a new parameter validator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parameter_validator | ParameterValidator | Parameter validator to set for the dataset. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | The current instance of Dataset, allowing method chaining. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the parameter validator is not an instance of ParameterValidator. |
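The type check described above can be sketched as an `isinstance` guard followed by `return self`. `ParameterValidator` here is a hypothetical empty base class standing in for the real interface, and `RangeValidator` and `MiniDataset` are illustrative names only.

```python
class ParameterValidator:
    """Hypothetical base class standing in for the real interface."""


class RangeValidator(ParameterValidator):
    """Hypothetical concrete validator."""


class MiniDataset:
    def __init__(self):
        self.parameter_validator = None

    def with_param_validator(self, parameter_validator):
        # Reject anything that does not implement the validator interface.
        if not isinstance(parameter_validator, ParameterValidator):
            raise TypeError(
                "parameter_validator must be an instance of ParameterValidator, "
                f"got {type(parameter_validator).__name__}"
            )
        self.parameter_validator = parameter_validator
        return self  # enables method chaining
```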
Source code in climakitae/new_core/dataset.py
with_catalog(catalog)
Set a new data catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| catalog | DataCatalog | Data catalog to set for the dataset. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | The current instance of Dataset, allowing method chaining. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the catalog is not an instance of DataCatalog. |
| AttributeError | If the catalog does not have a 'get_data' method. |
| TypeError | If the 'get_data' method is not callable. |
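The three checks listed under Raises map naturally onto `isinstance`, `hasattr`, and `callable`, applied in that order. This is a sketch under assumptions: `DataCatalog` is a hypothetical empty base class (so a subclass can lack `get_data`), and the catalog subclasses and `MiniDataset` are illustrative.

```python
class DataCatalog:
    """Hypothetical base class standing in for the real interface."""


class MiniDataset:
    def __init__(self):
        self.data_access = None

    def with_catalog(self, catalog):
        if not isinstance(catalog, DataCatalog):
            raise TypeError("catalog must be an instance of DataCatalog")
        if not hasattr(catalog, "get_data"):
            raise AttributeError("catalog must define a 'get_data' method")
        if not callable(catalog.get_data):
            raise TypeError("catalog.get_data must be callable")
        self.data_access = catalog
        return self  # enables method chaining


class GoodCatalog(DataCatalog):
    def get_data(self, query):
        return query


class NoMethodCatalog(DataCatalog):
    """Passes the isinstance check but has no get_data."""


class AttrCatalog(DataCatalog):
    get_data = "not a function"  # present but not callable
```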
Source code in climakitae/new_core/dataset.py
with_processing_step(step)
Add a new processing step to the pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step | DataProcessor | Processing step to add to the pipeline. Must have 'execute', 'update_context', and 'set_data_accessor' methods. | required |
Returns:
| Type | Description |
|---|---|
| Dataset | The current instance of Dataset, allowing method chaining. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the step is not an instance of DataProcessor. |
| AttributeError | If the step does not have 'execute', 'update_context', or 'set_data_accessor' methods. |
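A sketch of the step validation described above, assuming the required method names are exactly those listed under Raises. `DataProcessor` is a hypothetical empty base class, and `Scale`, `Incomplete`, and `MiniDataset` are illustrations. For brevity this sketch also raises AttributeError when one of those attributes exists but is not callable, which may differ from the real class.

```python
class DataProcessor:
    """Hypothetical base class standing in for the real interface."""


REQUIRED_METHODS = ("execute", "update_context", "set_data_accessor")


class MiniDataset:
    def __init__(self):
        self.processing_pipeline = []

    def with_processing_step(self, step):
        if not isinstance(step, DataProcessor):
            raise TypeError("step must be an instance of DataProcessor")
        # Collect every required method that is missing or not callable.
        missing = [m for m in REQUIRED_METHODS if not callable(getattr(step, m, None))]
        if missing:
            raise AttributeError(f"step is missing required method(s): {', '.join(missing)}")
        self.processing_pipeline.append(step)  # steps run in insertion order
        return self


class Scale(DataProcessor):
    def __init__(self, factor):
        self.factor = factor

    def execute(self, result, context):
        return [x * self.factor for x in result]

    def update_context(self, context):
        context["scaled_by"] = self.factor

    def set_data_accessor(self, catalog):
        pass  # this step does not need catalog access


class Incomplete(DataProcessor):
    def execute(self, result, context):
        return result
```

Reporting all missing methods at once, rather than stopping at the first, makes a misconfigured step easier to fix in one pass.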