Composition API
Composition APIs define how transforms are combined while keeping the pipeline visible to validators and tuners. Use them for sequential pipelines, branching, per-group fitting, optional steps, and ensembles when those choices should participate in split, refit, and cache planning.
Base Composition Types
TransformStep (dataclass)
TransformStep(name: str, transform: Transform)
Represents a named step in a processing pipeline.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The name of the step. |
| transform | Transform | The transform object to be executed in this step. |
CompositeTransform
CompositeTransform(sel: dict[str, Any] | None = None, drop_sel: dict[str, Any] | None = None, transform_sel: dict | None = None, transform_drop_sel: dict | None = None, **kwargs)
Bases: Transform, ABC
Abstract base class for transforms that are compositions of other transforms.
This class provides common functionality for orchestrators like Pipeline and TransformUnion, such as dynamically determining statefulness based on its constituent children.
Cloning semantics
- CompositeTransform.clone() performs a constructor-filtered recursive clone: it reconstructs a new instance using only parameters present in the subclass __init__ signature and, for any child Transform(s), calls child.clone().
- "Recursive" means we clone through the transform hierarchy, but do not copy fitted state. Each child must keep fitted state out of __init__ so the cloned composite is unfitted.
- Subclasses should ensure that child collections (e.g., self.steps) are set before super().__init__() so is_stateful can be computed from children.
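The cloning rules above can be illustrated with a small standalone sketch. It does not use xdflow: ToyTransform, Scale, Chain, and _clone_value are hypothetical stand-ins, and the real implementation may differ in how it filters parameters and walks collections.

```python
import inspect


class ToyTransform:
    """Hypothetical stand-in for a transform; not the xdflow base class."""

    def clone(self):
        # Keep only attributes that are also constructor parameters.
        params = inspect.signature(type(self).__init__).parameters
        kwargs = {}
        for name in params:
            if name == "self" or not hasattr(self, name):
                continue
            kwargs[name] = _clone_value(getattr(self, name))
        return type(self)(**kwargs)


def _clone_value(value):
    # Recurse into child transforms and simple collections of them.
    if isinstance(value, ToyTransform):
        return value.clone()
    if isinstance(value, (list, tuple)):
        return type(value)(_clone_value(v) for v in value)
    return value


class Scale(ToyTransform):
    def __init__(self, factor=1.0):
        self.factor = factor
        self.mean_ = None  # fitted state is not a constructor parameter

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self


class Chain(ToyTransform):
    def __init__(self, steps):
        self.steps = steps


fitted = Chain(steps=[Scale(2.0).fit([1.0, 3.0])])
fresh = fitted.clone()  # same configuration, no fitted state
```

Because mean_ is set in fit rather than accepted by __init__, the clone keeps factor but starts unfitted.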
Initializes the CompositeTransform.
The is_stateful attribute is automatically determined by inspecting
the children defined in the concrete subclass. This requires that child
collections (e.g., self.steps) are initialized in the subclass's
__init__ method before calling super().__init__().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sel | dict[str, Any] \| None | Dictionary of coordinates to select. | None |
| drop_sel | dict[str, Any] \| None | Dictionary of coordinates to drop. | None |
| transform_sel | dict \| None | Dictionary of coordinates to select for fitting/transforming. | None |
| transform_drop_sel | dict \| None | Dictionary of coordinates to drop for fitting/transforming. | None |
| **kwargs | | Additional keyword arguments. | {} |
Source code in xdflow/composite/base.py
children (abstractmethod, property)
children: Iterable[Transform]
is_predictor (abstractmethod, property)
is_predictor: bool
Returns True if the transform performs prediction.
predictive_transform (property)
predictive_transform: Transform | None
Returns the predictive transform if it exists, otherwise None. Must be implemented by subclasses if the subclass is a predictor.
predict
predict(container: DataContainer, **kwargs) -> DataContainer
Predicts on data. Must be implemented by subclasses if the subclass is a predictor but does not inherit from Predictor.
Source code in xdflow/composite/base.py
predict_proba
predict_proba(container: DataContainer, **kwargs) -> DataContainer
Predicts the probabilities on data. Must be implemented by subclasses if the subclass is a predictor but does not inherit from Predictor.
Source code in xdflow/composite/base.py
set_params
set_params(**params: Any) -> CompositeTransform
Set the parameters of this transform and its children.
This method supports nested parameter setting using the __ separator,
similar to scikit-learn's Pipeline.
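As a rough illustration of the `__` convention, the following standalone sketch routes a nested key to a child of a child. ToyStep and ToyComposite are hypothetical classes written for this example; they are not the xdflow classes, and the real set_params may validate keys differently.

```python
class ToyStep:
    """Hypothetical leaf transform with plain attributes as parameters."""

    def __init__(self, window=5):
        self.window = window

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class ToyComposite:
    """Hypothetical composite that routes `child__param` keys to children."""

    def __init__(self, steps):
        self.steps = dict(steps)  # step name -> child

    def set_params(self, **params):
        for key, value in params.items():
            if "__" in key:
                # Split on the first separator only, so deeper nesting
                # ("outer__inner__param") is forwarded to the child intact.
                child_name, rest = key.split("__", 1)
                self.steps[child_name].set_params(**{rest: value})
            else:
                setattr(self, key, value)
        return self


inner = ToyComposite([("smooth", ToyStep(window=5))])
outer = ToyComposite([("prep", inner)])
outer.set_params(prep__smooth__window=11)
```

Splitting on the first separator lets each level consume one name and hand the remainder down, which is what makes arbitrarily deep nesting work.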
Source code in xdflow/composite/base.py
clone
clone() -> Self
Return a fresh unfitted instance by recursively cloning constructor-filtered params.
This default implementation mirrors Transform.clone but recursively clones any values
that are Transforms (or collections of them), ensuring children are cloned
without copying fitted state. Only public constructor parameters are passed
to the new instance.
Returns:
| Name | Type | Description |
|---|---|---|
| Self | Self | A new, unfitted instance with cloned child transforms. |
Source code in xdflow/composite/base.py
get_transform_from_name
get_transform_from_name(name: str) -> Transform
Returns the step with the given name.
Source code in xdflow/composite/base.py
Pipelines
Pipeline
Pipeline(name: str, steps: list[tuple[str, Transform]] | list[TransformStep], expected_input_dims: dict[str, tuple[str, ...]] = None, use_cache: bool = False)
Bases: CompositeTransform
Run named transforms in sequence.
A pipeline is itself a transform, so it can be nested inside other
composites or passed directly to CrossValidator. Each step receives the
DataContainer produced by the previous step. fit_transform fits stateful
steps as the data flows forward; transform assumes stateful steps have
already been fitted.
If the final step is a Predictor, the pipeline also exposes predict,
predict_proba, and get_labels. In that case all steps before the final
predictor are applied first, then prediction is delegated to the predictor.
Step names must be unique. Optional expected_input_dims can be used to
validate the dimensions seen by each step at runtime.
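The data flow described above can be sketched without xdflow. Center, Double, and ToyPipeline below are hypothetical toy classes operating on plain lists rather than DataContainer objects; they only illustrate forward fitting and the unique-name rule.

```python
class Center:
    """Hypothetical stateful step: learns a mean during fit."""

    def fit_transform(self, data):
        self.mean_ = sum(data) / len(data)
        return self.transform(data)

    def transform(self, data):
        return [x - self.mean_ for x in data]


class Double:
    """Hypothetical stateless step."""

    def fit_transform(self, data):
        return self.transform(data)

    def transform(self, data):
        return [2 * x for x in data]


class ToyPipeline:
    def __init__(self, name, steps):
        names = [step_name for step_name, _ in steps]
        if len(set(names)) != len(names):
            raise ValueError("step names must be unique")
        self.name = name
        self.steps = steps

    def fit_transform(self, data):
        # Each step is fitted on the output of the previous step.
        for _, step in self.steps:
            data = step.fit_transform(data)
        return data

    def transform(self, data):
        # Assumes every stateful step has already been fitted.
        for _, step in self.steps:
            data = step.transform(data)
        return data


pipe = ToyPipeline("demo", [("center", Center()), ("double", Double())])
train_out = pipe.fit_transform([1.0, 2.0, 3.0])
test_out = pipe.transform([4.0])  # reuses the mean learned on the training data
```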
Create a named pipeline from transform steps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Human-readable pipeline name. | required |
| steps | list[tuple[str, Transform]] \| list[TransformStep] | Ordered | required |
| expected_input_dims | dict[str, tuple[str, ...]] | Optional mapping from each step name to the dimensions expected immediately before that step runs. | None |
| use_cache | bool | Whether to cache | False |
Source code in xdflow/composite/pipeline.py
predictive_transform (property)
predictive_transform: Transform | None
Returns the predictive transform if it exists, otherwise None.
final_target_coord (property)
final_target_coord: str | None
Convenience: expose the final predictor's target coordinate, if any.
Returns:
| Type | Description |
|---|---|
| str \| None | The |
fit
fit(container: DataContainer, **kwargs) -> Pipeline
Fits all stateful transforms in the pipeline using recursive delegation.
This method fits all the transformers in sequence. The data is transformed by each step and passed to the next. The final transformed data is discarded. The primary purpose is to prepare the pipeline for future transform() calls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to fit on | required |
| **kwargs | | Additional context/parameters passed through the pipeline | {} |
Returns:
| Type | Description |
|---|---|
| Pipeline | Self (fitted pipeline) |
Source code in xdflow/composite/pipeline.py
fit_transform
fit_transform(container: DataContainer, **kwargs) -> DataContainer
Fits and transforms the data in a single, efficient pass.
This is the preferred method when you need to both fit the pipeline and get the transformed training data back. It performs the exact same fitting logic as fit() but returns the final transformed result instead of discarding it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to fit and transform. | required |
| **kwargs | | Additional context/parameters passed through the pipeline. | {} |
Returns:
| Type | Description |
|---|---|
| DataContainer | The transformed DataContainer. |
Source code in xdflow/composite/pipeline.py
get_expected_output_dims
get_expected_output_dims(input_dims: tuple[str, ...], print_steps: bool = False) -> tuple[str, ...]
Returns the expected output dimensions for the pipeline.
Source code in xdflow/composite/pipeline.py
predict
predict(container: DataContainer, **kwargs) -> DataContainer
Generates predictions using the final predictor in the pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to make predictions on | required |
| **kwargs | | Additional context/parameters passed through the pipeline | {} |
Returns:
| Type | Description |
|---|---|
| DataContainer | DataContainer with predictions as the primary data |
Source code in xdflow/composite/pipeline.py
predict_proba
predict_proba(container: DataContainer, **kwargs) -> DataContainer
Generates prediction probabilities using the final Predictor in the pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to make predictions on | required |
| **kwargs | | Additional context/parameters passed through the pipeline | {} |
Returns:
| Type | Description |
|---|---|
| DataContainer | DataContainer with prediction probabilities |
Source code in xdflow/composite/pipeline.py
prepare_for_inference
prepare_for_inference(*, set_n_jobs_single: bool = True) -> None
Disable training-time optimizations that are undesirable at inference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| set_n_jobs_single | bool | When True, force transforms that expose an | True |
Source code in xdflow/composite/pipeline.py
get_labels
get_labels() -> list[Any]
Return the label ordering from the final predictor.
Relies on the predictor implementing get_labels; raises when the pipeline
cannot provide labels unambiguously.
Source code in xdflow/composite/pipeline.py
Grouped Application
GroupApplyTransform
GroupApplyTransform(group_coord: str | list[str], transform_template: Transform, unseen_policy: Literal['error', 'average', 'weighted_average'] = 'error', unequal_output_dims_strategy: Literal['error', 'cut_to_min'] = 'error', n_jobs: int = 1)
Bases: CompositeTransform
Applies a transform individually to each group defined by a metadata coordinate.
This transform discovers groups from the data at fit time, creates independent transform instances per group by cloning the template, and applies transformations per group. The outputs are reassembled along the original grouped axis.
Use cases:
- Apply per-animal preprocessing where each animal needs independent fitting
- Train separate models per session or experimental condition
- Any scenario where groups should be processed independently
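A minimal standalone sketch of the per-group idea follows. MeanCenter and fit_transform_by_group are hypothetical names, the data are plain lists instead of a DataContainer, and copy.deepcopy stands in for cloning the template; the actual grouping and reassembly logic in xdflow is not shown here.

```python
import copy
from collections import defaultdict


class MeanCenter:
    """Hypothetical template transform; each fitted copy learns one mean."""

    def __init__(self):
        self.mean_ = None

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values):
        return [v - self.mean_ for v in values]


def fit_transform_by_group(template, values, groups):
    """Fit an independent copy of `template` per group, then reassemble."""
    index_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        index_by_group[g].append(i)

    fitted = {}
    out = [None] * len(values)
    for g, idx in index_by_group.items():
        # deepcopy stands in for the template's clone() here.
        model = copy.deepcopy(template).fit([values[i] for i in idx])
        fitted[g] = model
        for i, v in zip(idx, model.transform([values[i] for i in idx])):
            out[i] = v  # write back in the original sample order
    return out, fitted


values = [1.0, 3.0, 10.0, 30.0]
animals = ["a", "a", "b", "b"]
out, fitted = fit_transform_by_group(MeanCenter(), values, animals)
```

Each group is centered with its own mean, so a large offset in one group does not leak into another.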
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| group_coord | str \| list[str] | Coordinate name to use for grouping (e.g., "animal", "session") | required |
| transform_template | Transform | Template transform to clone per group (unfitted) | required |
| unseen_policy | Literal['error', 'average', 'weighted_average'] | How to handle groups not seen during fit: "error" raises TransformError (default); "average" takes a uniform average across all fitted group transforms; "weighted_average" takes an average weighted by training counts per group | 'error' |
| unequal_output_dims_strategy | Literal['error', 'cut_to_min'] | How to handle unequal (non-group) output dimensions across groups, which lead to NaNs during concatenation: "error" raises TransformError (default); "cut_to_min" uses the min size per dimension across groups | 'error' |
| n_jobs | int | Number of parallel jobs for per-group processing | 1 |
Initialize GroupApplyTransform with grouping parameters.
Source code in xdflow/composite/group_apply.py
is_predictor (property)
is_predictor: bool
Returns True if the template transform performs prediction.
predictive_transform (property)
predictive_transform: Transform | None
Returns the predictive transform if it exists, otherwise None.
fit
fit(container: DataContainer, **kwargs) -> GroupApplyTransform
Fits per-group transforms after discovering groups from the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to fit on | required |
| **kwargs | | Additional context/parameters passed through | {} |
Returns:
| Type | Description |
|---|---|
| GroupApplyTransform | Self (fitted GroupApplyTransform) |
Source code in xdflow/composite/group_apply.py
fit_transform
fit_transform(container: DataContainer, **kwargs) -> DataContainer
Fits and transforms in a single pass for efficiency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to fit and transform | required |
| **kwargs | | Additional context/parameters passed through | {} |
Returns:
| Type | Description |
|---|---|
| DataContainer | Transformed DataContainer with results reassembled |
Source code in xdflow/composite/group_apply.py
predict
predict(container: DataContainer, **kwargs) -> DataContainer
Generates predictions using per-group fitted predictors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to make predictions on | required |
| **kwargs | | Additional context/parameters | {} |
Returns:
| Type | Description |
|---|---|
| DataContainer | DataContainer with predictions |
Source code in xdflow/composite/group_apply.py
predict_proba
predict_proba(container: DataContainer, **kwargs) -> DataContainer
Generates prediction probabilities using per-group fitted predictors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to make predictions on | required |
| **kwargs | | Additional context/parameters | {} |
Returns:
| Type | Description |
|---|---|
| DataContainer | DataContainer with prediction probabilities |
Source code in xdflow/composite/group_apply.py
get_expected_output_dims
get_expected_output_dims(input_dims: tuple[str, ...]) -> tuple[str, ...]
Returns the expected output dimensions for the GroupApplyTransform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_dims | tuple[str, ...] | Expected input dimensions | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, ...] | Expected output dimensions from the reference transform |
Source code in xdflow/composite/group_apply.py
Parallel Branches
TransformUnion
TransformUnion(transforms_list: list[tuple[str, Transform] | Pipeline | TransformStep], from_dims: list[str] | None = None, to_dim: str | None = 'feature', n_jobs: int = 1)
Bases: CompositeTransform
Applies a set of transforms in parallel and concatenates their outputs.
This is a special Transform that applies multiple transforms in parallel to the same input data and concatenates their results. This is useful for combining different types of features (e.g., spectral and temporal) into a single feature set.
Note: This class computes is_stateful dynamically based on constituent transforms, so it overrides the class attribute with an instance attribute.
Usage:
TransformUnion(
    transforms_list=[
        Pipeline(name="time_average", steps=[("average_time", AverageTransform(dims="time"))]),
        ("average_channel", AverageTransform(dims="channel")),
    ]
)
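To make the "apply in parallel, then concatenate" behavior concrete, here is a toy sketch on plain lists. union_features and the branch functions are hypothetical, the per-feature label scheme is an assumption made for this example, and the branches run sequentially (comparable to n_jobs=1).

```python
def union_features(branches, sample):
    """Apply every branch to the same input and concatenate the outputs."""
    features = []
    labels = []
    for name, fn in branches:
        out = fn(sample)
        features.extend(out)
        # Track which branch produced each feature (hypothetical naming).
        labels.extend(f"{name}_{i}" for i in range(len(out)))
    return features, labels


branches = [
    ("mean", lambda xs: [sum(xs) / len(xs)]),
    ("range", lambda xs: [min(xs), max(xs)]),
]
features, labels = union_features(branches, [1.0, 2.0, 6.0])
```

Every branch sees the same untouched input; only the outputs are joined, in the order the branches were listed.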
Initialize TransformUnion with multiple transforms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transforms_list | list[tuple[str, Transform] \| Pipeline \| TransformStep] | List of (step_name, transform) tuples, Pipeline objects, or TransformStep objects | required |
| n_jobs | int | Number of parallel jobs to run. n_jobs=1 (default): sequential execution, maintains current behavior; n_jobs=-1: use all available CPU cores; n_jobs>1: use the specified number of worker processes | 1 |
Source code in xdflow/composite/transform_union.py
is_predictor (property)
is_predictor: bool
Returns False for TransformUnion because it concatenates outputs and does not perform prediction.
fit_transform
fit_transform(container: DataContainer, **kwargs) -> DataContainer
Fits and transforms the data in a single, efficient parallel pass.
This method avoids double computation by running fit_transform on each
child transform and collecting both the fitted transformer and the
transformed data from each worker process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to fit and transform | required |
| **kwargs | | Additional context/parameters passed through the pipeline | {} |
Returns:
| Type | Description |
|---|---|
| DataContainer | DataContainer with concatenated results along 'feature' dimension |
Source code in xdflow/composite/transform_union.py
fit
fit(container: DataContainer, **kwargs) -> TransformUnion
Fits all stateful steps in parallel or sequentially.
For parallel execution, fitted transforms are returned from worker processes and used to update the original transform objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | DataContainer | DataContainer to fit on | required |
| **kwargs | | Additional context/parameters passed through the pipeline | {} |
Returns:
| Type | Description |
|---|---|
| TransformUnion | Self (fitted TransformUnion) |
Source code in xdflow/composite/transform_union.py
get_expected_output_dims
get_expected_output_dims(input_dims: tuple[str, ...]) -> tuple[str, ...]
Returns the expected output dimensions for the TransformUnion.
Source code in xdflow/composite/transform_union.py
UnionWithInput
UnionWithInput(transform_template: Transform, join_dim: str, to_dim: str | None = None, n_jobs: int = 1, name: str | None = None)
Bases: TransformUnion
Concatenates a transform's output with the original input along a join dimension.
This is a convenience wrapper around TransformUnion that forms a two-branch
union consisting of the provided transform and an identity branch. It is
equivalent to:
TransformUnion(
transforms_list=[("transform", transform), ("identity", IdentityTransform())],
from_dims=[join_dim, join_dim],
to_dim=to_dim or join_dim,
n_jobs=n_jobs,
)
Typical usage is to augment feature channels by concatenating the transform's
output with the original input along channel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transform_template | Transform | The transform or pipeline to apply in the non-identity branch. | required |
| join_dim | str | The dimension name along which to concatenate both branches. | required |
| to_dim | str \| None | Optional name for the resulting join dimension. Defaults to | None |
| n_jobs | int | Parallelism parameter passed through to | 1 |
| name | str \| None | Optional explicit name to assign to the transform branch. | None |
Source code in xdflow/composite/transform_union.py
Conditional Branches
SwitchTransform
SwitchTransform(choices: list[tuple[str, Transform] | TransformStep | Pipeline] | dict[str, Transform], choose: str | None = None, from_dim: str | None = None, to_dim: str | None = None)
Bases: CompositeTransform
A conditional transform that selects one of several child transforms to execute.
This acts as a placeholder in a pipeline for a step that has multiple
possible implementations. The choice of which transform to run is determined
at runtime by the choose keyword argument passed to fit or transform.
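The selection mechanism can be sketched with a toy class. ToySwitch is hypothetical and works on plain callables and lists. Letting a call-time choose take precedence over the constructor value is an assumption of this sketch, not confirmed behavior of SwitchTransform.

```python
class ToySwitch:
    """Hypothetical switch: runs exactly one named choice."""

    def __init__(self, choices, choose=None):
        self.choices = dict(choices)
        self.choose = choose

    def _selected(self, choose=None):
        # Assumption: a call-time `choose` overrides the constructor value.
        name = choose if choose is not None else self.choose
        if name is None:
            raise ValueError("no choice selected")
        if name not in self.choices:
            raise KeyError(f"unknown choice: {name!r}")
        return self.choices[name]

    def transform(self, data, choose=None):
        return self._selected(choose)(data)


switch = ToySwitch(
    [
        ("abs", lambda xs: [abs(x) for x in xs]),
        ("square", lambda xs: [x * x for x in xs]),
        ("identity", lambda xs: list(xs)),  # a no-op branch makes the step optional
    ],
    choose="abs",
)
default_out = switch.transform([-2, 3])
override_out = switch.transform([-2, 3], choose="square")
skipped_out = switch.transform([-2, 3], choose="identity")
```

Exposing the choice as a single named parameter is what lets a tuner treat "which implementation to run" like any other hyperparameter.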
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| choices | list[tuple[str, Transform] \| TransformStep \| Pipeline] \| dict[str, Transform] | Preferred style is a list of | required |
| choose | str \| None | Optional explicit selection for the switch. If provided, it must match one of the choice names. If not provided, the user must supply | None |
Initialize SwitchTransform with multiple choice transforms.
Source code in xdflow/composite/switch_transform.py
is_predictor (property)
is_predictor: bool
Returns True if the selected transform performs prediction.
predictive_transform (property)
predictive_transform: Transform | None
Returns the predictive transform if it exists, otherwise None.
predict
predict(container: DataContainer, **kwargs) -> DataContainer
Predicts the data using the selected child transform.
Source code in xdflow/composite/switch_transform.py
predict_proba
predict_proba(container: DataContainer, **kwargs) -> DataContainer
Predicts the probabilities using the selected child transform.
Source code in xdflow/composite/switch_transform.py
fit
fit(container: DataContainer, **kwargs) -> SwitchTransform
Fits the selected child transform.
Source code in xdflow/composite/switch_transform.py
fit_transform
fit_transform(container: DataContainer, **kwargs) -> DataContainer
Fit/transform by delegating to the selected child.
If the selected child is stateful, call its fit_transform; otherwise, call its transform. This allows mixing stateful and stateless choices without requiring the switch wrapper itself to implement _fit.
Source code in xdflow/composite/switch_transform.py
get_expected_output_dims
get_expected_output_dims(input_dims: tuple[str, ...]) -> tuple[str, ...]
Determines the expected output dimensions.
For consistency, this implementation requires that all possible choices produce the same output dimensions for a given input. It validates this by checking the first choice and then asserting all others match.
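The consistency check described above might look roughly like the following. The function and the two dimension-mapping helpers are hypothetical; each choice is modeled as a plain function from input dims to output dims rather than as a transform object.

```python
def expected_output_dims(choices, input_dims):
    """Return the shared output dims, or raise if the choices disagree."""
    names = list(choices)
    # The first choice serves as the reference for all the others.
    reference = choices[names[0]](input_dims)
    for name in names[1:]:
        dims = choices[name](input_dims)
        if dims != reference:
            raise ValueError(
                f"choice {name!r} yields {dims}, expected {reference}"
            )
    return reference


def keep_dims(dims):
    return dims


def drop_time(dims):
    return tuple(d for d in dims if d != "time")


input_dims = ("trial", "channel", "time")
same = expected_output_dims({"scale": keep_dims, "identity": keep_dims}, input_dims)
```

Swapping one choice for drop_time would raise, which mirrors why a wrapped transform that changes the dimension signature cannot be paired with an identity branch.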
Source code in xdflow/composite/switch_transform.py
OptionalTransform
OptionalTransform(transform_template: Transform, choose: str | None = None, use: bool | None = None, name: str | None = None, skip_name: str = 'identity', identity_rename: dict[str, str] | None = None)
Bases: SwitchTransform
Optionally apply a transform or skip it entirely (identity behavior).
This is a convenience wrapper over SwitchTransform that defines two choices:
- "use": apply the provided transform
- "skip": apply IdentityTransform (no-op)
You can control selection by either:
- use=True|False boolean, or
- choose set to either "use" or "skip".
Note
For this to be valid within a statically validated pipeline, the wrapped transform should preserve the dimension signature. Otherwise, the two choices would yield different output dims and violate the validation requirement that choices share the same output dims.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transform_template | Transform | The transform or pipeline to optionally apply. | required |
| choose | str \| None | Optional explicit selection ("use" or "skip"). | None |
| use | bool \| None | Optional boolean shorthand for | None |
Initialize an OptionalTransform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transform_template | Transform | The wrapped transform/pipeline to optionally apply. | required |
| choose | str \| None | Explicit choice label to select the branch. If provided, must be either the transform branch name or | None |
| use | bool \| None | Boolean shorthand; if provided, maps to | None |
| name | str \| None | Optional label for the transform branch. Defaults to the lowercased class name of | None |
| skip_name | str | Label for the identity branch. Defaults to "identity". | 'identity' |
| identity_rename | dict[str, str] \| None | Optional mapping of coordinate names to rename ONLY when the identity branch is selected, e.g., {"old_coord": "new_coord"}. This renames coordinates without altering dimension names. | None |
Source code in xdflow/composite/switch_transform.py
Ensembles
EnsembleMember (dataclass)
EnsembleMember(name: str, transform: Transform, weight: float = 1.0)
Represents a member of an ensemble with its name, predictor, and weight.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The name of the ensemble member. |
| transform | Transform | The transform object. |
| weight | float | The weight for this member in the ensemble. |
predictor (property)
predictor: Predictor
Return the predictive component for this member's transform.
EnsemblePredictor
EnsemblePredictor(members: list[tuple[str, Predictor] | EnsembleMember | TransformStep | Transform], sample_dim: str, target_coord: str, encoder: LabelEncoder | None = None, weights: list[float] | None = None, weighting_strategy: Literal['uniform', 'score_based', 'custom'] = 'uniform', scoring_func: Callable = accuracy_score, scoring_transform_func: Callable[[float], float] | None = None, normalize_weights: bool = True, normalize_outputs: bool = True, n_jobs: int = 1, calibration_container: DataContainer | None = None, proba: bool = False, sel: dict | None = None, drop_sel: dict | None = None, **kwargs)
Bases: CompositeTransform, Predictor
An ensemble predictor that combines multiple predictors using weighted averaging.
This predictor applies multiple child predictors to the same input and combines their outputs using weighted averaging. It supports various weighting strategies for combining predictor outputs.
Features:
- Multiple weighting strategies (uniform, score-based, custom)
- Parallel execution support
- Score-based weighting with customizable scoring functions
- Proper validation and error handling
- Both prediction and probability prediction ensemble
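Weighted averaging of member outputs can be sketched for a single sample as follows. combine_probabilities and predict_label are hypothetical helpers written for this example; how EnsemblePredictor actually combines, normalizes, and decodes outputs is not shown in this reference and may differ.

```python
def combine_probabilities(member_probs, weights, normalize_weights=True):
    """Weighted average of per-member class probabilities for one sample."""
    if len(member_probs) != len(weights):
        raise ValueError("need one weight per member")
    if normalize_weights:
        total = sum(weights)
        weights = [w / total for w in weights]
    n_classes = len(member_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, member_probs))
        for k in range(n_classes)
    ]


def predict_label(labels, combined):
    # Pick the class with the largest combined probability.
    return labels[max(range(len(combined)), key=combined.__getitem__)]


# Two members; the first is trusted three times as much as the second.
probs = combine_probabilities([[0.9, 0.1], [0.4, 0.6]], weights=[3.0, 1.0])
label = predict_label(["left", "right"], probs)
```

With normalized weights the combined vector still sums to 1 whenever each member's probabilities do, so it can be read as a probability distribution.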
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| members | list[tuple[str, Predictor] \| EnsembleMember \| TransformStep \| Transform] | List of (name, predictor) tuples, EnsembleMember objects, or TransformStep objects | required |
| sample_dim | str | Name of the sample dimension | required |
| target_coord | str | Name of the target coordinate | required |
| encoder | LabelEncoder \| None | Optional label encoder for the predictor | None |
| weights | list[float] \| None | Optional explicit weights for the members (overrides weighting_strategy) | None |
| weighting_strategy | Literal['uniform', 'score_based', 'custom'] | Strategy for determining weights ('uniform', 'score_based', 'custom') | 'uniform' |
| scoring_func | Callable | Function to use for score-based weighting (default: accuracy_score) | accuracy_score |
| scoring_transform_func | Callable[[float], float] \| None | Function to transform scores before using as weights | None |
| normalize_weights | bool | Whether to normalize weights to sum to 1 | True |
| normalize_outputs | bool | Whether to normalize final ensemble outputs | True |
| n_jobs | int | Number of parallel jobs for execution | 1 |
| calibration_container | DataContainer \| None | Optional container for score-based weighting calibration | None |
Initialize EnsemblePredictor with ensemble members and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| members | list[tuple[str, Predictor] \| EnsembleMember \| TransformStep \| Transform] | List of ensemble members in various formats | required |
| sample_dim | str | Name of the sample dimension | required |
| target_coord | str | Name of the target coordinate | required |
| encoder | LabelEncoder \| None | Optional label encoder for the predictor | None |
| weights | list[float] \| None | Optional explicit weights (overrides weighting_strategy) | None |
| weighting_strategy | Literal['uniform', 'score_based', 'custom'] | How to determine member weights | 'uniform' |
| scoring_func | Callable | Function for score-based weighting evaluation | accuracy_score |
| scoring_transform_func | Callable[[float], float] \| None | Transform function applied to scores (defaults to identity function) | None |
| normalize_weights | bool | Whether to normalize weights to sum to 1 | True |
| normalize_outputs | bool | Whether to normalize final outputs | True |
| n_jobs | int | Number of parallel jobs to use | 1 |
| calibration_container | DataContainer \| None | Data for score-based weight calibration | None |
| proba | bool | Whether to return probabilities by default | False |
| sel | dict \| None | Optional selection to apply before predicting | None |
| drop_sel | dict \| None | Optional drop selection to apply before predicting | None |
Source code in xdflow/composite/ensemble.py
children (property)
children: list[Transform]
Returns the transform objects from the ensemble members.
check_is_fitted ¶
check_is_fitted() -> bool
Checks whether the ensemble is fitted. The ensemble counts as fitted only if all of its members are fitted.
Source code in xdflow/composite/ensemble.py
prepare_for_inference ¶
prepare_for_inference(*, set_n_jobs_single: bool = True) -> None
Disable training-time options that slow down per-request inference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
set_n_jobs_single
|
bool
|
When True, force single-threaded execution for members. |
True
|
Source code in xdflow/composite/ensemble.py
fit_transform ¶
fit_transform(container: DataContainer, **kwargs) -> DataContainer
Fits all ensemble members on the container, then transforms it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
container
|
DataContainer
|
DataContainer to fit on |
required |
**kwargs
|
Additional context/parameters passed through |
{}
|
Returns:
| Type | Description |
|---|---|
DataContainer
|
Transformed DataContainer (the ensemble itself is fitted as a side effect) |
Source code in xdflow/composite/ensemble.py
transform ¶
transform(container: DataContainer, **kwargs) -> DataContainer
Transforms the data using all ensemble members.
Source code in xdflow/composite/ensemble.py
predict ¶
predict(container: DataContainer, **kwargs) -> DataContainer
Predicts labels with the ensemble, using the shared encoding optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
container
|
DataContainer
|
DataContainer to predict on |
required |
**kwargs
|
Additional context/parameters passed through |
{}
|
Returns:
| Type | Description |
|---|---|
DataContainer
|
DataContainer with predictions |
Source code in xdflow/composite/ensemble.py
predict_proba ¶
predict_proba(container: DataContainer, **kwargs) -> DataContainer
Predicts class probabilities with the ensemble, using the shared encoding optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
container
|
DataContainer
|
DataContainer to predict on |
required |
**kwargs
|
Additional context/parameters passed through |
{}
|
Returns:
| Type | Description |
|---|---|
DataContainer
|
DataContainer with class probabilities |
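A common way to combine member outputs is weighted soft voting: average the members' class probabilities using the member weights, then take the most probable class as the label. The sketch below shows that idea on plain nested lists indexed `[member][sample][class]`. Whether the ensemble combines members exactly this way is an assumption, and `weighted_average_proba` and `predict_labels` are hypothetical names that stand in for the DataContainer-based methods.

```python
from typing import List, Sequence


def weighted_average_proba(
    member_probas: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> List[List[float]]:
    """Weighted mean of probabilities indexed [member][sample][class]."""
    total = sum(weights)
    n_samples = len(member_probas[0])
    n_classes = len(member_probas[0][0])
    combined = [[0.0] * n_classes for _ in range(n_samples)]
    for proba, weight in zip(member_probas, weights):
        for s in range(n_samples):
            for c in range(n_classes):
                # Dividing by the total keeps each row summing to 1.
                combined[s][c] += (weight / total) * proba[s][c]
    return combined


def predict_labels(combined: Sequence[Sequence[float]], classes: Sequence[str]) -> List[str]:
    """Pick the class with the highest combined probability for each sample."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in combined]


# Two members, two samples, two classes.
members = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.2, 0.8]],
]
proba = weighted_average_proba(members, weights=[0.75, 0.25])
labels = predict_labels(proba, classes=["a", "b"])
```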
Source code in xdflow/composite/ensemble.py
predict_proba_with_uncertainty_components ¶
predict_proba_with_uncertainty_components(container: DataContainer, **kwargs) -> tuple[DataContainer, DataContainer, DataContainer]
Predict class probabilities and entropy-based aleatoric/epistemic uncertainty components.
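The total predictive entropy H(E_w[p_i]) is split into two components, where p_i is the class distribution predicted by member i, E_w is the weighted average over members, and H is the Shannon entropy:
A = E_w[H(p_i)] (aleatoric: average entropy of the individual members)
B = H(E_w[p_i]) - E_w[H(p_i)] (epistemic: disagreement between members)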
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
container
|
DataContainer
|
DataContainer to predict on. |
required |
**kwargs
|
Additional context/parameters passed through to member predictors. |
{}
|
Returns:
| Type | Description |
|---|---|
tuple[DataContainer, DataContainer, DataContainer]
|
Tuple of (1) a DataContainer with class probabilities (same as predict_proba), (2) a DataContainer with aleatoric uncertainty (A), one score per sample, and (3) a DataContainer with epistemic uncertainty (B), one score per sample |
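The decomposition A = E_w[H(p_i)] and B = H(E_w[p_i]) - E_w[H(p_i)] can be worked through for a single sample with plain Python. This is a sketch under stated assumptions: entropy is measured in nats (the log base used by the library is not stated here), weights are normalized to sum to 1, and `entropy` and `uncertainty_components` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def uncertainty_components(
    member_probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[List[float], float, float]:
    """member_probas[i] is member i's class distribution for ONE sample."""
    total = sum(weights)
    w = [x / total for x in weights]
    n_classes = len(member_probas[0])
    # E_w[p_i]: the weighted mean distribution across members.
    mean_p = [sum(wi * p[c] for wi, p in zip(w, member_probas)) for c in range(n_classes)]
    # A = E_w[H(p_i)]: average uncertainty within each member.
    aleatoric = sum(wi * entropy(p) for wi, p in zip(w, member_probas))
    # B = H(E_w[p_i]) - A: extra uncertainty caused by member disagreement.
    epistemic = entropy(mean_p) - aleatoric
    return mean_p, aleatoric, epistemic


# Members agree on a 50/50 split: all uncertainty is aleatoric.
_, a_agree, b_agree = uncertainty_components([[0.5, 0.5], [0.5, 0.5]], [1.0, 1.0])

# Members are individually certain but contradict each other: all epistemic.
_, a_split, b_split = uncertainty_components([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

The two cases show the intended reading: identical but uncertain members yield A = ln 2 and B = 0, while confident members that disagree yield A = 0 and B = ln 2.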
Source code in xdflow/composite/ensemble.py
predict_proba_with_std ¶
predict_proba_with_std(container: DataContainer, *, return_stderr: bool = False, **kwargs) -> tuple[DataContainer, DataContainer]
Predict class probabilities along with the standard deviation or standard error across ensemble members.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
container
|
DataContainer
|
DataContainer to predict on. |
required |
return_stderr
|
bool
|
When True, return standard error instead of standard deviation. |
False
|
**kwargs
|
Additional context/parameters passed through to member predictors. |
{}
|
Returns:
| Type | Description |
|---|---|
tuple[DataContainer, DataContainer]
|
Tuple of (1) a DataContainer with class probabilities (same as predict_proba) and (2) a DataContainer with the standard deviation (or standard error) per sample and class |
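The difference between the two spread measures selected by return_stderr can be illustrated on nested lists indexed `[member][sample][class]`. This sketch assumes an unweighted population standard deviation across members and a standard error of std / sqrt(n_members); the library may weight members or use a different degrees-of-freedom convention. `proba_spread` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def proba_spread(
    member_probas: Sequence[Sequence[Sequence[float]]],
    return_stderr: bool = False,
) -> List[List[float]]:
    """Per-sample, per-class spread of probabilities across members."""
    n_members = len(member_probas)
    n_samples = len(member_probas[0])
    n_classes = len(member_probas[0][0])
    spread = []
    for s in range(n_samples):
        row = []
        for c in range(n_classes):
            values = [member[s][c] for member in member_probas]
            mean = sum(values) / n_members
            # Population variance (divide by n), an assumption of this sketch.
            variance = sum((v - mean) ** 2 for v in values) / n_members
            std = math.sqrt(variance)
            row.append(std / math.sqrt(n_members) if return_stderr else std)
        spread.append(row)
    return spread


# Two members, one sample, two classes.
members = [[[0.8, 0.2]], [[0.6, 0.4]]]
std = proba_spread(members)
stderr = proba_spread(members, return_stderr=True)
```

Under these assumptions the standard error shrinks as members are added, so it describes confidence in the averaged probability, while the standard deviation describes how much individual members differ.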
Source code in xdflow/composite/ensemble.py