Core Concepts¶
xdflow is intentionally close to scikit-learn in workflow, but it changes the unit of computation from an unlabeled ndarray to a labeled xarray.DataArray wrapped by DataContainer.
Those labels are used by the runtime. Dimensions, coordinates, targets, groups, split settings, and transform state tell validators and tuners how to split, refit, cache, score, and keep predictions aligned.
Data model¶
- The canonical input is
xarray.DataArray. - Dimensions are named and preserved through the pipeline.
- Coordinates carry metadata such as class labels, sessions, animals, or timestamps.
DataContainerwraps the array and keeps operations returning containers instead of bare arrays.
The runtime expects at least a trial dimension for most supervised workflows. Additional dimensions such as channel, time, frequency, or feature are transform-specific.
Transform model¶
Every processing step inherits from Transform.
fitlearns state when needed.transformreturns a newDataContainer.fit_transformcombines both operations.predict/predict_probaare added byPredictorsubclasses.
Important conventions enforced throughout the library:
- transforms should operate on named dimensions, not positional axes
- incoming data should be treated as immutable
- learned state belongs on fitted attributes, not in constructor arguments
- new dimensions should receive descriptive names and coordinates where possible
See Writing Custom Transforms for concrete authoring patterns, dimension checks, and test examples.
Fold-invariant vs stateful steps¶
Cross-validation uses the is_stateful flag to decide where the fold boundary belongs:
- fold-invariant stateless transforms can run once before fold splitting
- stateful or split-dependent transforms are re-fit inside each fold
This matters for expensive preprocessing such as spectral feature extraction, where fold-invariant work should not be recomputed for every fold. A transform should only be marked stateless for CV reuse when it does not learn from held-out samples or depend on the split boundary.
See Execution Model for the cross-validation boundary, tuning cache boundary, and current in-memory data scope.
Composition patterns¶
The main composition tools are:
Pipeline: sequential processingTransformUnion: parallel branches that are concatenatedGroupApplyTransform: independent fitting per metadata-defined groupSwitchTransformandOptionalTransform: conditional or toggleable branchesEnsemblePredictor: weighted combination of multiple predictors
See Composing Pipelines for examples and guidance on when to use each composite.
Tuning¶
Tuning is a first-class XDFlow workflow. A tuner searches over pipeline parameters and architecture choices while the validator still owns splitting, scoring, refitting, and leakage boundaries.
The tuning install extra only controls dependency weight by installing Optuna. It does not make tuning a secondary part of the framework. See Tuning for the workflow model and Hyperparameter Tuning for a runnable pattern.
Target handling¶
Predictors usually read labels from coordinates rather than from a separate y array. This keeps targets aligned with the same labeled trial axis used by the rest of the pipeline.
For details on required dimensions, coordinates, selection semantics, and immutability guarantees, see Data Contract.