Tuning¶
Tuning is a core XDFlow workflow. The optional tuning install extra exists to keep the base dependency set light; it does not mean hyperparameter search is an add-on conceptually.
XDFlow tuning uses the same objects as the rest of the framework:
- a
Pipelinedefines the model architecture - a
CrossValidatordefines the evaluation policy - a
DataContainerkeeps targets, groups, and metadata attached to the data - a
Tunersearches over pipeline choices and nested transform parameters
The point is to avoid rewriting experiment loops. The tuner asks Optuna for a candidate configuration, applies it to a cloned pipeline, evaluates it through the validator, and records the score. Cross-validation still owns splitting, refitting, scoring, and prediction alignment.
For custom evaluation policies, prefer configuring the built-in validators first. If the split policy itself is new, see Writing Custom Cross-Validators.
Install model¶
Use the tuning extra when you want to run optimization:
pip install "xdflow[tuning]"
The extra installs Optuna and related tuning dependencies. The workflow remains first-class in XDFlow; only the optimizer dependency is optional.
MLflow tracking is separate:
pip install "xdflow[mlflow]"
Use use_mlflow=False when you want tuning without experiment tracking.
Search space structure¶
The search space mirrors the pipeline structure:
param_grid = {
"pipeline_name": {
"step_name": {
"parameter_name": search_space,
},
},
}
Supported search spaces are intentionally simple:
["a", "b", "c"]for categorical choices(1, 10)for integer ranges(0.0, 1.0)for float ranges("log", 1e-4, 10.0)for log-scaled float ranges
The tuner converts those definitions into Optuna suggestions and applies them with the same nested parameter mechanics used by composite transforms.
Pipeline and architecture search¶
Tuning is not limited to scalar estimator parameters. Because switches and optional steps are transforms, the tuner can search over architecture choices too.
SwitchTransform exposes a pipeline position with several possible implementations. During tuning, XDFlow treats the switch as dynamic and samples the active choice per trial.
from xdflow.composite import OptionalTransform
from xdflow.transforms.normalization import ZScoreTransform
maybe_zscore = OptionalTransform(
ZScoreTransform(per_dim="channel", use_fit=True),
use=True,
)
A tuning grid can then tune the branch choice and any parameters on the selected branch. This is how feature extraction choices, preprocessing toggles, and model families can stay inside one pipeline contract.
Caching during tuning¶
Tuning repeatedly evaluates related pipelines. XDFlow uses transform state metadata to avoid recomputing fold-invariant work when possible.
Before each trial, the tuner can split a pipeline into:
- a static prefix that has no tunable or stateful steps
- a dynamic suffix that contains tunable choices, stateful transforms, or predictors
The static prefix is fit/transformed once and reused as the trial input. The dynamic suffix is cloned and evaluated by the cross-validator. This is especially important for expensive deterministic preprocessing such as filtering or feature extraction.
If a step learns from samples or depends on the split boundary, it should be marked stateful so it is not cached across folds or trials incorrectly.
Final model and holdout score¶
After tuning, the common workflow is:
tuner.tune(...)tuner.get_best_pipeline()to inspect the selected configurationtuner.score_best_pipeline_on_holdout()if the validator has a holdout splittuner.finalize_best_pipeline()to fit the selected pipeline on the final training data
This keeps optimization, validation, holdout scoring, and final fitting aligned with the same data contract.
Where tuning fits¶
Use cross-validation directly when you already know the pipeline configuration and want an evaluation score.
Use tuning when you want XDFlow to search over parameters, optional steps, switch branches, or multiple named pipelines while preserving the same split and refit rules.
In practice, tuning is the layer that turns XDFlow's composable pipelines into repeatable experiment search.