5-Minute Core Quickstart¶

This quickstart uses only the base xdflow install. It creates a small labeled xarray.DataArray, wraps it in a DataContainer, builds a pipeline, and lets KFoldValidator run the evaluation loop.

The example shows the mechanics that matter in larger experiments: labels live as coordinates, the classifier reads its target from a coordinate, the validator stratifies by that coordinate, stateful steps refit per fold, and predictions stay aligned with the trial axis.

From a repository checkout, the same example is available as a runnable script:

python examples/quickstart.py

1. Create Structured Data¶

xdflow starts from named dimensions and sample-level coordinates. Here each trial has channel and time dimensions, plus stimulus and session metadata.

import numpy as np
import xarray as xr

from xdflow.core import DataContainer

rng = np.random.default_rng(0)

stimuli = np.repeat(["rest", "tone", "odor"], 60)
rng.shuffle(stimuli)

values = rng.normal(0.0, 0.8, size=(stimuli.size, 4, 25))
values[stimuli == "tone", 1, 8:15] += 2.0
values[stimuli == "odor", 2, 14:22] += 2.0

trial_ids = np.arange(stimuli.size)

data = xr.DataArray(
    values,
    dims=("trial", "channel", "time"),
    coords={
        "trial": trial_ids,
        "channel": [f"ch{i}" for i in range(4)],
        "time": np.linspace(-0.2, 0.8, 25),
        "stimulus": ("trial", stimuli),
        "session": ("trial", np.where(trial_ids < stimuli.size // 2, "session_a", "session_b")),
    },
)
container = DataContainer(data)

2. Build A Pipeline¶

The pipeline keeps labels and coordinates attached while each step transforms the data. The classifier reads its targets from the stimulus coordinate.

from sklearn.linear_model import LogisticRegression

from xdflow.composite import Pipeline
from xdflow.transforms.basic_transforms import FlattenTransform
from xdflow.transforms.normalization import ZScoreTransform
from xdflow.transforms.sklearn_transform import SKLearnPredictor

pipeline = Pipeline(
    name="core_quickstart",
    steps=[
        ("zscore", ZScoreTransform(per_dim="trial")),
        ("flatten", FlattenTransform(dims=("channel", "time"))),
        (
            "classifier",
            SKLearnPredictor(
                LogisticRegression,
                sample_dim="trial",
                target_coord="stimulus",
                max_iter=500,
            ),
        ),
    ],
)

3. Cross-Validate¶

KFoldValidator owns the split loop, scoring, prediction collection, and stateful refits. The stratify_coord argument keeps class proportions balanced across folds using the named stimulus coordinate.

In this pipeline, z-scoring is per trial and flattening is structural, so that fold-invariant preprocessing can run before the stateful classifier is cloned and refit on each training fold.

from xdflow.cv import KFoldValidator

cv = KFoldValidator(
    n_splits=5,
    shuffle=True,
    random_state=0,
    stratify_coord="stimulus",
    scoring="f1_weighted",
    verbose=False,
)
cv.set_pipeline(pipeline)

score = cv.cross_validate(container, verbose=False)
print(f"Weighted F1: {score:.3f}")

4. Fit And Predict¶

After choosing a pipeline, fit it on the data you want to use for the final model and call predict.

pipeline.fit(container)
predictions = pipeline.predict(container)

print(predictions.data.dims)

The prediction container still carries the sample dimension, so predictions can be aligned back to trial-level metadata.

Next Steps¶

Read Data Contract before adapting your own arrays.
Use Writing Custom Transforms when your preprocessing should become a reusable pipeline step.
Use Hyperparameter Tuning to search over the same kind of pipeline with Optuna.
Use Spectral Pipeline Walkthrough for a richer signal-processing example.
Use Reusable ML Patterns for multilabel, sample-weighting, and domain-transfer workflows.