Pipeline

Pipeline(steps)

A machine learning pipeline that chains multiple processing steps together.

This class provides a xorq-native implementation that wraps scikit-learn pipelines, enabling deferred execution and integration with xorq expressions. The pipeline can contain both transform steps (data preprocessing) and a final prediction step.

Parameters

Name	Type	Description	Default
steps	tuple of Step	Sequence of Step objects that make up the pipeline.	required

Attributes

Name	Type	Description
steps	tuple of Step	The sequence of processing steps.
instance	sklearn.pipeline.Pipeline	The equivalent scikit-learn Pipeline instance.
transform_steps	tuple of Step	All steps except the final prediction step (if any).
predict_step	Step or None	The final step if it has a predict method, otherwise None.
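The split between transform_steps and predict_step can be pictured with a small stand-alone sketch. It does not use xorq: Scaler, Classifier, and split_steps are hypothetical stand-ins, and the rule shown (the last step counts as the prediction step only if it exposes a predict method) is inferred from the attribute descriptions above, not taken from xorq's source.

```python
class Scaler:
    """Hypothetical transform-only step (stand-in, not a xorq Step)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class Classifier:
    """Hypothetical final step that exposes predict."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


def split_steps(steps):
    """Return (transform_steps, predict_step) for a sequence of (name, obj) pairs.

    Assumption: only the last step is checked for a predict method.
    """
    if steps and hasattr(steps[-1][1], "predict"):
        return tuple(steps[:-1]), steps[-1]
    return tuple(steps), None


steps = (("scaler", Scaler()), ("clf", Classifier()))
transform_steps, predict_step = split_steps(steps)

print([name for name, _ in transform_steps])  # ['scaler']
print(predict_step[0])                        # clf
```

A pipeline made only of transformers would yield a predict_step of None under the same rule.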

Examples

Create a pipeline from scikit-learn estimators:

>>> from xorq.ml import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.neighbors import KNeighborsClassifier
>>> import sklearn.pipeline

>>> sklearn_pipeline = sklearn.pipeline.Pipeline([
...     ("scaler", StandardScaler()),
...     ("knn", KNeighborsClassifier(n_neighbors=5))
... ])
>>> xorq_pipeline = Pipeline.from_instance(sklearn_pipeline)

Fit and predict with xorq expressions:

>>> # Assuming train and test are xorq expressions
>>> fitted = xorq_pipeline.fit(train, features=("feature1", "feature2"), target="target")  # quartodoc: +SKIP
>>> predictions = fitted.predict(test)  # quartodoc: +SKIP

Update pipeline parameters:

>>> updated_pipeline = xorq_pipeline.set_params(knn__n_neighbors=10)

Notes

The Pipeline class is frozen (immutable) using attrs.
Pipelines automatically detect transform vs predict steps based on method availability.
The fit() method returns a FittedPipeline that can transform and predict on new data.
Parameter updates use sklearn’s parameter naming convention (step__parameter).
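The interaction between immutability and the step__parameter convention can be sketched without xorq. The example below is only an illustration: ToyStep and ToyPipeline are hypothetical, and the standard-library frozen dataclass stands in for an attrs frozen class. It shows why an update returns a new object instead of modifying the existing one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyStep:
    """Hypothetical step: a name plus immutable (key, value) parameter pairs."""

    name: str
    params: tuple = ()


@dataclass(frozen=True)
class ToyPipeline:
    """Hypothetical frozen pipeline; not the xorq implementation."""

    steps: tuple

    def set_params(self, **kwargs):
        # Group updates by step name using the step__parameter convention.
        updates = {}
        for key, value in kwargs.items():
            step_name, sep, param = key.partition("__")
            if not sep or not param:
                raise ValueError(f"expected step__parameter, got {key!r}")
            updates.setdefault(step_name, {})[param] = value

        unknown = set(updates) - {step.name for step in self.steps}
        if unknown:
            raise ValueError(f"unknown step(s): {sorted(unknown)}")

        # Build new steps; the original objects are never mutated.
        new_steps = tuple(
            replace(
                step,
                params=tuple({**dict(step.params), **updates.get(step.name, {})}.items()),
            )
            for step in self.steps
        )
        return replace(self, steps=new_steps)


pipe = ToyPipeline(steps=(ToyStep("scaler"), ToyStep("knn", (("n_neighbors", 5),))))
updated = pipe.set_params(knn__n_neighbors=10)

print(dict(pipe.steps[1].params))     # {'n_neighbors': 5}
print(dict(updated.steps[1].params))  # {'n_neighbors': 10}
```

Because both classes are frozen, assigning to pipe.steps raises an error, so the only way to change a parameter is to create a new pipeline.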

Methods

Name	Description
fit	Fit the pipeline to training data.
from_instance	Create a Pipeline from an existing scikit-learn Pipeline.

fit

fit(expr, features=None, target=None, storage=None)

Fit the pipeline to training data.

This method sequentially fits each step in the pipeline, using the output of each transform step as input to the next step.
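The chaining described above can be sketched with plain Python objects. This is an eager, simplified illustration: AddOne, Double, MeanPredictor, and fit_sequentially are hypothetical names, and xorq itself works on deferred expressions, not in-memory lists.

```python
class AddOne:
    """Hypothetical transformer: adds 1 to every value."""

    def fit(self, rows, y=None):
        self.seen_ = rows  # record the input this step was fitted on
        return self

    def transform(self, rows):
        return [[v + 1 for v in row] for row in rows]


class Double:
    """Hypothetical transformer: doubles every value."""

    def fit(self, rows, y=None):
        self.seen_ = rows
        return self

    def transform(self, rows):
        return [[v * 2 for v in row] for row in rows]


class MeanPredictor:
    """Hypothetical final step: always predicts the mean of the target."""

    def fit(self, rows, y):
        self.seen_ = rows
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, rows):
        return [self.mean_ for _ in rows]


def fit_sequentially(steps, rows, y=None):
    """Fit each step on the output of the previous transform step."""
    current = rows
    for _, step in steps:
        step.fit(current, y)
        if hasattr(step, "transform"):
            current = step.transform(current)
    return current


steps = [("add", AddOne()), ("double", Double()), ("model", MeanPredictor())]
final_features = fit_sequentially(steps, [[1, 2], [3, 4]], y=[0, 1])

print(steps[1][1].seen_)  # [[2, 3], [4, 5]]
print(steps[2][1].seen_)  # [[4, 6], [8, 10]]
```

Each step is fitted on data that has already passed through every earlier transform, which is the same ordering scikit-learn pipelines follow.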

Parameters

Name	Type	Description	Default
expr	Expr	The xorq expression containing training data.	required
features	tuple of str	Column names to use as features. If None, the features are inferred from the columns of expr, excluding the target.	`None`
target	str	Target column name. Required if the pipeline has a prediction step.	`None`
storage	Storage	Storage backend for caching fitted models.	`None`

Returns

Name	Type	Description
	FittedPipeline	A fitted pipeline that can transform and predict on new data.

Raises

Name	Type	Description
	ValueError	If target is not provided but the pipeline has a prediction step.
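The documented defaults and the error condition can be summarized in a short sketch. The helper resolve_fit_columns and its has_predict_step flag are hypothetical and exist only for illustration. The sketch follows the parameter descriptions above and is not xorq's actual validation code.

```python
def resolve_fit_columns(columns, features=None, target=None, has_predict_step=True):
    """Return (features, target) following the rules described for fit().

    Hypothetical helper: `columns` stands in for the column names of expr.
    """
    if has_predict_step and target is None:
        raise ValueError("target is required when the pipeline has a prediction step")
    if features is None:
        # Infer features as every column except the target.
        features = tuple(col for col in columns if col != target)
    return tuple(features), target


columns = ("sepal_length", "sepal_width", "species")

print(resolve_fit_columns(columns, target="species"))
# (('sepal_length', 'sepal_width'), 'species')

try:
    resolve_fit_columns(columns)
except ValueError as exc:
    print(f"ValueError: {exc}")
```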

Examples

>>> fitted = pipeline.fit(
...     train_data,
...     features=("sepal_length", "sepal_width"),
...     target="species"
... )  # quartodoc: +SKIP

from_instance

from_instance(instance, deep=False)

Create a Pipeline from an existing scikit-learn Pipeline.

Parameters

Name	Type	Description	Default
instance	sklearn.pipeline.Pipeline	A fitted or unfitted scikit-learn pipeline.	required

Returns

Name	Type	Description
	Pipeline	A new xorq Pipeline wrapping the scikit-learn pipeline.

Examples

>>> from xorq.ml import Pipeline
>>> import sklearn.pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.svm import SVC

>>> sklearn_pipe = sklearn.pipeline.Pipeline([
...     ("scaler", StandardScaler()),
...     ("svc", SVC())
... ])
>>> xorq_pipe = Pipeline.from_instance(sklearn_pipe)