Pipeline

Pipeline(steps)

A machine learning pipeline that chains multiple processing steps together.

This class provides a xorq-native implementation that wraps scikit-learn pipelines, enabling deferred execution and integration with xorq expressions. The pipeline can contain both transform steps (data preprocessing) and a final prediction step.

Parameters

Name Type Description Default
steps tuple of Step Sequence of Step objects that make up the pipeline. required

Attributes

Name Type Description
steps tuple of Step The sequence of processing steps.
instance sklearn.pipeline.Pipeline The equivalent scikit-learn Pipeline instance.
transform_steps tuple of Step All steps except the final prediction step (if any).
predict_step Step or None The final step if it has a predict method, otherwise None.
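
The split between transform_steps and predict_step can be illustrated with plain scikit-learn objects. The following is a minimal sketch, not xorq's implementation: split_steps is a hypothetical helper, and it assumes that "has a predict method" is checked with hasattr on the last estimator only.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline as SklearnPipeline
from sklearn.preprocessing import StandardScaler


def split_steps(sklearn_pipeline):
    """Hypothetical helper: separate transform steps from a final predict step."""
    steps = list(sklearn_pipeline.steps)  # list of (name, estimator) pairs
    *head, (last_name, last_estimator) = steps
    if hasattr(last_estimator, "predict"):
        return tuple(head), (last_name, last_estimator)
    # No predictor at the end: every step is a transform step.
    return tuple(steps), None


with_predictor = SklearnPipeline(
    [("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=5))]
)
transform_only = SklearnPipeline([("scaler", StandardScaler())])

transform_steps, predict_step = split_steps(with_predictor)
transform_names = [name for name, _ in transform_steps]
predict_name = predict_step[0]

only_transforms, no_predictor = split_steps(transform_only)
```

Here transform_names holds only the scaler and predict_name is the classifier's step name, while a pipeline without a predictor yields None for its predict step.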

Examples

Create a pipeline from scikit-learn estimators:

>>> from xorq.ml import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.neighbors import KNeighborsClassifier
>>> import sklearn.pipeline
>>> sklearn_pipeline = sklearn.pipeline.Pipeline([
...     ("scaler", StandardScaler()),
...     ("knn", KNeighborsClassifier(n_neighbors=5))
... ])
>>> xorq_pipeline = Pipeline.from_instance(sklearn_pipeline)

Fit and predict with xorq expressions:

>>> # Assuming train and test are xorq expressions
>>> fitted = xorq_pipeline.fit(train, features=("feature1", "feature2"), target="target")  # quartodoc: +SKIP
>>> predictions = fitted.predict(test)  # quartodoc: +SKIP

Update pipeline parameters:

>>> updated_pipeline = xorq_pipeline.set_params(knn__n_neighbors=10)

Notes

  • The Pipeline class is frozen (immutable) using attrs.
  • Pipelines automatically detect transform vs predict steps based on method availability.
  • The fit() method returns a FittedPipeline that can transform and predict on new data.
  • Parameter updates use sklearn’s parameter naming convention (step__parameter).
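
The step__parameter convention and the idea of an update that leaves the original untouched can be tried directly in scikit-learn. This sketch uses only scikit-learn calls (clone, set_params, get_params). It assumes that cloning before updating is a fair stand-in for what an immutable pipeline does, and does not show how xorq implements set_params.

```python
from sklearn.base import clone
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline as SklearnPipeline
from sklearn.preprocessing import StandardScaler

original = SklearnPipeline(
    [("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=5))]
)

# "knn__n_neighbors" addresses the n_neighbors parameter of the step named "knn".
# clone() copies the unfitted pipeline first, so `original` is left unchanged.
updated = clone(original).set_params(knn__n_neighbors=10)

original_k = original.get_params()["knn__n_neighbors"]
updated_k = updated.get_params()["knn__n_neighbors"]
```

A plain scikit-learn set_params call mutates the estimator in place, which is why the sketch clones first.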

Methods

Name Description
fit Fit the pipeline to training data.
from_instance Create a Pipeline from an existing scikit-learn Pipeline.

fit

fit(expr, features=None, target=None)

Fit the pipeline to training data.

This method sequentially fits each step in the pipeline, using the output of each transform step as input to the next step.

Parameters

Name Type Description Default
expr Expr The xorq expression containing training data. required
features tuple of str Column names to use as features. If None, the features are inferred from the columns of expr, excluding the target. None
target str Target column name. Required if the pipeline has a prediction step. None

Returns

Type Description
FittedPipeline A fitted pipeline that can transform and predict on new data.

Raises

Type Description
ValueError If target is not provided but the pipeline has a prediction step.
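
The documented rules for features and target can be summarized in a few lines of plain Python. This is a sketch under stated assumptions: resolve_features and its has_predict_step flag are hypothetical names, and the error message is illustrative rather than the one xorq raises.

```python
def resolve_features(columns, features=None, target=None, has_predict_step=True):
    """Hypothetical helper mirroring the documented rules for `features` and `target`."""
    if has_predict_step and target is None:
        raise ValueError("target is required when the pipeline has a prediction step")
    if features is None:
        # Default: every column except the target becomes a feature.
        features = tuple(col for col in columns if col != target)
    return tuple(features)


columns = ("sepal_length", "sepal_width", "species")
inferred = resolve_features(columns, target="species")
explicit = resolve_features(columns, features=("sepal_length",), target="species")

try:
    resolve_features(columns)
    missing_target_error = None
except ValueError as exc:
    missing_target_error = str(exc)
```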

Examples

>>> fitted = pipeline.fit(
...     train_data,
...     features=("sepal_length", "sepal_width"),
...     target="species"
... )  # quartodoc: +SKIP
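
The sequential fitting described for fit can be sketched eagerly with scikit-learn and NumPy. This only illustrates the data flow (each transform's output feeds the next step). xorq defers execution and works on expressions, which this sketch does not attempt to reproduce, and the tiny dataset is made up for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Tiny made-up training set: two features, two classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

transform_steps = [("scaler", StandardScaler())]
predict_step = ("knn", KNeighborsClassifier(n_neighbors=1))

# Fit each transform step on the current data, then pass its output onward.
current = X
for _name, transformer in transform_steps:
    current = transformer.fit_transform(current)

# The final step is fitted on the fully transformed features and the target.
_name, predictor = predict_step
predictor.fit(current, y)

# New data must flow through the same fitted transforms before prediction.
new_rows = np.array([[0.05, 0.1], [0.95, 1.0]])
for _name, transformer in transform_steps:
    new_rows = transformer.transform(new_rows)
predicted = predictor.predict(new_rows).tolist()
```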

from_instance

from_instance(instance)

Create a Pipeline from an existing scikit-learn Pipeline.

Parameters

Name Type Description Default
instance sklearn.pipeline.Pipeline A fitted or unfitted scikit-learn pipeline. required

Returns

Type Description
Pipeline A new xorq Pipeline wrapping the scikit-learn pipeline.

Examples

>>> from xorq.ml import Pipeline
>>> import sklearn.pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.svm import SVC
>>> sklearn_pipe = sklearn.pipeline.Pipeline([
...     ("scaler", StandardScaler()),
...     ("svc", SVC())
... ])
>>> xorq_pipe = Pipeline.from_instance(sklearn_pipe)
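
For intuition about what a wrapper can read from an existing scikit-learn pipeline, the sketch below lists each step's name, estimator class, and constructor parameters, then rebuilds an equivalent unfitted pipeline from that description. The (name, class, params) triple is an assumption made for illustration. The actual fields of xorq's Step are not documented in this section.

```python
from sklearn.pipeline import Pipeline as SklearnPipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

sklearn_pipe = SklearnPipeline([("scaler", StandardScaler()), ("svc", SVC(C=2.0))])

# Hypothetical description of each step: (name, estimator class, constructor params).
described = [
    (name, type(estimator), estimator.get_params(deep=False))
    for name, estimator in sklearn_pipe.steps
]

# The descriptions are enough to rebuild an equivalent unfitted pipeline.
rebuilt = SklearnPipeline([(name, cls(**params)) for name, cls, params in described])

step_names = [name for name, _, _ in described]
rebuilt_c = rebuilt.get_params()["svc__C"]
```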