>>> from xorq.ml import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.neighbors import KNeighborsClassifier
>>> import sklearn.pipeline
Pipeline
Pipeline()
A machine learning pipeline that chains multiple processing steps together.
This class provides a xorq-native implementation that wraps scikit-learn pipelines, enabling deferred execution and integration with xorq expressions. The pipeline can contain both transform steps (data preprocessing) and a final prediction step.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| steps | tuple of Step | Sequence of Step objects that make up the pipeline. | required |
Attributes
| Name | Type | Description |
|---|---|---|
| steps | tuple of Step | The sequence of processing steps. |
| instance | sklearn.pipeline.Pipeline | The equivalent scikit-learn Pipeline instance. |
| transform_steps | tuple of Step | All steps except the final prediction step (if any). |
| predict_step | Step or None | The final step if it has a predict method, otherwise None. |
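The split between `transform_steps` and `predict_step` can be pictured with a small, library-free sketch. It assumes the check is a plain `hasattr(..., "predict")` on the last step, which is an illustration of the idea rather than xorq's actual implementation. `ToyScaler`, `ToyClassifier`, and `split_steps` are hypothetical names introduced only for this example.

```python
class ToyScaler:
    """Hypothetical transform-only step (no predict method)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class ToyClassifier:
    """Hypothetical final estimator exposing predict."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


def split_steps(steps):
    """Return (transform_steps, predict_step) for a sequence of (name, obj) pairs.

    Assumption: the final step counts as the prediction step only if it
    has a ``predict`` attribute; otherwise every step is a transform step.
    """
    steps = tuple(steps)
    if steps and hasattr(steps[-1][1], "predict"):
        return steps[:-1], steps[-1]
    return steps, None


with_predictor = [("scaler", ToyScaler()), ("clf", ToyClassifier())]
transform_steps, predict_step = split_steps(with_predictor)

only_transforms = [("scaler", ToyScaler())]
transform_only, no_predict_step = split_steps(only_transforms)
```

With a classifier at the end, `predict_step` is the `("clf", ...)` pair; for a transform-only sequence it is `None`, matching the attribute descriptions above.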
Examples
Create a pipeline from scikit-learn estimators:
>>> sklearn_pipeline = sklearn.pipeline.Pipeline([
... ("scaler", StandardScaler()),
... ("knn", KNeighborsClassifier(n_neighbors=5))
... ])
>>> xorq_pipeline = Pipeline.from_instance(sklearn_pipeline)
Fit and predict with xorq expressions:
>>> # Assuming train and test are xorq expressions
>>> fitted = xorq_pipeline.fit(train, features=("feature1", "feature2"), target="target") # quartodoc: +SKIP
>>> predictions = fitted.predict(test) # quartodoc: +SKIP
Update pipeline parameters:
>>> updated_pipeline = xorq_pipeline.set_params(knn__n_neighbors=10)
Notes
- The Pipeline class is frozen (immutable) using attrs.
- Pipelines automatically detect transform vs predict steps based on method availability.
- The fit() method returns a FittedPipeline that can transform and predict on new data.
- Parameter updates use sklearn’s parameter naming convention (step__parameter).
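The notes on immutability and the `step__parameter` convention fit together: on a frozen object, a parameter update has to produce a new object rather than mutate the old one. The sketch below illustrates that pattern with the standard library's frozen dataclasses standing in for attrs. `ToyStep` and `ToyPipeline` are hypothetical, and the behavior shown is an assumption about the general pattern, not a description of xorq internals.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ToyStep:
    """Hypothetical step: a name plus immutable (key, value) parameter pairs."""

    name: str
    params: tuple = ()


@dataclass(frozen=True)
class ToyPipeline:
    """Hypothetical frozen pipeline holding a tuple of ToyStep objects."""

    steps: tuple

    def set_params(self, **kwargs):
        # Group updates by step name using the "step__parameter" convention.
        updates = {}
        for key, value in kwargs.items():
            step_name, sep, param = key.partition("__")
            if not sep or not param:
                raise ValueError(f"expected 'step__parameter', got {key!r}")
            updates.setdefault(step_name, {})[param] = value

        unknown = set(updates) - {step.name for step in self.steps}
        if unknown:
            raise ValueError(f"unknown step(s): {sorted(unknown)}")

        # Build new steps; the original pipeline is left untouched.
        new_steps = tuple(
            replace(
                step,
                params=tuple(
                    sorted({**dict(step.params), **updates.get(step.name, {})}.items())
                ),
            )
            for step in self.steps
        )
        return replace(self, steps=new_steps)


original = ToyPipeline(
    steps=(ToyStep("scaler"), ToyStep("knn", (("n_neighbors", 5),)))
)
updated = original.set_params(knn__n_neighbors=10)
```

After the call, `updated` carries `n_neighbors=10` while `original` still holds `5`; assigning to `original.steps` directly raises `FrozenInstanceError`.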
Methods
| Name | Description |
|---|---|
| fit | Fit the pipeline to training data. |
| from_instance | Create a Pipeline from an existing scikit-learn Pipeline. |
fit
fit(expr, features=None, target=None, storage=None)
Fit the pipeline to training data.
This method sequentially fits each step in the pipeline, using the output of each transform step as input to the next step.
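As a rough illustration of sequential fitting, the sketch below fits toy steps eagerly on plain Python lists, feeding each transform's output to the next step. xorq defers execution and operates on expressions, so this shows only the data flow. `MeanCenter`, `SignClassifier`, `fit_sequentially`, and `predict_with` are hypothetical names.

```python
class MeanCenter:
    """Hypothetical transform step: subtracts the training mean."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class SignClassifier:
    """Hypothetical predictor: labels positive inputs 1, others 0."""

    def fit(self, X, y):
        self.fitted_ = True
        return self

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


def fit_sequentially(steps, X, y=None):
    """Fit each step on the output of the previous transform step."""
    data = X
    fitted = []
    for step in steps:
        step.fit(data, y)
        fitted.append(step)
        # Only transform steps reshape the data passed downstream.
        if hasattr(step, "transform"):
            data = step.transform(data)
    return fitted


def predict_with(fitted, X):
    """Apply all fitted transforms, then the final predictor."""
    data = X
    for step in fitted[:-1]:
        data = step.transform(data)
    return fitted[-1].predict(data)


fitted_steps = fit_sequentially(
    [MeanCenter(), SignClassifier()], [1.0, 2.0, 3.0], [0, 0, 1]
)
labels = predict_with(fitted_steps, [0.0, 5.0])
```

The classifier here never sees raw inputs: it is fitted on, and predicts from, the mean-centered values produced by the preceding step.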
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | Expr | The xorq expression containing training data. | required |
| features | tuple of str | Column names to use as features. If None, infers from expr columns excluding the target. | None |
| target | str | Target column name. Required if pipeline has a prediction step. | None |
| storage | Storage | Storage backend for caching fitted models. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | FittedPipeline | A fitted pipeline that can transform and predict on new data. |
Raises
| Name | Type | Description |
|---|---|---|
| | ValueError | If target is not provided but pipeline has a prediction step. |
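The documented rules for `features` and `target` can be summarized in a short sketch: a missing target is an error when a prediction step exists, and omitted features default to every column except the target. `resolve_features` is a hypothetical helper written for illustration; the real method works on xorq expressions, not on plain tuples of column names.

```python
def resolve_features(columns, features=None, target=None, has_predict_step=False):
    """Return the feature column names under the rules documented for fit.

    Assumptions: ``columns`` is a sequence of column names, and
    ``has_predict_step`` says whether the pipeline ends in a predictor.
    """
    if has_predict_step and target is None:
        raise ValueError("target is required when the pipeline has a prediction step")
    if features is None:
        # Infer features as every column that is not the target.
        features = tuple(col for col in columns if col != target)
    return tuple(features)


columns = ("sepal_length", "sepal_width", "species")
inferred = resolve_features(columns, target="species", has_predict_step=True)
explicit = resolve_features(
    columns, features=("sepal_length",), target="species", has_predict_step=True
)
```

Here `inferred` contains the two sepal columns and `explicit` keeps only the column that was passed in.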
Examples
>>> fitted = pipeline.fit(
... train_data,
... features=("sepal_length", "sepal_width"),
... target="species"
... ) # quartodoc: +SKIP
from_instance
from_instance(instance, deep=False)
Create a Pipeline from an existing scikit-learn Pipeline.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| instance | sklearn.pipeline.Pipeline | A fitted or unfitted scikit-learn pipeline. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | Pipeline | A new xorq Pipeline wrapping the scikit-learn pipeline. |
Examples
>>> from xorq.ml import Pipeline
>>> import sklearn.pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.svm import SVC
>>> sklearn_pipe = sklearn.pipeline.Pipeline([
... ("scaler", StandardScaler()),
... ("svc", SVC())
... ])
>>> xorq_pipe = Pipeline.from_instance(sklearn_pipe)