>>> from xorq.ml import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.neighbors import KNeighborsClassifier
>>> import sklearn.pipeline
Pipeline
Pipeline()
A machine learning pipeline that chains multiple processing steps together.
This class provides a xorq-native implementation that wraps scikit-learn pipelines, enabling deferred execution and integration with xorq expressions. The pipeline can contain both transform steps (data preprocessing) and a final prediction step.
Parameters
Name | Type | Description | Default |
---|---|---|---|
steps | tuple of Step | Sequence of Step objects that make up the pipeline. | required |
Attributes
Name | Type | Description |
---|---|---|
steps | tuple of Step | The sequence of processing steps. |
instance | sklearn.pipeline.Pipeline | The equivalent scikit-learn Pipeline instance. |
transform_steps | tuple of Step | All steps except the final prediction step (if any). |
predict_step | Step or None | The final step if it has a predict method, otherwise None. |
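As a rough illustration of how `transform_steps` and `predict_step` relate, the sketch below splits a sequence of `(name, estimator)` pairs by checking whether the last estimator has a `predict` method. `ToyScaler`, `ToyClassifier`, and `split_steps` are hypothetical stand-ins written for this example. They are not part of xorq, and xorq's own detection logic may differ in detail.

```python
class ToyScaler:
    """Hypothetical transformer: exposes fit/transform but no predict."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class ToyClassifier:
    """Hypothetical predictor: exposes fit/predict."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


def split_steps(steps):
    """Return (transform_steps, predict_step) for (name, estimator) pairs."""
    steps = tuple(steps)
    # Only the final step is considered as a possible prediction step.
    if steps and hasattr(steps[-1][1], "predict"):
        return steps[:-1], steps[-1]
    return steps, None


with_predictor = [("scaler", ToyScaler()), ("clf", ToyClassifier())]
transform_steps, predict_step = split_steps(with_predictor)

only_transforms = [("scaler", ToyScaler())]
transform_only_steps, no_predict_step = split_steps(only_transforms)
```

In this sketch a pipeline made only of transformers yields `None` for the prediction step, which mirrors the `Step or None` type listed for `predict_step`.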
Examples
Create a pipeline from scikit-learn estimators:
>>> sklearn_pipeline = sklearn.pipeline.Pipeline([
...     ("scaler", StandardScaler()),
...     ("knn", KNeighborsClassifier(n_neighbors=5))
... ])
>>> xorq_pipeline = Pipeline.from_instance(sklearn_pipeline)
Fit and predict with xorq expressions:
>>> # Assuming train and test are xorq expressions
>>> fitted = xorq_pipeline.fit(train, features=("feature1", "feature2"), target="target") # quartodoc: +SKIP
>>> predictions = fitted.predict(test) # quartodoc: +SKIP
Update pipeline parameters:
>>> updated_pipeline = xorq_pipeline.set_params(knn__n_neighbors=10)
Notes
- The Pipeline class is frozen (immutable) using attrs.
- Pipelines automatically detect transform vs predict steps based on method availability.
- The fit() method returns a FittedPipeline that can transform and predict on new data.
- Parameter updates use sklearn’s parameter naming convention (step__parameter).
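To make the immutability and `step__parameter` notes concrete, here is a minimal sketch of an immutable container whose `set_params` returns a new object instead of mutating the original. It uses the standard library's frozen `dataclasses` as a stand-in for attrs, and `ToyStep` and `ToyPipeline` are hypothetical names invented for this example. It illustrates the naming convention only and should not be read as xorq's implementation.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ToyStep:
    name: str
    params: tuple = ()  # (key, value) pairs stored as a tuple


@dataclass(frozen=True)
class ToyPipeline:
    steps: tuple

    def set_params(self, **kwargs):
        # Group "step__parameter" keys by step name.
        updates = {}
        for key, value in kwargs.items():
            step_name, sep, param = key.partition("__")
            if not sep or not param:
                raise ValueError(f"expected 'step__parameter', got {key!r}")
            updates.setdefault(step_name, {})[param] = value

        unknown = set(updates) - {step.name for step in self.steps}
        if unknown:
            raise ValueError(f"unknown step(s): {sorted(unknown)}")

        # Build new steps rather than modifying the existing ones.
        new_steps = tuple(
            replace(
                step,
                params=tuple(
                    {**dict(step.params), **updates.get(step.name, {})}.items()
                ),
            )
            for step in self.steps
        )
        return replace(self, steps=new_steps)


original = ToyPipeline(
    steps=(ToyStep("scaler"), ToyStep("knn", (("n_neighbors", 5),)))
)
updated = original.set_params(knn__n_neighbors=10)

try:
    original.steps = ()  # frozen instances reject attribute assignment
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True
```

The key point of the design is that `original` keeps `n_neighbors=5` while `updated` carries the new value, so an existing pipeline object can be shared safely.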
Methods
Name | Description |
---|---|
fit | Fit the pipeline to training data. |
from_instance | Create a Pipeline from an existing scikit-learn Pipeline. |
fit
fit(expr, features=None, target=None)
Fit the pipeline to training data.
This method sequentially fits each step in the pipeline, using the output of each transform step as input to the next step.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | Expr | The xorq expression containing training data. | required |
features | tuple of str | Column names to use as features. If None, infers from expr columns excluding the target. | None |
target | str | Target column name. Required if pipeline has a prediction step. | None |
Returns
Type | Description |
---|---|
FittedPipeline | A fitted pipeline that can transform and predict on new data. |
Raises
Type | Description |
---|---|
ValueError | If target is not provided but pipeline has a prediction step. |
Examples
>>> fitted = pipeline.fit(
...     train_data,
...     features=("sepal_length", "sepal_width"),
...     target="species"
... )  # quartodoc: +SKIP
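The sequential fitting described for `fit` can be sketched in plain Python: each transform step is fitted, its output becomes the input of the next step, and the final prediction step is fitted against the target. This is a simplified, eager sketch over a dict of columns, not xorq's deferred implementation. `MeanCenter`, `MajorityClass`, and `toy_fit` are hypothetical names, and the feature inference rule shown (all columns except the target) is an assumption based on the parameter description.

```python
class MeanCenter:
    """Hypothetical transform step: subtracts per-column means learned in fit."""

    def fit(self, rows, y=None):
        n = len(rows)
        self.means_ = [sum(col) / n for col in zip(*rows)]
        return self

    def transform(self, rows):
        return [[v - m for v, m in zip(row, self.means_)] for row in rows]


class MajorityClass:
    """Hypothetical prediction step: always predicts the most common label."""

    def fit(self, rows, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, rows):
        return [self.label_ for _ in rows]


def toy_fit(table, steps, features=None, target=None):
    """Fit steps in order on a dict-of-columns table."""
    has_predict = bool(steps) and hasattr(steps[-1], "predict")
    if has_predict and target is None:
        raise ValueError("target is required when there is a prediction step")
    if features is None:
        # Assumed inference rule: every column except the target.
        features = tuple(col for col in table if col != target)

    rows = [list(vals) for vals in zip(*(table[col] for col in features))]
    y = table[target] if target is not None else None

    transforms = steps[:-1] if has_predict else steps
    for step in transforms:
        # The transformed output feeds the next step.
        rows = step.fit(rows, y).transform(rows)
    if has_predict:
        steps[-1].fit(rows, y)
    return steps, features


table = {
    "x1": [1.0, 2.0, 3.0],
    "x2": [10.0, 20.0, 30.0],
    "label": ["a", "b", "b"],
}
fitted_steps, used_features = toy_fit(
    table, [MeanCenter(), MajorityClass()], target="label"
)
```

Calling `toy_fit` without `target` on this pipeline raises `ValueError`, matching the behaviour documented under Raises.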
from_instance
from_instance(instance)
Create a Pipeline from an existing scikit-learn Pipeline.
Parameters
Name | Type | Description | Default |
---|---|---|---|
instance | sklearn.pipeline.Pipeline | A fitted or unfitted scikit-learn pipeline. | required |
Returns
Type | Description |
---|---|
Pipeline | A new xorq Pipeline wrapping the scikit-learn pipeline. |
Examples
>>> import sklearn.pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.svm import SVC
>>> sklearn_pipe = sklearn.pipeline.Pipeline([
...     ("scaler", StandardScaler()),
...     ("svc", SVC())
... ])
>>> xorq_pipe = Pipeline.from_instance(sklearn_pipe)