A single step in a machine learning pipeline that wraps a scikit-learn estimator.
This class represents an individual processing step that can either transform data (transformers like StandardScaler, SelectKBest) or make predictions (classifiers like KNeighborsClassifier, LinearSVC). Steps can be combined into Pipeline objects to create complex ML workflows.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| typ | type | The scikit-learn estimator class (must inherit from BaseEstimator). | required |
| name | str | A unique name for this step. If None, generates a name from the class name and ID. | required |
| params_tuple | tuple | Tuple of (parameter_name, parameter_value) pairs for the estimator. Parameters are automatically sorted for consistency. | required |
Attributes
| Name | Type | Description |
| --- | --- | --- |
| typ | type | The scikit-learn estimator class. |
| name | str | The unique name for this step in the pipeline. |
| params_tuple | tuple | Sorted tuple of parameter key-value pairs. |
Examples
Create a scaler step:
>>> from xorq.ml import Step
>>> from sklearn.preprocessing import StandardScaler
>>> scaler_step = Step(typ=StandardScaler, name="scaler")
>>> scaler_step.instance
Create a classifier step with parameters:
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn_step = Step(
... typ=KNeighborsClassifier,
... name="knn",
... params_tuple=(("n_neighbors", 5), ("weights", "uniform"))
... )
>>> knn_step.instance
Notes
- The Step class is frozen (immutable) using attrs.
- All estimators must inherit from sklearn.base.BaseEstimator.
- Parameter tuples are automatically sorted for hash consistency.
- Steps can be fitted to data with the fit() method, which returns a FittedStep.
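The immutability and sorting notes can be illustrated with a small stand-alone sketch. It uses the standard library's `dataclasses` rather than attrs, and `MiniStep` is a hypothetical stand-in, not xorq's actual class. It only shows why sorting the parameter tuple makes two steps built from the same parameters, given in a different order, compare and hash as equal.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class MiniStep:
    """Hypothetical stand-in for Step; not the xorq implementation."""

    typ: type
    name: str
    params_tuple: tuple = ()

    def __post_init__(self):
        # Frozen dataclasses block normal assignment, so the sorted tuple
        # is stored through object.__setattr__. Sorting by key only avoids
        # comparing values of different types.
        ordered = tuple(sorted(self.params_tuple, key=lambda kv: kv[0]))
        object.__setattr__(self, "params_tuple", ordered)


# `dict` is used as a placeholder estimator class to keep the sketch
# free of third-party imports.
a = MiniStep(dict, "knn", (("weights", "uniform"), ("n_neighbors", 5)))
b = MiniStep(dict, "knn", (("n_neighbors", 5), ("weights", "uniform")))

# Same parameters in a different order give equal, equally hashed steps.
same = (a == b) and (hash(a) == hash(b))

try:
    a.name = "other"  # assignment on a frozen instance raises
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Stable hashing of this kind is what lets an object serve as a dictionary or cache key regardless of the order in which its parameters were supplied.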
Methods
fit
fit(expr, features=None, target=None, cache=None, dest_col=None)
Fit this step to the given expression data.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| expr | Expr | The xorq expression containing the training data. | required |
| features | tuple of str | Column names to use as features. If None, infers from expr.columns. | None |
| target | str | Target column name. Required for prediction steps. | None |
| cache | Cache | Storage backend for caching fitted models. | None |
| dest_col | str | Destination column name for transformed output. | None |
Returns
| Type | Description |
| --- | --- |
| FittedStep | A fitted step that can transform or predict on new data. |
from_fit_predict
from_fit_predict(fit, predict, return_type, klass_name=None, name=None)
Create a Step from custom fit and predict functions.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fit | callable | Function to fit the model. | required |
| predict | callable | Function to make predictions. | required |
| return_type | DataType | The return type for predictions. | required |
| klass_name | str | Name for the generated estimator class. | None |
| name | str | Name for the step. | None |
Returns
| Type | Description |
| --- | --- |
| Step | A new Step with a dynamically created estimator type. |
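To make "dynamically created estimator type" concrete, here is a minimal sketch of building a class from two plain functions with Python's built-in `type()`. The helper `make_estimator_class` and the callable contract (`fit(X, y)` returns a model, `predict(model, X)` returns predictions) are assumptions made for illustration; the signatures xorq actually expects are not stated here. The sketch also derives from `object` to stay dependency-free, whereas a real estimator type would need to inherit from sklearn.base.BaseEstimator.

```python
def make_estimator_class(fit, predict, klass_name="CustomEstimator"):
    """Build a class with fit/predict methods from two plain functions.

    Hypothetical contract: fit(X, y) returns a model object and
    predict(model, X) returns a list of predictions.
    """

    def _fit(self, X, y=None):
        # Store whatever the user-supplied fit function returns.
        self.model_ = fit(X, y)
        return self

    def _predict(self, X):
        # Delegate to the user-supplied predict function.
        return predict(self.model_, X)

    # type(name, bases, namespace) creates a new class at runtime.
    return type(klass_name, (object,), {"fit": _fit, "predict": _predict})


def fit_mean(X, y):
    # "Model" is simply the mean of the training targets.
    return sum(y) / len(y)


def predict_mean(model, X):
    # Predict the stored mean for every input row.
    return [model for _ in X]


MeanEstimator = make_estimator_class(fit_mean, predict_mean, "MeanEstimator")
est = MeanEstimator().fit([[1], [2], [3]], [10.0, 20.0, 30.0])
preds = est.predict([[4], [5]])
```

The `klass_name` argument plays the same role as in the signature above: it gives the generated class a readable name.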
from_instance_name
from_instance_name(instance, name=None, deep=False)
Create a Step from an existing scikit-learn estimator instance.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| instance | object | A scikit-learn estimator instance. | required |
| name | str | Name for the step. If None, generates from instance class name. | None |
Returns
| Type | Description |
| --- | --- |
| Step | A new Step wrapping the estimator instance. |
from_name_instance
from_name_instance(name, instance, deep=False)
Create a Step from a name and estimator instance.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | Name for the step. | required |
| instance | object | A scikit-learn estimator instance. | required |
Returns
| Type | Description |
| --- | --- |
| Step | A new Step wrapping the estimator instance. |
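Both instance-based constructors have to turn a live estimator into the three fields a Step stores. The sketch below shows one plausible way to do that; it is not xorq's implementation. `FakeScaler` and `split_instance` are hypothetical, the generated-name format is invented for illustration, and forwarding `deep` to `get_params` is an assumption. The only real API relied on is the scikit-learn convention that estimators expose `get_params(deep=...)`.

```python
class FakeScaler:
    """Hypothetical estimator-like stand-in exposing get_params()."""

    def __init__(self, with_mean=True, with_std=True):
        self.with_mean = with_mean
        self.with_std = with_std

    def get_params(self, deep=True):
        # Mirrors the scikit-learn convention of returning constructor args.
        return {"with_mean": self.with_mean, "with_std": self.with_std}


def split_instance(instance, name=None, deep=False):
    """Derive (typ, name, params_tuple) from an estimator instance."""
    typ = type(instance)
    # Sort by parameter name so the tuple is order-independent.
    params_tuple = tuple(sorted(instance.get_params(deep=deep).items()))
    if name is None:
        # Assumed naming scheme; the real format is not documented here.
        name = f"{typ.__name__.lower()}_{id(instance)}"
    return typ, name, params_tuple


typ, name, params_tuple = split_instance(FakeScaler(with_mean=False), name="scaler")

# The class plus its parameters is enough to rebuild an equivalent instance.
rebuilt = typ(**dict(params_tuple))
```

Under this reading, from_instance_name and from_name_instance differ only in argument order, with the latter requiring the name.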
set_params
set_params(**kwargs)
Create a new Step with updated parameters.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| **kwargs | | Parameter names and values to update. | {} |
Returns
| Type | Description |
| --- | --- |
| Step | A new Step instance with updated parameters. |
Examples
>>> knn_step = Step(typ=KNeighborsClassifier, name="knn")
>>> updated_step = knn_step.set_params(n_neighbors=10, weights="distance")
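Because a Step is immutable, updating parameters has to produce a new object rather than modify the existing one. The following sketch shows that copy-on-update pattern with the standard library's `dataclasses.replace`; `MiniStep` is a hypothetical stand-in and the merge logic is an assumption about how such a method could work, not xorq's code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniStep:
    """Hypothetical stand-in for Step; not the xorq implementation."""

    name: str
    params_tuple: tuple = ()

    def set_params(self, **kwargs):
        # Merge existing parameters with the updates; new values win.
        merged = dict(self.params_tuple)
        merged.update(kwargs)
        # Keys are unique, so sorting the items orders them by name.
        return replace(self, params_tuple=tuple(sorted(merged.items())))


knn = MiniStep("knn", (("n_neighbors", 5),))
updated = knn.set_params(n_neighbors=10, weights="distance")
```

The original `knn` keeps `n_neighbors=5`, so it remains safe to reuse anywhere it was already referenced.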