Step

Step()

A single step in a machine learning pipeline that wraps a scikit-learn estimator.

This class represents an individual processing step that can either transform data (transformers like StandardScaler, SelectKBest) or make predictions (classifiers like KNeighborsClassifier, LinearSVC). Steps can be combined into Pipeline objects to create complex ML workflows.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| typ | type | The scikit-learn estimator class (must inherit from BaseEstimator). | required |
| name | str | A unique name for this step. If None, a name is generated from the class name and ID. | None |
| params_tuple | tuple | Tuple of (parameter_name, parameter_value) pairs for the estimator. Parameters are automatically sorted for consistency. | () |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| typ | type | The scikit-learn estimator class. |
| name | str | The unique name for this step in the pipeline. |
| params_tuple | tuple | Sorted tuple of parameter key-value pairs. |

Examples

Create a scaler step:

>>> from xorq.ml import Step
>>> from sklearn.preprocessing import StandardScaler
>>> scaler_step = Step(typ=StandardScaler, name="scaler")
>>> scaler_step.instance
StandardScaler()

Create a classifier step with parameters:

>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn_step = Step(
...     typ=KNeighborsClassifier,
...     name="knn",
...     params_tuple=(("n_neighbors", 5), ("weights", "uniform"))
... )
>>> knn_step.instance
KNeighborsClassifier()

Notes

  • The Step class is frozen (immutable) using attrs.
  • All estimators must inherit from sklearn.base.BaseEstimator.
  • Parameter tuples are automatically sorted for hash consistency.
  • Steps can be fitted to data with the fit() method, which returns a FittedStep.
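
The notes on immutability and sorted parameter tuples can be illustrated with a minimal sketch. It uses a standard-library frozen dataclass as a stand-in for attrs, and the names `SketchStep` and `DummyEstimator` are hypothetical; this is not how xorq implements Step, only an illustration of why sorting makes two steps with the same parameters compare and hash equal regardless of the order in which the pairs were given.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class SketchStep:
    """Hypothetical stand-in for Step; not xorq's implementation."""

    typ: type
    name: str
    params_tuple: tuple = ()

    def __post_init__(self):
        # A frozen dataclass blocks normal assignment, so normalise the
        # parameter order through object.__setattr__ during construction.
        ordered = tuple(sorted(self.params_tuple, key=lambda kv: kv[0]))
        object.__setattr__(self, "params_tuple", ordered)


class DummyEstimator:
    """Placeholder for a scikit-learn estimator class."""


# Same parameters, supplied in a different order.
a = SketchStep(DummyEstimator, "knn", (("weights", "uniform"), ("n_neighbors", 5)))
b = SketchStep(DummyEstimator, "knn", (("n_neighbors", 5), ("weights", "uniform")))

same_value = a == b
same_hash = hash(a) == hash(b)

# Attempting to mutate a frozen instance raises FrozenInstanceError.
try:
    a.name = "renamed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the sorted tuple is the only stored form, anything keyed on the step (a cache entry, for instance) sees one canonical value per parameter set.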

Methods

| Name | Description |
| --- | --- |
| fit | Fit this step to the given expression data. |
| from_fit_predict | Create a Step from custom fit and predict functions. |
| from_instance_name | Create a Step from an existing scikit-learn estimator instance. |
| from_name_instance | Create a Step from a name and estimator instance. |
| set_params | Create a new Step with updated parameters. |

fit

fit(expr, features=None, target=None, storage=None, dest_col=None)

Fit this step to the given expression data.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| expr | Expr | The xorq expression containing the training data. | required |
| features | tuple of str | Column names to use as features. If None, infers from expr.columns. | None |
| target | str | Target column name. Required for prediction steps. | None |
| storage | Storage | Storage backend for caching fitted models. | None |
| dest_col | str | Destination column name for transformed output. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | FittedStep | A fitted step that can transform or predict on new data. |

from_fit_predict

from_fit_predict(fit, predict, return_type, klass_name=None, name=None)

Create a Step from custom fit and predict functions.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fit | callable | Function to fit the model. | required |
| predict | callable | Function to make predictions. | required |
| return_type | DataType | The return type for predictions. | required |
| klass_name | str | Name for the generated estimator class. | None |
| name | str | Name for the step. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Step | A new Step with a dynamically created estimator type. |
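
To make "a dynamically created estimator type" more concrete, here is a rough sketch of how a pair of functions could be packaged into a class at runtime with Python's built-in `type()`. Everything in it is an assumption for illustration: the helper `make_estimator_class`, the `fit(X, y)` and `predict(model, X)` calling conventions, and the plain `object` base (a real Step requires a BaseEstimator subclass). It does not model `return_type` and is not xorq's implementation.

```python
def make_estimator_class(fit, predict, klass_name="CustomEstimator"):
    """Build a class whose fit/predict methods delegate to the given functions.

    Hypothetical helper; the calling conventions below are assumptions.
    """

    def _fit(self, X, y=None):
        # Store whatever the user's fit function returns as the fitted model.
        self.model_ = fit(X, y)
        return self

    def _predict(self, X):
        # Pass the stored model back to the user's predict function.
        return predict(self.model_, X)

    # type(name, bases, namespace) creates a new class object at runtime.
    return type(klass_name, (object,), {"fit": _fit, "predict": _predict})


def fit_mean(X, y):
    """Toy 'model': the mean of the targets."""
    return sum(y) / len(y)


def predict_mean(model, X):
    """Predict the stored mean for every row."""
    return [model for _ in X]


MeanEstimator = make_estimator_class(fit_mean, predict_mean, klass_name="MeanEstimator")
est = MeanEstimator().fit([[1], [2], [3]], [10, 20, 30])
predictions = est.predict([[4], [5]])
```

The generated class carries the requested name, which is presumably what `klass_name` controls in from_fit_predict.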

from_instance_name

from_instance_name(instance, name=None)

Create a Step from an existing scikit-learn estimator instance.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| instance | object | A scikit-learn estimator instance. | required |
| name | str | Name for the step. If None, generates from instance class name. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Step | A new Step wrapping the estimator instance. |
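
Conceptually, wrapping an instance means reducing it to the same three fields a Step stores: its class, a name, and a sorted tuple of its parameters. The sketch below shows one way that reduction could work. It assumes the parameters are read through the scikit-learn `get_params()` protocol and that a missing name falls back to the lowercased class name; both are assumptions, as are the stand-in class `TinyScaler` and the helper `describe_instance`.

```python
class TinyScaler:
    """Dependency-free stand-in that mimics scikit-learn's get_params protocol."""

    def __init__(self, with_mean=True, with_std=True):
        self.with_mean = with_mean
        self.with_std = with_std

    def get_params(self, deep=True):
        return {"with_mean": self.with_mean, "with_std": self.with_std}


def describe_instance(instance, name=None):
    """Reduce an estimator instance to (typ, name, params_tuple).

    Hypothetical helper; the naming rule is an assumption for illustration.
    """
    typ = type(instance)
    params = instance.get_params(deep=False)
    params_tuple = tuple(sorted(params.items(), key=lambda kv: kv[0]))
    if name is None:
        name = typ.__name__.lower()
    return typ, name, params_tuple


typ, name, params_tuple = describe_instance(TinyScaler(with_mean=False))

# The stored fields are enough to rebuild an equivalent estimator.
rebuilt = typ(**dict(params_tuple))
```

Storing the class and parameters rather than the live object keeps the description hashable and lets a fresh, unfitted estimator be created on demand.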

from_name_instance

from_name_instance(name, instance)

Create a Step from a name and estimator instance.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | Name for the step. | required |
| instance | object | A scikit-learn estimator instance. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Step | A new Step wrapping the estimator instance. |

set_params

set_params(**kwargs)

Create a new Step with updated parameters.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| **kwargs | | Parameter names and values to update. | {} |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Step | A new Step instance with updated parameters. |

Examples

>>> from sklearn.neighbors import KNeighborsClassifier
>>> from xorq.ml import Step
>>> knn_step = Step(typ=KNeighborsClassifier, name="knn")
>>> updated_step = knn_step.set_params(n_neighbors=10, weights="distance")
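
Since a Step is immutable, set_params cannot change parameters in place and has to build a new value. The following sketch shows the kind of merge this implies, operating on a bare parameter tuple. The helper `updated_params` is hypothetical and only illustrates the idea of overriding existing keys, adding new ones, and re-sorting; it is not taken from xorq.

```python
def updated_params(params_tuple, **kwargs):
    """Return a new sorted tuple with kwargs merged over the existing pairs."""
    merged = dict(params_tuple)
    merged.update(kwargs)  # new values override old ones; new keys are added
    return tuple(sorted(merged.items(), key=lambda kv: kv[0]))


original = (("n_neighbors", 5), ("weights", "uniform"))
updated = updated_params(original, n_neighbors=10, algorithm="auto")
```

The original tuple is left untouched, so any object still holding it keeps seeing the old parameters.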