This guide demonstrates ML integration patterns using Xorq's Step and Pipeline classes with scikit-learn estimators. The examples use the iris dataset from xorq.examples.iris.
Prerequisites
pip install "xorq[examples]"
Core Components
Step Class
The Step class wraps individual scikit-learn estimators for use in Xorq pipelines:
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

import xorq as xo
from xorq.expr.ml.pipeline_lib import Step, Pipeline

# Load the iris dataset
con = xo.connect()
iris_data = xo.examples.iris.fetch(backend=con)
print(iris_data.schema())
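To make the wrapping idea concrete, the sketch below mimics it in plain Python: a frozen record that stores an estimator class, a name, and its parameters, and only builds the estimator on request. `StepSketch` and its `instance` property are hypothetical names used for illustration and are not Xorq's implementation; only the scikit-learn calls are real.

```python
from dataclasses import dataclass

from sklearn.preprocessing import StandardScaler


@dataclass(frozen=True)
class StepSketch:
    """Hypothetical stand-in for a step: estimator class, name, parameters."""

    typ: type
    name: str
    params_tuple: tuple = ()

    @property
    def instance(self):
        # Build a fresh, unfitted estimator from the stored configuration.
        return self.typ(**dict(self.params_tuple))


step = StepSketch(
    typ=StandardScaler,
    name="scaler",
    params_tuple=(("with_mean", False),),
)
estimator = step.instance
print(step.name, estimator.get_params()["with_mean"])
```

Storing the class and parameters rather than a live estimator keeps the record immutable and lets every fit start from a clean, unfitted object.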
Using the from_instance Class Method with Scikit-Learn Pipelines
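import sklearn.pipeline

# Create a scikit-learn pipeline first
sklearn_simple_pipeline = sklearn.pipeline.Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# Convert scikit-learn pipelines to Xorq Pipelines using from_instance
xorq_simple_pipeline = Pipeline.from_instance(sklearn_simple_pipeline)
print("Pipelines created from scikit-learn instances:")
print(f"Simple pipeline steps: {[step.name for step in xorq_simple_pipeline.steps]}")

# You can also create pipelines from pre-fitted scikit-learn pipelines.
# fitted_scaler and fitted_knn must already be fitted; as an assumption about
# how they were produced, they are fitted here on the locally materialized iris table.
iris_df = iris_data.execute()
X = iris_df[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
fitted_scaler = StandardScaler().fit(X)
fitted_knn = KNeighborsClassifier(n_neighbors=5).fit(
    fitted_scaler.transform(X), iris_df["species"]
)
fitted_sklearn_pipeline = sklearn.pipeline.Pipeline([
    ("scaler", fitted_scaler),
    ("knn", fitted_knn),
])
xorq_fitted_pipeline = Pipeline.from_instance(fitted_sklearn_pipeline)
print(f"Fitted pipeline steps: {[step.name for step in xorq_fitted_pipeline.steps]}")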
Pipelines created from scikit-learn instances:
Simple pipeline steps: ['scaler', 'knn']
Fitted pipeline steps: ['scaler', 'knn']
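For reference, this is what the same two-step scikit-learn pipeline does when fitted directly in memory, without Xorq. The sketch uses scikit-learn's bundled copy of iris (`load_iris`) rather than `xo.examples.iris`, so column names and row order may differ from the Xorq table.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline as SklearnPipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled iris data: 150 rows, 4 numeric features.
X, y = load_iris(return_X_y=True)

baseline = SklearnPipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
baseline.fit(X, y)

# Step names are the first element of each (name, estimator) pair.
step_names = [name for name, _ in baseline.steps]
print(step_names)
print(baseline.predict(X[:5]))
```

The step names here are the same strings that appear in the output above, which suggests the converted Xorq pipeline keeps the scikit-learn step names.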
Mixed Approach - Combining Both Methods
# Create some steps directly and others from instances
direct_scaler = Step(typ=StandardScaler, name="direct_scaler")
instance_selector = Step.from_instance_name(SelectKBest(k=4), name="instance_selector")

# Create a sklearn estimator with custom configuration
custom_knn = KNeighborsClassifier(
    n_neighbors=7,
    weights='distance',
    metric='manhattan',
)
instance_knn = Step.from_instance_name(custom_knn, name="custom_knn")

# Mix direct and instance-based steps in a pipeline
mixed_pipeline = Pipeline(steps=(direct_scaler, instance_selector, instance_knn))
print(f"\nMixed pipeline steps: {[step.name for step in mixed_pipeline.steps]}")
print("Step details:")
for step in mixed_pipeline.steps:
    print(f" {step.name}: {dict(step.params_tuple)}")
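The custom configuration passed to `KNeighborsClassifier` is what an instance-based step has to carry along. In scikit-learn itself, that configuration is exposed by `get_params()` and can be used to rebuild an equivalent unfitted estimator, as the sketch below shows. That `params_tuple` holds the same key/value pairs is an assumption here, not something this guide confirms.

```python
from sklearn.base import clone
from sklearn.neighbors import KNeighborsClassifier

custom_knn = KNeighborsClassifier(
    n_neighbors=7,
    weights="distance",
    metric="manhattan",
)

# get_params() returns the constructor arguments, including defaults.
params = custom_knn.get_params()
print({key: params[key] for key in ("n_neighbors", "weights", "metric")})

# Rebuild an equivalent, unfitted estimator from the class and its parameters.
rebuilt = type(custom_knn)(**params)

# clone() performs the same round trip in a single call.
cloned = clone(custom_knn)
print(rebuilt.get_params() == cloned.get_params())
```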
# Define features and target
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
target = 'species'

# Fit the pipelines created from instances
fitted_xorq_simple = xorq_simple_pipeline.fit(
    iris_data,
    features=features,
    target=target,
)
fitted_mixed = mixed_pipeline.fit(
    iris_data,
    features=features,
    target=target,
)

# Make predictions
simple_instance_predictions = fitted_xorq_simple.predict(iris_data)
mixed_predictions = fitted_mixed.predict(iris_data)

print("\nPipeline Predictions from from_instance:")
print("Simple pipeline (from sklearn):")
print(simple_instance_predictions.head().execute())
print("\nMixed pipeline:")
print(mixed_predictions.head().execute())