Quickstart

Build and run your first Xorq ML pipeline in under five minutes

After completing this guide, you will have a working ML pipeline that loads data, trains a model, and generates predictions using Xorq’s deferred execution model.

What you’ll build

In this quickstart, you:

  1. Set up your environment and install Xorq
  2. Initialize a project
  3. Build a pipeline expression
  4. Run your pipeline and save results
  5. Serve a UDXF expression as an endpoint

The entire process takes about five minutes. By the end, you will understand how Xorq transforms Python code into executable, servable pipelines.

Step 1: Set up your environment and install Xorq

This step covers environment setup, Xorq installation, and verification.

Check your Python version

Tip: Python version

Xorq requires Python 3.10 through 3.13. Check your version with python --version. If you need to install or update Python, visit the official Python downloads page.
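The same check can be run from Python itself. The sketch below assumes the supported range stated in the tip (3.10 through 3.13, inclusive); the function name is illustrative and is not part of Xorq.

```python
import sys

# Supported range taken from the tip above: 3.10 <= version <= 3.13.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 13)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair falls inside the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Comparing (major, minor) tuples avoids the string-comparison trap where "3.9" sorts after "3.10".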

Create a virtual environment

Create and activate a virtual environment.

On macOS or Linux:

python -m venv .venv
source .venv/bin/activate

On Windows:

python -m venv .venv
.venv\Scripts\activate
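To confirm that the environment is active before installing anything, you can ask the interpreter directly. This is a minimal sketch that uses only the standard library; the function name is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
```

If this prints False, activate .venv again in the current terminal before continuing.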

Update pip

Before installing Xorq, update pip to the latest version. This avoids compatibility issues:

python -m pip install --upgrade pip

Install Xorq

Now install Xorq using your preferred package manager.

With pip:

pip install "xorq[examples]"

With Nix:

nix run github:xorq-labs/xorq

Verify installation

Verify your installation by checking the version. Open a Python shell:

python

Note: If python does not work, try python3 instead.

Import Xorq and check the version:

import xorq
xorq.__version__

You should see a version string such as '0.3.4'.

Exit the Python shell:

exit()

If you see a version number, then Xorq is installed correctly.
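If you prefer a non-interactive check, for example in a setup script, the standard library can read the installed version without importing the package. This sketch assumes the distribution is named xorq, as in the pip command above; the helper name is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("xorq")
print(f"xorq {found}" if found else "xorq is not installed in this environment")
```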

With Xorq installed, the next step is to create your first project.

Step 2: Initialize a project

Create a new Xorq project using the built-in penguins template. This template demonstrates a complete ML workflow with the Palmer Penguins dataset:

xorq init -t penguins -p penguins_example
cd penguins_example

The first command creates a new project called penguins_example using the penguins template. The second command moves into the project directory.

The template generates an expr.py file. This file contains a complete ML pipeline with data loading, train/test splitting, model training, and prediction.

With your project initialized, the next step is to build it into an executable format.

Step 3: Build your expression

Convert your pipeline into a serialized, executable format using the build command:

xorq build expr.py

This serializes your pipeline and generates a content-addressed build directory.

Output:

Building expr from expr.py
Written 'expr' to builds/12287e173c17
builds/12287e173c17

The build creates a directory (like builds/12287e173c17) containing your serialized pipeline. The directory name is a hash that uniquely identifies your build.
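The idea behind a content-addressed directory can be illustrated with a toy example: the name is derived from the content, so identical content maps to the same name. The use of SHA-256 and a 12-character prefix here is an assumption made for illustration only; this guide does not describe the hashing scheme Xorq actually uses.

```python
import hashlib


def content_address(payload: bytes, length: int = 12) -> str:
    """Return a short hex digest derived only from the payload bytes."""
    return hashlib.sha256(payload).hexdigest()[:length]


# Stand-ins for two serialized pipelines (hypothetical content).
pipeline_a = b"load -> split -> train -> predict"
pipeline_b = b"load -> split -> train -> predict -> rank"

# Identical content always maps to the same directory name...
print("builds/" + content_address(pipeline_a))
print("builds/" + content_address(pipeline_a))
# ...while changed content almost certainly yields a different one.
print("builds/" + content_address(pipeline_b))
```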

Save this hash as an environment variable.

On macOS or Linux:

export BUILD_HASH=12287e173c17

On Windows PowerShell:

$env:BUILD_HASH="12287e173c17"

On Windows Command Prompt:

set BUILD_HASH=12287e173c17

In each case, replace 12287e173c17 with your actual hash.

Important: Replace with your hash

The hash 12287e173c17 is an example. Copy the actual hash from your build output.
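If you script the build, the hash can be extracted from the captured output instead of copied by hand. The sketch below assumes the output format shown in this guide, where the final line is builds/<hash> and the hash is lowercase hexadecimal; the function name is illustrative.

```python
import re

# Example output copied from this guide; yours will show a different hash.
SAMPLE_OUTPUT = """Building expr from expr.py
Written 'expr' to builds/12287e173c17
builds/12287e173c17
"""


def parse_build_hash(output):
    """Return the hash from the last 'builds/<hash>' line, or None if absent."""
    for line in reversed(output.strip().splitlines()):
        match = re.fullmatch(r"builds/([0-9a-f]+)", line.strip())
        if match:
            return match.group(1)
    return None


print(parse_build_hash(SAMPLE_OUTPUT))
```

Returning None rather than raising lets a calling script decide how to handle a failed or unexpected build output.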

Step 4: Run your pipeline

Execute your built pipeline and view the results. The commands below read the hash from the BUILD_HASH variable you set in step 3.

Warning: Windows users

Skip the terminal-only command and go straight to a command that saves to a file with -o. Running without an output file might cause issues on Windows.

Display in terminal (macOS or Linux):

xorq run builds/$BUILD_HASH

Runs the pipeline and displays results in your terminal.

Save to file (macOS or Linux):

xorq run builds/$BUILD_HASH -o predictions.parquet

Save to file (Windows PowerShell):

xorq run builds/$env:BUILD_HASH -o predictions.parquet

Save to file (Windows Command Prompt):

xorq run builds/%BUILD_HASH% -o predictions.parquet

The command saves prediction results to predictions.parquet. You can open this file with pandas, DuckDB, or Polars.
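As one way to inspect the saved file, the sketch below loads it with pandas. It assumes pandas and a Parquet engine such as pyarrow are available in your environment, and that predictions.parquet is in the current directory; the helper name is illustrative.

```python
from pathlib import Path


def load_predictions(path="predictions.parquet"):
    """Load the saved predictions into a pandas DataFrame."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run step 4 with -o first")
    # Imported lazily so the missing-file check works without pandas installed.
    import pandas as pd

    return pd.read_parquet(path)


# Preview the first rows only when the file from step 4 is present.
if Path("predictions.parquet").exists():
    print(load_predictions().head())
```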

Step 5: Deploy your pipeline as an API endpoint

Deploy the pipeline you built in step 3 as a live API endpoint.

Important: Using your build hash

This step uses the same BUILD_HASH from step 3. You’re deploying the exact pipeline you just built and ran.

Start the server

Run the server with your build hash.

On macOS or Linux:

xorq serve-unbound builds/$BUILD_HASH \
  --host localhost \
  --port 8001 \
  --to_unbind_hash da8ba93cc97709f4ad9502e16d71e8e3

On Windows PowerShell:

xorq serve-unbound builds/$env:BUILD_HASH `
  --host localhost `
  --port 8001 `
  --to_unbind_hash da8ba93cc97709f4ad9502e16d71e8e3

On Windows Command Prompt:

xorq serve-unbound builds/%BUILD_HASH% --host localhost --port 8001 --to_unbind_hash da8ba93cc97709f4ad9502e16d71e8e3

With your pipeline running locally, the next step is to serve a UDXF expression as an endpoint.

Step 5: Serve a UDXF expression

Note: New terminal setup

Open a new terminal window. Activate the same virtual environment (.venv) and navigate to the same directory (penguins_example) where your server is running.

Create a file called udxf_example.py:

# udxf_example.py
import pandas as pd
import xorq.api as xo


def add_computed_column(df: pd.DataFrame) -> pd.DataFrame:
    """Add a computed column that doubles the input value."""
    result = df.copy()
    result["doubled"] = result["value"] * 2
    return result


input_schema = xo.schema({"value": "int64"})
output_schema = xo.schema({"value": "int64", "doubled": "int64"})


con = xo.connect()
input_table = xo.memtable({"value": [1, 2, 3, 4, 5]}, schema=input_schema)


expr = xo.expr.relations.flight_udxf(
    input_table,
    process_df=add_computed_column,
    maybe_schema_in=input_schema,
    maybe_schema_out=output_schema,
    con=con,
    make_udxf_kwargs={
        "name": "double_value",
        "command": "double_value"
    }
)

The script has four parts:

  1. Defines a transformation function that doubles input values.
  2. Specifies input and output schemas for type safety.
  3. Creates a simple input table with test data.
  4. Creates the UDXF expression with the transformation function.
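Before building, it can help to run the transformation function on its own, since it is plain pandas code. The sketch below repeats add_computed_column from udxf_example.py and checks that its output columns line up with the declared output schema. It assumes only that pandas is installed; it does not call any Xorq API.

```python
import pandas as pd


def add_computed_column(df: pd.DataFrame) -> pd.DataFrame:
    """Same transformation as in udxf_example.py."""
    result = df.copy()
    result["doubled"] = result["value"] * 2
    return result


sample = pd.DataFrame({"value": [1, 2, 3, 4, 5]})
output = add_computed_column(sample)

# The output columns should match the declared output schema, in order.
assert list(output.columns) == ["value", "doubled"]
# The input frame is left untouched because the function works on a copy.
assert list(sample.columns) == ["value"]
print(output)
```

Catching a column mismatch here is cheaper than discovering it after the expression has been built and served.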

Build the UDXF expression

Build the expression:

xorq build udxf_example.py --expr-name expr

This creates a second build directory, builds/<HASH>/, containing your serialized UDXF expression.

Start the Flight server

Start the Flight server with your built UDXF expression. Replace <BUILD_HASH> with the second hash from your build output:

xorq serve-flight-udxf --port 8001 builds/<BUILD_HASH>

Your UDXF is now running as an endpoint on localhost:8001. Keep this terminal window open while you query it.
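Before querying, you may want to confirm that something is listening on the port. The sketch below uses only the standard library and assumes the host and port from the command above; the function name is illustrative.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Server reachable:", port_is_open("localhost", 8001))
```

A successful TCP connection only shows that a process is listening on the port. It does not confirm that the process is the Flight server.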

Query your served UDXF

With the server running, create a new Python file called query_udxf.py:

# query_udxf.py
import xorq.api as xo
import random as rnd


con = xo.flight.connect(port=8001)


exchange = con.get_exchange("default")


input_schema = xo.schema({"value": "int64"})
values = rnd.choices(tuple(range(100)), k=10)
expr = xo.memtable({"value": values}, schema=input_schema).pipe(exchange)


print("Executing via Flight do_exchange...")
result_df = expr.execute()

print(result_df)

The script has four parts:

  1. Connects to the Flight server running on port 8001. This establishes a connection to communicate with your deployed server.
  2. Gets the exchange named “default” from the connection. This exchange represents the UDXF transformation you deployed.
  3. Creates an in-memory table with random integer values and pipes it through the exchange. This builds an expression that will apply your UDXF transformation to the input data.
  4. Executes the expression via Flight do_exchange and prints the results. The server applies the transformation (doubling the values) and returns the result as a pandas DataFrame.

Run the query script

Run your query script:

python query_udxf.py

Output:

Executing via Flight do_exchange...

This line is followed by a DataFrame of 10 rows with two columns, value and doubled. The exact numbers differ on each run because the input values are random.

Your UDXF now runs as a live endpoint. You sent a table of integers to the server, the server applied add_computed_column, and you received the doubled values back, all via the Flight protocol.

What you built

You just built your first Xorq ML pipeline. You initialized a project, built it into a portable format, ran it to generate predictions, and served it as an API endpoint.

The entire workflow uses Xorq’s deferred execution model. No computation runs until you explicitly execute or query the pipeline.
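Deferred execution can be illustrated with a toy example in plain Python. This sketch is not how Xorq is implemented, and the Deferred class and its methods are hypothetical. It only shows the general pattern: building an expression records the steps, and nothing runs until execute() is called.

```python
class Deferred:
    """A toy deferred expression: records steps, runs them only on execute()."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, func):
        # Return a new expression; nothing is computed here.
        return Deferred(self._source, self._steps + (func,))

    def execute(self):
        # Only now is the source read and every recorded step applied.
        values = list(self._source())
        for step in self._steps:
            values = [step(v) for v in values]
        return values


calls = []


def load():
    calls.append("load")  # track when the data is actually read
    return [1, 2, 3]


expr = Deferred(load).map(lambda v: v * 2).map(lambda v: v + 1)
print(calls)           # still empty: building the expression ran nothing
print(expr.execute())  # the computation happens here
```

Because each map call returns a new object, the same base expression can be extended in several directions without re-reading the source until one of them is executed.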

Next steps

Continue learning: