Get started with Xorq

Scaffold a baseball analytics pipeline in minutes with xorq init, xo.deferred_read_csv, and xorq uv build

Scaffold a Xorq project, load Moneyball CSVs, build a top-batters leaderboard.

Prerequisites

Install uv, which this tutorial uses to install Python and manage the project environment:

curl -LsSf https://astral.sh/uv/install.sh | sh

Scaffold

Xorq supports Python 3.10 through 3.13. This tutorial pins 3.13 so uv scaffolds and syncs against one explicit interpreter:

uv python install 3.13
uvx -p3.13 xorq@latest init --path moneyball
cd moneyball
uv sync -p3.13
source .venv/bin/activate

uvx bootstraps Xorq once. From here on, the xorq on your PATH comes from the project’s pinned .venv.

Use uvx, not a permanent install

For Xorq commands run outside a project, prefer uvx xorq@latest ... over other install methods: the @latest tag makes uvx resolve the newest release. After creating the project, activate its uv environment and use the xorq in .venv from then on, so the xorq you invoke matches the version pinned in pyproject.toml.
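If you are unsure which xorq your shell picks up after activation, a short Python check can confirm that it resolves inside the project's .venv. This is a sketch, not a Xorq feature. The helper name is_venv_xorq is hypothetical, and the check assumes the environment lives at .venv in the project root, which is where the activate step above sources it from.

```python
import shutil
from pathlib import Path


def is_venv_xorq(exe_path, project_root="."):
    """Return True if exe_path lies inside <project_root>/.venv.

    Assumption: the project's environment is the .venv directory in its root.
    Both paths are resolved so symlinked parent directories compare equal.
    """
    if exe_path is None:  # shutil.which returns None when nothing is found
        return False
    venv_dir = (Path(project_root) / ".venv").resolve()
    return Path(exe_path).resolve().is_relative_to(venv_dir)


if __name__ == "__main__":
    found = shutil.which("xorq")
    verdict = "project .venv" if is_venv_xorq(found) else "NOT the project .venv"
    print(f"{found} -> {verdict}")
```

Run the check from the project root after activation. A path outside .venv usually means an xorq installed somewhere else is shadowing the project's copy.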

List everything the scaffold created, including dotfiles:

ls -1A

A complete locked project: pyproject.toml, uv.lock, an exported requirements.txt, a src/ package, and a starter expr.py. The with-uvenv script and .envrc are helpers Xorq uses internally — you don’t need to invoke them.

Get the data

curl -fL -o Batting.csv https://raw.githubusercontent.com/xorq-labs/baseballdatabank/master/core/Batting.csv
curl -fL -o People.csv https://raw.githubusercontent.com/xorq-labs/baseballdatabank/master/core/People.csv
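Before writing the expression, you can confirm that the downloaded files contain the columns expr.py relies on. The sketch below uses only the standard library and reads just the header line of each file. The two column sets are simply the names the expression references, and missing_columns and check_file are hypothetical helpers, not part of Xorq.

```python
import csv
import io

# Columns that expr.py references in each file
BATTING_COLS = {"playerID", "yearID", "lgID", "AB", "H", "BB", "HBP", "SF"}
PEOPLE_COLS = {"playerID", "nameFirst", "nameLast", "bats", "throws", "birthYear"}


def missing_columns(header_line, required):
    """Return the required column names absent from a CSV header line, sorted."""
    header = next(csv.reader(io.StringIO(header_line)))
    return sorted(required - set(header))


def check_file(path, required):
    """Read only the first line of a CSV file and report missing columns."""
    with open(path, newline="") as f:
        return missing_columns(f.readline(), required)
```

From the project root, check_file("Batting.csv", BATTING_COLS) and check_file("People.csv", PEOPLE_COLS) should both return an empty list.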

Write the expression

Replace expr.py with:

# expr.py
import xorq.api as xo

# in-process backend; no server, no config
con = xo.connect()

# register CSVs as typed expressions, no read until build time
batting = xo.deferred_read_csv(con=con, path="Batting.csv", table_name="Batting")
people  = xo.deferred_read_csv(con=con, path="People.csv",  table_name="People")

# attach player bio columns to each player-season row;
# inner (the default) drops batting rows with no matching player in People
batting = batting.join(
    people["playerID", "nameFirst", "nameLast", "bats", "throws", "birthYear"],
    "playerID",
    how="inner",
)

# HBP and SF are null in older seasons; coalesce them to 0 so arithmetic doesn't null-poison
batting = batting.mutate(
    HBP=batting.HBP.fill_null(0),
    SF=batting.SF.fill_null(0),
)

# modern era, AL/NL only, real hitters (>=100 AB so batting avg is meaningful)
batting = batting.filter(
    batting.lgID.isin(["AL", "NL"]),
    batting.yearID > 1965,
    batting.AB >= 100,
)

# on-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF) --- computed after
# the AB >= 100 filter so the denominator is never zero
batting = batting.mutate(
    OBP=(batting.H + batting.BB + batting.HBP)
        / (batting.AB.cast("float64") + batting.BB + batting.HBP + batting.SF)
)

# batting average per player-season
ranked = batting.mutate(batting_avg=batting.H / batting.AB.cast("float64"))

# rank within each (league, year) by batting_avg, descending
win    = xo.window(group_by=["lgID", "yearID"], order_by=xo.desc("batting_avg"))
ranked = ranked.mutate(rank=xo.row_number().over(win) + 1)

# top 10 per league-year; `expr` is what `xorq uv build -e expr` compiles
expr = ranked.filter(ranked.rank <= 10).drop("rank")

xo.deferred_read_csv registers a CSV against whichever backend you pass as con (here, the in-process backend from xo.connect()). At build time it samples the file to infer a schema, and the bulk read stays deferred until execution. Each binding is a typed Xorq expression you can join, filter, and aggregate without writing pandas. expr describes the computation, and xorq uv build compiles it.
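Inferring a schema from a sample instead of reading the whole file is a general technique, and it can be sketched with the standard library. This is not Xorq's implementation. The sample size, the int64 → float64 → string widening ladder, and the name infer_schema are all assumptions made for the illustration, and the demo data is hypothetical.

```python
import csv
import io
from itertools import islice

# Narrowest-first type ladder (an assumption for this sketch)
_LADDER = ["int64", "float64", "string"]


def _classify(value):
    """Return the narrowest type name whose parser accepts value."""
    try:
        int(value)
        return "int64"
    except ValueError:
        pass
    try:
        float(value)
        return "float64"
    except ValueError:
        return "string"


def infer_schema(fileobj, sample_rows=1000):
    """Infer column types from the header and at most sample_rows data rows.

    Empty cells count as nulls and do not widen a column's type.
    Rows beyond the sample are never read.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    types = {col: None for col in header}  # None = no non-null value seen yet
    for row in islice(reader, sample_rows):
        for col, value in zip(header, row):
            if value == "":
                continue
            found = _classify(value)
            current = types[col]
            if current is None or _LADDER.index(found) > _LADDER.index(current):
                types[col] = found
    # A column that was entirely null in the sample falls back to string
    return {col: t or "string" for col, t in types.items()}


demo = io.StringIO("playerID,yearID,HBP\nabc01,1990,\nxyz02,1991,3\n")
print(infer_schema(demo))  # {'playerID': 'string', 'yearID': 'int64', 'HBP': 'int64'}
```

Because empty cells are skipped, a column like HBP that is blank in early rows can still be typed as an integer. The trade-off of sampling is that a value outside the sample, such as a stray decimal, is never seen.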

Build and run

BUILD=$(xorq uv build expr.py -e expr --builds-dir builds | tail -1)
echo "$BUILD"

You’ll see something like builds/f22643d56d2c — Xorq derives the hash from the expression graph, so yours may differ.

xorq uv run "$BUILD" --output-path top_batters.parquet

Each build is self-contained:

ls "$BUILD"

expr.yaml is the serialized graph, and requirements.txt pins the dependencies from your lock. Hand the directory to anyone with xorq, and xorq uv run reproduces the result.
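The build's requirements.txt is what makes it reproducible, so you may want to see exactly which versions it pins. The sketch below reads pip-style name==version lines and skips comments, option lines such as --hash, and editable installs. It assumes the file uses that format, like the project's exported requirements.txt. It is not a Xorq API, and read_pins is a hypothetical helper name.

```python
import re

# name, optional [extras], ==, version; stops at whitespace, ';' markers, or '\'
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?==([^\s;\\]+)")


def read_pins(text):
    """Map lowercased package name -> pinned version from requirements text.

    Comment lines, option lines (--hash=..., -e ...), blanks, and
    unpinned requirements do not match the pattern and are skipped.
    """
    pins = {}
    for raw in text.splitlines():
        match = _PIN.match(raw.strip())
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins
```

For example, call read_pins on the text of requirements.txt inside the directory printed by echo "$BUILD", and compare the result against uv.lock.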

If expr.py changes, its build changes. To see this, edit expr.py to bump the year filter from 1965 to 1970 (batting.yearID > 1970), then build again:

xorq uv build expr.py -e expr --builds-dir builds

You’ll get a new directory next to the old one:

ls -1 builds

Nothing is overwritten. Re-running xorq uv build on an unchanged expr.py returns the same hash and creates no new directory.
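The behavior described above is what content addressing gives you: the same expression maps to the same directory, and a changed expression maps to a new one. The sketch below shows the idea on a toy description of the pipeline. It serializes the description canonically, hashes it, and keeps a short prefix. This is not Xorq's hashing scheme. The dictionary layout, SHA-256, and the 12-character prefix are assumptions chosen to resemble the builds/<hash> names.

```python
import hashlib
import json


def build_hash(graph, length=12):
    """Hash a JSON-serializable description of a computation.

    sort_keys and fixed separators make the serialization canonical, so
    equal graphs always produce equal hashes regardless of key order.
    """
    canonical = json.dumps(graph, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Toy stand-ins for the expression graph in expr.py (hypothetical layout)
v1 = {"source": "Batting.csv", "filter": {"yearID_gt": 1965, "AB_gte": 100}, "top_n": 10}
v1_again = {"top_n": 10, "filter": {"AB_gte": 100, "yearID_gt": 1965}, "source": "Batting.csv"}
v2 = {**v1, "filter": {"yearID_gt": 1970, "AB_gte": 100}}

print(build_hash(v1) == build_hash(v1_again))  # True: unchanged graph, same hash
print(build_hash(v1) == build_hash(v2))        # False: edited filter, new hash
```

The canonical serialization matters here. Without sort_keys, two equal graphs built in a different order would hash differently and produce spurious new builds.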

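Inspect

Stream the first build's result as CSV and show the header plus five rows. The 2>/dev/null discards anything the command writes to stderr: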

xorq uv run "$BUILD" --format csv -o /dev/stdout 2>/dev/null | head -6
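To pull the same rows into Python instead of the terminal, one approach is to run the identical command with subprocess and parse its stdout with the csv module. This is a sketch. It assumes xorq is on PATH (the activated .venv) and accepts the same flags as the shell command above. top_rows and parse_rows are hypothetical helper names.

```python
import csv
import io
import subprocess
from itertools import islice


def parse_rows(csv_text, limit=5):
    """Parse CSV text that has a header row; return up to `limit` rows as dicts."""
    return list(islice(csv.DictReader(io.StringIO(csv_text)), limit))


def top_rows(build_dir, limit=5):
    """Execute a Xorq build and return its first rows (same flags as the shell example)."""
    result = subprocess.run(
        ["xorq", "uv", "run", build_dir, "--format", "csv", "-o", "/dev/stdout"],
        capture_output=True,  # stderr is captured separately, like 2>/dev/null
        text=True,
        check=True,  # raise CalledProcessError if the build fails
    )
    return parse_rows(result.stdout, limit)
```

Calling top_rows with the path stored in $BUILD returns dicts of strings. The csv module does no type conversion, so cast numeric fields such as batting_avg yourself.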

What you used

xo.deferred_read_csv — lazy CSV registration, schema inferred at build time
.join, .mutate, .filter — expression construction, nothing executes
xo.window / xo.row_number / xo.desc — windowing, all on the xo namespace
xorq uv build / xorq uv run — compile to builds/<hash>/, then execute in the project’s locked env
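The sketch below makes the .mutate and windowing semantics concrete by mirroring them on plain Python dicts:

- with_rates applies the fill_null(0), OBP, and batting-average steps to one row.
- top_n_per_group keeps the ten best averages per (lgID, yearID), as row_number over the window followed by the rank filter does.

It only illustrates the semantics. Xorq compiles the real expression for the backend rather than looping in Python. Both function names are hypothetical. Breaking ties by input order is this sketch's choice, since row_number makes no tie guarantee. The sample row is made up.

```python
from collections import defaultdict


def with_rates(row):
    """Return a copy of row with HBP/SF nulls filled and OBP, batting_avg added."""
    hbp = row.get("HBP") or 0  # fill_null(0): None becomes 0
    sf = row.get("SF") or 0
    out = dict(row, HBP=hbp, SF=sf)
    out["OBP"] = (row["H"] + row["BB"] + hbp) / float(row["AB"] + row["BB"] + hbp + sf)
    out["batting_avg"] = row["H"] / float(row["AB"])
    return out


def top_n_per_group(rows, n=10):
    """Keep the n rows with the highest batting_avg in each (lgID, yearID) group.

    Like row_number() over a window partitioned by (lgID, yearID) and ordered
    by batting_avg descending, then keeping rank <= n.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["lgID"], row["yearID"])].append(row)
    kept = []
    for key in sorted(groups):
        # sorted() is stable, so rows with equal averages keep their input order
        ordered = sorted(groups[key], key=lambda r: r["batting_avg"], reverse=True)
        kept.extend(ordered[:n])
    return kept


row = with_rates({"lgID": "AL", "yearID": 1990, "H": 150, "AB": 500, "BB": 60, "HBP": None, "SF": None})
print(round(row["batting_avg"], 3), round(row["OBP"], 3))  # 0.3 0.375
```

Unlike this sketch, the SQL result has no guaranteed row order unless you sort it explicitly.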