Working with the catalog
In Build a semantic catalog you tagged a BSL flights model and dropped it into a local catalog. That catalog is a regular git repository—every entry, alias, and revision is a git commit. The moment you push it to a remote, anyone with access can clone it, query the model, propose changes, and run the model against their own backend.
This tutorial walks you through the collaboration loop end-to-end. You’ll play both sides:
- User A publishes the catalog to GitHub.
- User B clones it, recovers the model, and proposes a new entry via pull request.
- User A reviews, merges, and pulls the changes back.
You’ll also see how to swap the connection profile at recovery time, so a downstream user can run the cataloged expression against their own backend without modifying the entry.
Prerequisites
You need:
- Completed Build a semantic catalog, with the same model definition and project layout. This tutorial recreates the catalog at a stable path, since the foundation tutorial used a temp directory.
- User A’s flights-tutorial/ project directory from the foundation tutorial. This tutorial assumes it lives at ~/flights-tutorial/, so it sits as a sibling of User B’s ~/flights-tutorial-userb/ (created later in the tutorial). If you put it somewhere else, either move it now (mv path/to/flights-tutorial ~/flights-tutorial) or substitute your actual path wherever you see ~/flights-tutorial below.
- The sqlite extra installed in that project. The foundation tutorial installed xorq[bsl,duckdb]; add sqlite here so the profile-swap section can run against a different backend than the one User A built the entry with. From inside ~/flights-tutorial/:
  uv add "xorq[bsl,duckdb,sqlite]"
- Git installed locally. For the GitHub path, also authenticate with GitHub (the gh command-line tool is convenient but not required).
Xorq’s catalog stores entries as files in a git repository: catalog.yaml is the index, aliases/ holds alias pointers, entries/ holds the entry zips themselves, and metadata/ holds a sidecar yaml per entry. Every catalog.add(...) is one git commit. That means GitHub’s permission model, branch protection rules, and pull requests Just Work—there’s no separate object store, no extra service to provision.
Publish the catalog to GitHub (User A)
You don’t need a GitHub account to work through this tutorial: a local bare git repo behaves the same way for clone, push, and pull. Each remote-using step below comes in two variants, GitHub and Local bare repo; pick one and stay on it.
Recreate the catalog at a stable path (~/work/flights-catalog-usera) so the rest of the tutorial has somewhere persistent to point at, then re-add the flights model the same way the foundation tutorial did. Save the snippet below as publish_catalog.py in User A’s ~/flights-tutorial/ project directory and run it with uv run python publish_catalog.py from there:
# publish_catalog.py
from pathlib import Path
from boring_semantic_layer import to_semantic_table, to_tagged
import xorq.api as xo
from xorq.catalog.catalog import Catalog
catalog_dir = Path("~/work/flights-catalog-usera").expanduser()
catalog_dir.parent.mkdir(parents=True, exist_ok=True)
catalog = Catalog.from_repo_path(catalog_dir, init=True)
# Same memtable + semantic model as the foundation tutorial
flights = xo.memtable(
{
"origin": ["JFK", "LAX", "ORD", "JFK", "LAX", "ORD", "JFK", "LAX"],
"destination": ["LAX", "ORD", "JFK", "ORD", "JFK", "LAX", "LAX", "JFK"],
"carrier": ["AA", "UA", "AA", "UA", "AA", "UA", "AA", "UA"],
"dep_delay": [10.0, -5.0, 30.0, 15.0, -2.0, 45.0, 5.0, 20.0],
"distance": [2475, 1745, 740, 1300, 2475, 1745, 2475, 2475],
},
name="flights",
)
flights_model = (
to_semantic_table(flights)
.with_dimensions(
origin=lambda t: t.origin,
destination=lambda t: t.destination,
carrier=lambda t: t.carrier,
)
.with_measures(
flight_count=lambda t: t.count(),
avg_dep_delay=lambda t: t.dep_delay.mean(),
total_distance=lambda t: t.distance.sum(),
)
)
flights_model_expr = to_tagged(flights_model)
catalog.add(flights_model_expr, aliases=("flights-model",), sync=False)
Catalog.from_repo_path(..., init=True) creates the leaf directory but not its parent, which is why the script creates ~/work/ first with catalog_dir.parent.mkdir(parents=True, exist_ok=True).
catalog.add(...)
Each catalog.add(...) packages your project as a wheel and stores it inside the entry, so the cataloged expression keeps a frozen record of the dependencies it was built against. You’ll see Building wheel..., running egg_info, and Successfully built ...whl in the output, plus a UserWarning about local filesystem paths from the inline memtable—both are expected and not errors.
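A wheel lists its declared dependencies as Requires-Dist fields in its <name>-<version>.dist-info/METADATA file. You can read them with the standard library. This tutorial doesn’t cover where the wheel sits inside a catalog entry, so the minimal sketch below assumes you already have the wheel bytes. The stand-in wheel it builds, along with its project name and single requirement, is made up for illustration.

```python
import io
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_bytes: bytes) -> list:
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        # A wheel keeps its core metadata at <name>-<version>.dist-info/METADATA
        metadata_name = next(
            name for name in zf.namelist() if name.endswith(".dist-info/METADATA")
        )
        metadata = Parser().parsestr(zf.read(metadata_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []


# Build a tiny stand-in wheel in memory (hypothetical project name and requirement)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "flights_tutorial-0.1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\n"
        "Name: flights-tutorial\n"
        "Version: 0.1.0\n"
        "Requires-Dist: xorq[bsl,duckdb]\n",
    )

print(wheel_requirements(buf.getvalue()))  # ['xorq[bsl,duckdb]']
```

To inspect a real .whl file, pass its bytes (for example from Path(...).read_bytes()) instead of buf.getvalue().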
Wire up a remote. The first push needs -u to set upstream tracking:
GitHub
Create an empty repository on GitHub (web UI: “New repository” → leave it empty), then run these in the catalog directory:
cd ~/work/flights-catalog-usera
git remote add origin https://github.com/<you>/flights-catalog.git
git push -u origin main
Or, equivalently, use the gh command-line tool from inside the catalog directory. --source=. means “create the repo from this working directory,” so the cd matters:
cd ~/work/flights-catalog-usera
gh repo create <you>/flights-catalog --public --source=. --remote=origin --push
Local bare repo
Initialize a local bare repo to act as the “remote”. It has the same git semantics and needs no GitHub account:
git init --bare ~/work/flights-catalog-remote.git
Then, in the catalog directory (note: this is ~/work/flights-catalog-usera, not the bare repo you just created):
cd ~/work/flights-catalog-usera
git remote add origin "file://$HOME/work/flights-catalog-remote.git"
git push -u origin main
That first push has to use raw git. catalog.push() doesn’t add remotes or set upstream tracking, so it has nothing to push to until git remote add and git push -u have run once. Every later publish can use catalog.push(), which runs git push against every remote configured on the repo. Either append it to the bottom of publish_catalog.py (where catalog is already in scope), or run it on its own. To do that, save the snippet below as push_catalog.py in User A’s ~/flights-tutorial/ project and run uv run python push_catalog.py from there:
# push_catalog.py
from pathlib import Path
from xorq.catalog.catalog import Catalog
catalog = Catalog.from_repo_path(Path("~/work/flights-catalog-usera").expanduser(), init=False)
catalog.push()
Verify the remote sees what you expect:
GitHub
gh repo view <you>/flights-catalog --web # opens the repo in your browser
You should see catalog.yaml (the index) at the top level, plus aliases/, entries/, and metadata/ directories. Click into aliases/ and you’ll see flights-model.zip, the alias publish_catalog.py just added (the same name the foundation tutorial used).
Local bare repo
A bare repo has no working tree to ls, so list its files at main directly:
git -C ~/work/flights-catalog-remote.git ls-tree -r main
You should see catalog.yaml, aliases/flights-model.zip, an entries/<hash>.zip, and a matching metadata/<hash>.zip.metadata.yaml. These are the same files that would appear under “Files” on GitHub.
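You can also check the layout in code instead of by eye. Each line of git ls-tree -r output has the form <mode> <type> <object>, then a tab, then the path. The minimal sketch below parses that output and flags anything missing from the layout described above. The function name and the sample hashes are hypothetical, and the check looks only at the file names this tutorial mentions, not at file contents.

```python
import re


def check_catalog_layout(ls_tree_output: str, alias: str) -> list:
    """Return problems found in `git ls-tree -r` output; an empty list means OK."""
    # Each line is "<mode> <type> <object>\t<path>"; keep only the path
    paths = {
        line.split("\t", 1)[1]
        for line in ls_tree_output.splitlines()
        if "\t" in line
    }
    problems = [
        f"missing {required}"
        for required in ("catalog.yaml", f"aliases/{alias}.zip")
        if required not in paths
    ]
    entry_hashes = set()
    for path in paths:
        match = re.fullmatch(r"entries/(.+)\.zip", path)
        if match:
            entry_hashes.add(match.group(1))
    if not entry_hashes:
        problems.append("no entries/<hash>.zip found")
    # Every entry should have a matching metadata sidecar
    for entry_hash in sorted(entry_hashes):
        if f"metadata/{entry_hash}.zip.metadata.yaml" not in paths:
            problems.append(f"entry {entry_hash} has no metadata sidecar")
    return problems


# Hypothetical ls-tree output; the object IDs and entry hash are made up
sample = (
    "100644 blob 1111111111111111111111111111111111111111\taliases/flights-model.zip\n"
    "100644 blob 2222222222222222222222222222222222222222\tcatalog.yaml\n"
    "100644 blob 3333333333333333333333333333333333333333\tentries/abc123.zip\n"
    "100644 blob 4444444444444444444444444444444444444444\tmetadata/abc123.zip.metadata.yaml\n"
)
print(check_catalog_layout(sample, "flights-model"))  # []
```

To check the real bare repo, capture its listing with subprocess.run(["git", "-C", path, "ls-tree", "-r", "main"], capture_output=True, text=True).stdout. Expand the ~ in path first.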
Set up User B’s project
Switch hats. User B is on a different machine (or pretending to be—same laptop, different working directory and venv). Give them their own uv project so they aren’t sharing User A’s pyproject.toml or .venv:
mkdir ~/flights-tutorial-userb && cd ~/flights-tutorial-userb
uv init --bare
uv add "xorq[bsl,duckdb,sqlite]"
printf '\n[tool.setuptools]\npy-modules = []\n' >> pyproject.toml
This is the same setup the foundation tutorial walked you through for User A, just under a different directory name. From here on, every User B script runs with uv run python <script>.py from inside ~/flights-tutorial-userb/. The uv run command picks up that project’s .venv automatically, so you never have to deactivate User A’s venv to run User B’s code.
With User A’s flights-tutorial/ and User B’s flights-tutorial-userb/, you have two .venvs on disk. uv run python script.py looks at the pyproject.toml of the directory you’re in and uses that venv—so as long as you cd into the right project before each command, the right venv is used. No source, no deactivate.
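Here is a simplified sketch of that lookup rule: walk up from the current directory and stop at the first pyproject.toml. uv’s real project discovery handles more cases (workspaces, for one), so read this as an illustration of why the cd matters, not as uv’s implementation. The scripts/ subdirectories are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Walk up from `start` and return the first directory containing pyproject.toml."""
    for directory in (start, *start.parents):
        if (directory / "pyproject.toml").is_file():
            return directory
    return None


# Two sibling projects, mirroring ~/flights-tutorial and ~/flights-tutorial-userb
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    for name in ("flights-tutorial", "flights-tutorial-userb"):
        (home / name / "scripts").mkdir(parents=True)
        (home / name / "pyproject.toml").write_text('[project]\nname = "demo"\n')
    # A script run from a subdirectory still resolves to its own project
    found_a = find_project_root(home / "flights-tutorial" / "scripts").name
    found_b = find_project_root(home / "flights-tutorial-userb" / "scripts").name

print(found_a, found_b)  # flights-tutorial flights-tutorial-userb
```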
Clone the catalog (User B)
User B clones the catalog with one call:
GitHub
Save the snippet below as clone_catalog.py in ~/flights-tutorial-userb/ and run uv run python clone_catalog.py from there:
# clone_catalog.py
from pathlib import Path
from xorq.catalog.catalog import Catalog
catalog = Catalog.clone_from(
"https://github.com/<you>/flights-catalog.git",
Path("~/work/flights-catalog-userb").expanduser(),
)
print("Aliases:", catalog.list_aliases())
# ['flights-model']
The same thing is available from the command line. Pick one or the other; running both fails on the second clone because the directory already exists:
uv run xorq catalog clone https://github.com/<you>/flights-catalog.git --path ~/work/flights-catalog-userb
uv run xorq catalog --path ~/work/flights-catalog-userb list-aliases
Local bare repo
Save the snippet below as clone_catalog.py in ~/flights-tutorial-userb/ and run uv run python clone_catalog.py from there:
# clone_catalog.py
from pathlib import Path
from xorq.catalog.catalog import Catalog
catalog = Catalog.clone_from(
f"file://{Path('~/work/flights-catalog-remote.git').expanduser()}",
Path("~/work/flights-catalog-userb").expanduser(),
)
print("Aliases:", catalog.list_aliases())
# ['flights-model']
The same thing is available from the command line. Pick one or the other; running both fails on the second clone because the directory already exists:
uv run xorq catalog clone "file://$HOME/work/flights-catalog-remote.git" --path ~/work/flights-catalog-userb
uv run xorq catalog --path ~/work/flights-catalog-userb list-aliases
User B never saw User A’s Python file or the original to_semantic_table(...) call. All they have is a git clone, and that’s enough.
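The two clone_catalog.py variants differ only in the URL they pass to Catalog.clone_from(...). If you want one script that works for either path, a small hypothetical helper like the one below can pick the URL. It uses Path.as_uri() instead of string formatting. On POSIX that gives the same file:// form for an absolute path, and it also percent-encodes characters such as spaces.

```python
from pathlib import Path
from typing import Optional


def catalog_remote_url(
    github_user: Optional[str] = None,
    bare_repo: str = "~/work/flights-catalog-remote.git",
) -> str:
    """Hypothetical helper: return the clone URL for whichever remote you set up."""
    if github_user is not None:
        return f"https://github.com/{github_user}/flights-catalog.git"
    # as_uri() requires an absolute path, hence expanduser() and resolve()
    return Path(bare_repo).expanduser().resolve().as_uri()


print(catalog_remote_url("octocat"))  # https://github.com/octocat/flights-catalog.git
print(catalog_remote_url())  # a file:// URL pointing at the bare repo under your home directory
```

Pass the result to Catalog.clone_from(...) exactly as clone_catalog.py does.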
Recover and query the model (User B)
Recover the model the same way the foundation tutorial did. Because the catalog is plain git, the entry contents arrived during clone_from, so from_tagged can read them immediately. Save the snippet below as recover_model.py in ~/flights-tutorial-userb/ and run it with uv run python recover_model.py:
# recover_model.py
from pathlib import Path
from boring_semantic_layer import from_tagged
from xorq.catalog.catalog import Catalog
catalog = Catalog.from_repo_path(Path("~/work/flights-catalog-userb").expanduser(), init=False)
flights_entry = catalog.get_catalog_entry("flights-model", maybe_alias=True)
flights_model = from_tagged(flights_entry.expr)
print(
flights_model.query(
dimensions=("origin",),
measures=("flight_count", "avg_dep_delay"),
).order_by("origin").execute()
)
The recovered SemanticModel has the same dimensions and measures User A defined. The data User A used (the inline xo.memtable(...) from the foundation tutorial) is serialized inside the entry, so the query runs locally, with no shared filesystem and no out-of-band data transfer.
Propose a change via pull request (User B)
User B wants to publish a refined view: same model, but filtered to American Airlines only. Because the catalog is a git repo, they branch, commit, push, and open a PR—exactly like any other code change.
cd ~/work/flights-catalog-userb
git checkout -b add-aa-only-model
Build the new entry in Python. User B has the same flights data as User A (the inline memtable from the foundation tutorial), so they reconstruct it the same way. Save the snippet below as add_aa_model.py in ~/flights-tutorial-userb/:
# add_aa_model.py
from pathlib import Path
from boring_semantic_layer import to_semantic_table, to_tagged
import xorq.api as xo
from xorq.catalog.catalog import Catalog
catalog = Catalog.from_repo_path(Path("~/work/flights-catalog-userb").expanduser(), init=False)
flights = xo.memtable(
{
"origin": ["JFK", "LAX", "ORD", "JFK", "LAX", "ORD", "JFK", "LAX"],
"destination": ["LAX", "ORD", "JFK", "ORD", "JFK", "LAX", "LAX", "JFK"],
"carrier": ["AA", "UA", "AA", "UA", "AA", "UA", "AA", "UA"],
"dep_delay": [10.0, -5.0, 30.0, 15.0, -2.0, 45.0, 5.0, 20.0],
"distance": [2475, 1745, 740, 1300, 2475, 1745, 2475, 2475],
},
name="flights",
)
# Same model shape, restricted to AA
aa_flights = flights.filter(flights.carrier == "AA")
aa_model = (
to_semantic_table(aa_flights)
.with_dimensions(
origin=lambda t: t.origin,
destination=lambda t: t.destination,
)
.with_measures(
flight_count=lambda t: t.count(),
avg_dep_delay=lambda t: t.dep_delay.mean(),
)
)
aa_model_expr = to_tagged(aa_model)
catalog.add(aa_model_expr, aliases=("flights-aa-only",), sync=False)
Run it from User B’s project. uv run puts you in that project’s venv, and catalog.add(...) finds ~/flights-tutorial-userb/pyproject.toml from the current directory to build the dependency-pinning wheel:
cd ~/flights-tutorial-userb
uv run python add_aa_model.py
That commits a new entry on the add-aa-only-model branch in User B’s catalog clone (~/work/flights-catalog-userb).
sync=False
Passing sync=False keeps the add local—it commits to the working branch but doesn’t push to the remote. You’ll push the branch yourself in the next step, after reviewing the diff.
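Before you open the PR, it helps to know what the new alias should return. The plain-Python cross-check below is not BSL and is not part of the catalog. It applies the same carrier == "AA" filter to the memtable rows and computes flight_count and avg_dep_delay per origin. Those are the numbers a query over flights-aa-only with dimensions=("origin",) should reproduce. Only the columns the check needs are copied over.

```python
from collections import defaultdict

# The origin, carrier, and dep_delay columns from the memtable in add_aa_model.py
flights = {
    "origin": ["JFK", "LAX", "ORD", "JFK", "LAX", "ORD", "JFK", "LAX"],
    "carrier": ["AA", "UA", "AA", "UA", "AA", "UA", "AA", "UA"],
    "dep_delay": [10.0, -5.0, 30.0, 15.0, -2.0, 45.0, 5.0, 20.0],
}


def aa_summary(data: dict) -> dict:
    """Return {origin: (flight_count, avg_dep_delay)} over AA rows only."""
    delays = defaultdict(list)
    for origin, carrier, delay in zip(data["origin"], data["carrier"], data["dep_delay"]):
        if carrier == "AA":  # mirrors flights.filter(flights.carrier == "AA")
            delays[origin].append(delay)
    return {origin: (len(ds), sum(ds) / len(ds)) for origin, ds in sorted(delays.items())}


print(aa_summary(flights))
# {'JFK': (2, 7.5), 'LAX': (1, -2.0), 'ORD': (1, 30.0)}
```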
Push the feature branch and open the PR. Run these from User B’s catalog directory (~/work/flights-catalog-userb):
GitHub
git log --oneline -3 # confirm: "add: <hash> (aliases flights-aa-only)"
git push -u origin add-aa-only-model
gh pr create --title "Add AA-only flights model" --body "Adds an AA-filtered view of the flights model under alias flights-aa-only."
User A reviews the PR on GitHub. Because each catalog.add(...) is a single commit, the diff is small and readable: a new alias under aliases/, a new entry under entries/, a new sidecar under metadata/, and an update to catalog.yaml.
Local bare repo
git log --oneline -3 # confirm: "add: <hash> (aliases flights-aa-only)"
git push -u origin add-aa-only-model
There’s no PR. User A reviews the change as a regular fetched branch (see the next section). The diff is the same: a new alias under aliases/, a new entry under entries/, a new sidecar under metadata/, and an update to catalog.yaml.
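A reviewer can turn that expectation into a check. The sketch below uses a hypothetical helper that buckets git diff --name-status output (a status letter, a tab, then the path). It sorts each path into one of the four kinds of files a single catalog.add(...) is described as touching, and collects anything else under "unexpected". Feed it the output of git diff --name-status main..origin/add-aa-only-model after fetching the branch. The entry hash in the sample is made up.

```python
def review_catalog_diff(name_status: str) -> dict:
    """Bucket `git diff --name-status` lines by the catalog file kind they touch."""
    buckets = {"aliases": [], "entries": [], "metadata": [], "index": [], "unexpected": []}
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, path = line.split("\t", 1)
        if path == "catalog.yaml":
            buckets["index"].append((status, path))
        elif path.startswith(("aliases/", "entries/", "metadata/")):
            # The top-level directory name doubles as the bucket key
            buckets[path.split("/", 1)[0]].append((status, path))
        else:
            buckets["unexpected"].append((status, path))
    return buckets


# Hypothetical diff for the AA-only branch (the entry hash is made up)
sample_diff = (
    "A\taliases/flights-aa-only.zip\n"
    "M\tcatalog.yaml\n"
    "A\tentries/def456.zip\n"
    "A\tmetadata/def456.zip.metadata.yaml\n"
)
summary = review_catalog_diff(sample_diff)
print(summary["unexpected"])  # []
```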
Merge the PR and pull the changes (User A)
GitHub
User A reviews the diff, approves the PR, and clicks Merge pull request in the GitHub UI. (The gh command-line equivalent is gh pr merge <pr-number> --merge, run from inside a clone of the catalog.)
Local bare repo
There’s no PR UI to merge through, so User A fetches the branch into their catalog and merges it by hand:
cd ~/work/flights-catalog-usera
git fetch origin add-aa-only-model
git diff main..origin/add-aa-only-model # review the change
git merge --no-ff origin/add-aa-only-model -m "Merge: add AA-only flights model"
git push origin main
--no-ff keeps the merge commit so the history matches what GitHub would have produced from a “Merge pull request” click.
Once main has moved on the remote, User A pulls. catalog.pull() runs git pull against every git remote, fast-forwarding the local main. Save the snippet below as pull_catalog.py in User A’s ~/flights-tutorial/ project and run it with uv run python pull_catalog.py from there:
# pull_catalog.py
from pathlib import Path
from xorq.catalog.catalog import Catalog
catalog_a = Catalog.from_repo_path(Path("~/work/flights-catalog-usera").expanduser(), init=False)
catalog_a.pull()
print("Aliases now:", catalog_a.list_aliases())
# ['flights-model', 'flights-aa-only']
pull() is effectively a no-op on the local bare repo path
If you followed the local bare repo path, you merged in User A’s clone and pushed from there. User A’s local main is already at the merged tip, so pull() has nothing to fast-forward. The script is still worth running: it’s the same one-liner User A would use after a teammate clicked “Merge pull request” on GitHub, and here it confirms the new alias is visible.
The same alias is now visible everywhere the catalog is cloned. Anyone with access can recover and query flights-aa-only exactly the way they recover flights-model.
Swap the profile at recovery time
The catalog stores expressions, not connections. When User A built the entry they used the default Xorq backend (a xorq_datafusion session); when User B recovers it, they may want to execute against a different profile—perhaps a SQLite database they’ve configured locally, a Postgres instance with extra resources, or a Snowflake warehouse.
A profile in Xorq is a named connection configuration: con_name plus connection kwargs, serialized to disk. Save one with Profile.from_con(con).save(alias=...), load it with Profile.load(...).
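Here is a toy version of "a named connection configuration serialized to disk". ToyProfile is hypothetical and is not xorq’s Profile. The JSON file format, the directory, the clobber behavior, and the example kwargs are all assumptions made for illustration. What carries over is the shape: a backend name plus connection kwargs, saved under an alias and loaded back by that alias.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ToyProfile:
    """Hypothetical named connection profile: a backend name plus connection kwargs."""

    con_name: str
    kwargs: dict = field(default_factory=dict)

    def save(self, directory: Path, alias: str, clobber: bool = False) -> Path:
        """Write the profile to <directory>/<alias>.json, refusing to overwrite unless clobber."""
        path = directory / f"{alias}.json"
        if path.exists() and not clobber:
            raise FileExistsError(f"profile {alias!r} already exists")
        path.write_text(json.dumps(asdict(self)))
        return path

    @classmethod
    def load(cls, directory: Path, alias: str) -> "ToyProfile":
        """Read a profile back by alias."""
        return cls(**json.loads((directory / f"{alias}.json").read_text()))


profiles_dir = Path(tempfile.mkdtemp())  # stand-in for wherever profiles are stored
ToyProfile("sqlite", {"database": ":memory:"}).save(profiles_dir, "local_dev_sqlite")
loaded = ToyProfile.load(profiles_dir, "local_dev_sqlite")
print(loaded)  # ToyProfile(con_name='sqlite', kwargs={'database': ':memory:'})
```

The alias is the only thing a later script needs to know. That is why the tutorial’s profile_swap.py can save the profile in one place and load it “possibly in a different script”.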
This section demonstrates the swap by moving to SQLite—a genuinely different backend than the default, no server to provision, and the adbc-driver-sqlite connector already came in via the sqlite extra in the prereqs. Save the snippet below as profile_swap.py in ~/flights-tutorial-userb/ and run it with uv run python profile_swap.py:
# profile_swap.py
from pathlib import Path
from xorq.vendor.ibis.backends.profiles import Profile
import xorq.api as xo
from xorq.catalog.catalog import Catalog
catalog = Catalog.from_repo_path(Path("~/work/flights-catalog-userb").expanduser(), init=False)
# Capture User B's preferred connection as a named profile
sqlite_con = xo.sqlite.connect() # in-memory SQLite
Profile.from_con(sqlite_con).save(alias="local_dev_sqlite", clobber=True)
# Later—possibly in a different script—load the profile and bind the entry to it
profile = Profile.load("local_dev_sqlite")
con = profile.get_con()
expr = catalog.load("flights-model", con=con)
print("Executing against backend:", con.name)
print(
expr.group_by("origin")
.agg(
flight_count=expr.count(),
avg_dep_delay=expr.dep_delay.mean(),
)
.order_by("origin")
.execute()
)
Executing against backend: sqlite
origin flight_count avg_dep_delay
0 JFK 3 10.000000
1 LAX 3 4.333333
2 ORD 2 37.500000
The Executing against backend: sqlite line is the proof—User A cataloged the entry against xorq_datafusion, User B loaded it bound to a SQLite connection, and .execute() shipped the work to SQLite. catalog.load(name, con=...) returns the underlying Xorq expression—the flights table, in this case—bound to whichever connection you pass; you compose any group-by / aggregation you like on top, and .execute() runs it on the chosen backend. The entry on disk is unchanged; the swap is purely a runtime decision.
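The same aggregation can be written directly against the standard library’s sqlite3 module with an in-memory database. This is not the SQL xorq generates, which the tutorial doesn’t show. It’s a hand-written equivalent over the origin and dep_delay columns that gives a feel for the work landing on SQLite, and it reproduces the numbers in the output above.

```python
import sqlite3

# origin and dep_delay from the flights memtable, row by row
rows = list(zip(
    ["JFK", "LAX", "ORD", "JFK", "LAX", "ORD", "JFK", "LAX"],
    [10.0, -5.0, 30.0, 15.0, -2.0, 45.0, 5.0, 20.0],
))

con = sqlite3.connect(":memory:")  # in-memory database, no file on disk
con.execute("CREATE TABLE flights (origin TEXT, dep_delay REAL)")
con.executemany("INSERT INTO flights VALUES (?, ?)", rows)

# Hand-written equivalent of group_by("origin").agg(...).order_by("origin")
result = con.execute(
    """
    SELECT origin, COUNT(*) AS flight_count, AVG(dep_delay) AS avg_dep_delay
    FROM flights
    GROUP BY origin
    ORDER BY origin
    """
).fetchall()
con.close()

for row in result:
    print(row)
```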
catalog.load vs from_tagged
from_tagged(entry.expr) rebuilds the BSL SemanticModel so you can call .query(...) against it—that’s the right tool when you want the semantic-layer interface back. catalog.load(name, con=...) skips the BSL layer and gives you the underlying Xorq expression bound to a connection of your choosing—that’s the right tool when you want to redirect execution to a specific backend without touching the entry. They compose: profiles for the connection, BSL for the dimensions and measures.
What you learned
- A Xorq catalog is a git repository: every catalog.add(...) is one commit, and the diff is small enough to review on GitHub.
- Sharing the catalog is catalog.push() (after a one-time git remote add origin + git push -u origin main); cloning uses Catalog.clone_from(...).
- Collaboration uses the GitHub workflow you already know: branch, commit, push, open a PR. The reviewer sees alias and entry files in the diff; merging makes the new alias available everywhere the catalog is cloned.
- from_tagged(flights_entry.expr) recovers the BSL model on the consumer side. It’s the same call as in the foundation tutorial, regardless of where the entry came from.
- catalog.load(name, con=...) rebinds a cataloged expression to a different connection at recovery time. Combined with named Profiles, it lets downstream users pick their own execution backend (SQLite, Postgres, anything Xorq supports) without modifying the entry.
Next steps
- Your first build—package the cataloged model into a portable build artifact.
- Switch backends—see what kinds of profiles Xorq supports.
- Explore caching—keep the recovered query path fast for downstream users.