import tempfile
from pathlib import Path

import xorq.api as xo
from xorq.catalog.catalog import Catalog

catalog_dir = Path(tempfile.mkdtemp()) / "catalog"
catalog = Catalog.from_repo_path(catalog_dir, init=True)

# Seed the catalog with a base entry, standing in for one a teammate published
orders = xo.memtable(
    {
        "order_id": [1, 2, 3, 4],
        "region": ["EU", "US", "EU", "APAC"],
        "amount": [100.0, 250.0, 175.0, 90.0],
    },
    name="orders",
)
catalog.add(orders, aliases=("orders",), sync=False)

Compose catalog entries
This guide shows you how to build on work that’s already in a catalog: load an entry someone published, chain a new expression onto it, and register the result as a new entry. The catalog is a git repository, so each add becomes one reviewable commit.
The snippets below create a throwaway catalog in a temporary directory so the whole flow runs end to end. With a real catalog, skip the setup and point Catalog at your repository instead.
Prerequisites
- Xorq installed (Install Xorq)
- An initialized catalog: either xorq catalog init on the command line or Catalog.from_repo_path(path, init=True) in Python
- A pyproject.toml in your project: catalog.add() packages your project as a wheel so each entry records the dependencies it was built with
Steps
1. Open the catalog
For an existing catalog, the openers are:
catalog = Catalog.from_repo_path(Path("~/work/my-catalog").expanduser(), init=False)
# or, for a catalog created with `xorq catalog init`:
catalog = Catalog.from_name("my-catalog")

This demo seeds the entry with a memtable so the page is self-contained: the data is serialized into the entry and travels with it. Real entries are usually backed by a named table or a deferred file or SQL read, where the entry references the source rather than embedding the rows. The composition steps below work the same either way.
2. Load the entry you want to build on
Fetch the entry by alias and rebuild its expression:
entry = catalog.get_catalog_entry("orders", maybe_alias=True)
base = entry.expr
print(base.schema())

ibis.Schema {
  order_id  int64
  region    string
  amount    float64
}
entry.expr rebuilds the full deferred expression, including any serialized data it carries. Nothing executes yet.
3. Chain a new expression onto it
The loaded entry is an ordinary Xorq expression. Compose on top of it like any other table:
summary = base.group_by("region").agg(
    total=xo._.amount.sum(),
    n_orders=xo._.count(),
)
print(summary.execute())

  region  total  n_orders
0     US  250.0         1
1   APAC   90.0         1
2     EU  275.0         2
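To sanity-check these numbers without Xorq, the same aggregation can be reproduced in plain Python over the four seeded rows. This is only an illustrative cross-check; the function name summarize_by_region is hypothetical. The result is keyed by region in sorted order, whereas the table above lists regions in a different order: row order of a grouped result is generally not guaranteed unless you sort it.

```python
from collections import defaultdict

# The same rows the demo seeds into the "orders" entry
rows = [
    (1, "EU", 100.0),
    (2, "US", 250.0),
    (3, "EU", 175.0),
    (4, "APAC", 90.0),
]


def summarize_by_region(rows):
    """Return {region: (total_amount, n_orders)}, with regions sorted by name."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _order_id, region, amount in rows:
        totals[region] += amount
        counts[region] += 1
    return {region: (totals[region], counts[region]) for region in sorted(totals)}


expected = summarize_by_region(rows)
```

Here the EU total is 100.0 + 175.0 = 275.0 across two orders, matching the EU row of the executed summary.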
4. Register the result back
Add the composed expression as a new entry, with an alias so others can find it:
catalog.add(summary, aliases=("orders-by-region",), sync=False)

sync=False commits locally without pushing, so you can review the diff first. Each add builds a wheel from your project’s pyproject.toml, so expect Building wheel... output; it is not an error.
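Since the catalog is a git repository, one way to review what an add committed is to read the log with ordinary git. The sketch below shells out to git from Python; it assumes git is on your PATH and that you pass the catalog's repository directory. The helper recent_commits is hypothetical and not part of the Xorq API.

```python
import subprocess
from pathlib import Path


def recent_commits(repo_dir: Path, limit: int = 5) -> list[str]:
    """Return up to `limit` one-line commit summaries, newest first."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "log", f"--max-count={limit}", "--oneline"],
        check=True,  # raise CalledProcessError if git fails (e.g. not a repository)
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

In this demo you would call recent_commits(catalog_dir). To inspect the files a commit touched, git show --stat HEAD from the catalog directory does the same job on the command line.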
5. Confirm the new entry exists
print(catalog.list_aliases())

['orders', 'orders-by-region']
Both the original entry and your composition are now in the catalog. Push with catalog.push() (or plain git push from the catalog directory) when you’re ready to share.
See also
- Catalog reference
- Working with the catalog: the full publish/clone/branch/merge collaboration loop
- xorq catalog command-line reference