The catalog as executable memory

Most “memory” an agent accumulates is prose: Markdown notes, a MEMORY.md index, chat history. Prose memory has to be reread and reinterpreted by whatever model picks it up next, and it cannot run. A Xorq catalog is the other kind of memory: executable. Each entry is a content-addressed pipeline plus the environment to run it, so a future agent or human doesn’t re-derive the computation from a description; it runs the artifact and reproduces the computation.

The same instinct drives teams to keep agent memory as Markdown in a git repo rather than in a vector database: the files stay the source of truth, and the data never gets trapped behind a proprietary schema or query language. A Xorq catalog takes that instinct one step further. Where a Markdown note still has to be read and acted on, a catalog entry can be executed directly.

This page explains what that means and why the catalog is built that way. For the hands-on version, see Working with the catalog.

What a memory entry is

In a prose memory system the unit is a snippet of text. In Xorq the unit is an entry: a built expression together with its pinned Python environment. An entry is enough to reproduce a result on a machine that has never seen the code that produced it—the wheel and requirements.txt travel with it.

You reason about an entry the way you reason about a matrix—by its invariants, not its cells:

Schema—what columns it produces, and of what type.
Lineage—which sources and transforms it depends on.
Content hash—a deterministic fingerprint of the computation. Same pipeline, same hash; change a filter and the hash changes.
Deterministic execution—the hash fingerprints the pipeline specification, and the pinned environment makes “run it again” reproduce the same computation on the same code path. (It doesn’t promise byte-identical output: timestamps, aggregation order, or floating-point rounding can still vary at runtime.)

These are knowable before execution. An agent can decide whether an entry is the one it needs by reading metadata, without running anything.
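Here is a toy shell illustration of the content-hash invariant. It fingerprints a made-up, plain-text pipeline spec with SHA-256 and keeps 12 hex characters, matching the length of the fa2122f6a9e9 name used later on this page. The spec format, the fingerprint helper, and the truncation are assumptions for illustration, not how Xorq computes its hash. Note that no data is read: the fingerprint depends only on the spec, which is why it can be known before anything runs.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: fingerprint whatever arrives on stdin.
# Assumption: SHA-256 (GNU coreutils sha256sum) truncated to 12 hex chars.
fingerprint() {
  sha256sum | cut -c1-12
}

# A made-up, plain-text pipeline spec; not Xorq's expression format.
spec_a='source: penguins
filter: species == "Adelie"
aggregate: mean(bill_length_mm) as avg_bill_length'

spec_b=$spec_a   # the same pipeline, described a second time

# The same pipeline with one filter changed.
spec_c='source: penguins
filter: species == "Gentoo"
aggregate: mean(bill_length_mm) as avg_bill_length'

hash_a=$(printf '%s\n' "$spec_a" | fingerprint)
hash_b=$(printf '%s\n' "$spec_b" | fingerprint)
hash_c=$(printf '%s\n' "$spec_c" | fingerprint)

echo "a=$hash_a b=$hash_b c=$hash_c"
```

On macOS, swap sha256sum for shasum -a 256.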

Why a git repo, not a service

A Xorq catalog is a git repository of build artifacts on the filesystem, nothing more. There is no server to call and no API to learn. That choice is deliberate:

Discovery is file operations. Metadata lives in metadata/*.yaml sidecars next to the zipped entries. Listing, filtering, and lookup-by-alias or by-hash are ls, grep, and git—tools every agent already has. An agent that clones the repo can find everything without learning anything new.
History is the audit log. Every catalog operation is a commit, so git log shows when an entry was added, aliased, or removed and who committed the change, and git reflog shows how refs moved in your local clone. That provenance comes free with git instead of being built separately.
Sharing is git push / git clone. Collaboration is the pull-request workflow teams already run: a teammate clones the catalog, adds an entry on a branch, and opens a PR, and the merge is an ordinary git merge. A catalog supports at most one git remote, which keeps “where’s the source of truth” unambiguous. Large files can go to an opt-in git-annex remote so they don’t bloat the repo.
No vendor lock-in. A vector-database memory store keeps your data inside a proprietary schema; getting it out means an export job and a migration plan. A catalog is files, so migrating is git clone or an rsync of the directory. Nothing is trapped behind an API.
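To show that discovery and provenance really are plain file and git operations, the sketch below builds a throwaway repository with the layout shown under What’s on disk. It then looks an entry up by hash with ls, looks it up by alias with readlink, and reads the alias’s history with git log. The empty placeholder files, commit messages, and author identity are hypothetical. A real catalog is populated by the catalog commands, not by hand.

```shell
#!/bin/sh
set -eu

# Throwaway repo mimicking the catalog layout on this page. The files are
# empty placeholders; the author identity and commit messages are made up.
cat_dir=$(mktemp -d)
trap 'rm -rf "$cat_dir"' EXIT
cd "$cat_dir"

g() { git -c user.name=agent -c user.email=agent@example.com -c commit.gpgsign=false "$@"; }

g init -q .
mkdir -p entries metadata aliases
: > entries/fa2122f6a9e9.zip
: > metadata/fa2122f6a9e9.zip.metadata.yaml
g add -A && g commit -q -m "add fa2122f6a9e9"

ln -s ../entries/fa2122f6a9e9.zip aliases/penguins-agg.zip
g add -A && g commit -q -m "alias penguins-agg -> fa2122f6a9e9"

# Lookup by hash: a directory listing.
ls entries/fa2122f6a9e9.zip
# Lookup by alias: follow the symlink.
readlink aliases/penguins-agg.zip
# Provenance: which commits touched the alias, who made them, and when.
git log --format='%h %an %ad %s' --date=short -- aliases/penguins-agg.zip
```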

The catalog is portable for the same reason any git repo is: it’s just files.

Indexes are derived, not the source of truth

A vector store inverts the usual relationship: the index is the memory, and the original text is something you hope was preserved alongside the embeddings. A catalog keeps the source of truth on disk (the entries and their YAML metadata) and treats any richer lookup as a layer built on top. The plain-text sidecars already make keyword search a grep, and nothing stops you from building a BM25 or vector index over the same metadata when you want fuzzy recall. Because those indexes are derived, you can rebuild or discard them without losing anything. If one goes stale, the artifact on disk is still authoritative: rebuild the index and the disagreement is gone.
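Here is a minimal sketch of a derived index. It tokenizes two placeholder sidecars into a token-to-file inverted index, deletes the index, and rebuilds it from the same files with the same result. The sidecar contents and the second hash (0b1c2d3e4f50) are hypothetical, not Xorq’s metadata schema. A plain keyword index stands in for the BM25 or vector index the paragraph mentions.

```shell
#!/bin/sh
set -eu

meta=$(mktemp -d)
trap 'rm -rf "$meta"' EXIT

# Two placeholder sidecars. Contents and the second hash are hypothetical.
printf 'columns: [species, avg_bill_length]\nbackend: xorq_datafusion\n' \
  > "$meta/fa2122f6a9e9.zip.metadata.yaml"
printf 'columns: [island, count]\nbackend: xorq_datafusion\n' \
  > "$meta/0b1c2d3e4f50.zip.metadata.yaml"

# Derive an inverted index: one "token<TAB>sidecar" line per distinct pair.
build_index() {
  for f in "$meta"/*.yaml; do
    tr -cs '[:alnum:]_' '\n' < "$f" |
      awk -v src="$(basename "$f")" 'NF { print $0 "\t" src }'
  done | sort -u
}

build_index > "$meta/index.tsv"
first=$(cat "$meta/index.tsv")

# The index is disposable: delete it and rebuild it from the sidecars.
rm "$meta/index.tsv"
build_index > "$meta/index.tsv"
second=$(cat "$meta/index.tsv")

# Keyword lookup against the derived index.
awk -F'\t' '$1 == "avg_bill_length" { print $2 }' "$meta/index.tsv"
```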

What’s on disk

Important

A catalog uses symlinks for aliases, so it expects a POSIX filesystem that preserves them. On Windows, run catalog operations under WSL—a native Windows filesystem without symlink support produces a broken catalog.
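If you are unsure whether a directory will preserve the alias symlinks, a quick probe helps. This hypothetical preflight creates a relative symlink shaped like the ones in aliases/. It checks that the link survives as a link and still resolves, then cleans up. Pass the directory where the catalog will live (it defaults to $TMPDIR or /tmp). Treat it as a sanity check under that assumption, not a guarantee that every catalog operation will succeed.

```shell
#!/bin/sh
set -eu

# Hypothetical preflight. Argument: directory that will hold the catalog
# (defaults to $TMPDIR or /tmp).
target=${1:-${TMPDIR:-/tmp}}
probe=$(mktemp -d "$target/.symlink-probe.XXXXXX")
trap 'rm -rf "$probe"' EXIT

# Recreate the shape of aliases/ -> entries/ with a relative link.
mkdir "$probe/entries" "$probe/aliases"
: > "$probe/entries/probe.zip"

# Pass only if the link can be made, is still a symlink (not a copy),
# and resolves to the entry it names.
if ln -s ../entries/probe.zip "$probe/aliases/probe.zip" 2>/dev/null &&
   [ -L "$probe/aliases/probe.zip" ] &&
   [ -e "$probe/aliases/probe.zip" ]; then
  symlinks_ok=yes
else
  symlinks_ok=no
fi
echo "symlinks preserved: $symlinks_ok"
```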

After catalog init the repo holds a single manifest:

# catalog.yaml
entries: []
aliases: []

catalog add packages a build directory—the manifest (expr.yaml plus *_metadata.json) and the uv-pinned environment—into an entry and records it. A populated catalog looks like:

git-catalogs/penguins
├── aliases
│   └── penguins-agg.zip -> ../entries/fa2122f6a9e9.zip
├── entries
│   └── fa2122f6a9e9.zip
├── metadata
│   └── fa2122f6a9e9.zip.metadata.yaml
└── catalog.yaml

Entries are zipped builds named by their content hash. Aliases are symlinks that point at an entry, giving a human-readable name (penguins-agg) to a hash (fa2122f6a9e9); the symlink itself keeps its target’s .zip suffix (penguins-agg.zip). Metadata sidecars are plain YAML, which is what makes discovery a grep:

# Entries that emit an 'avg_bill_length' column
grep -l 'avg_bill_length' git-catalogs/penguins/metadata/*.yaml

# Entries running on DataFusion
grep -l 'xorq_datafusion' git-catalogs/penguins/metadata/*.yaml
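The grep hits are sidecar paths, and their names carry the entry hash. That means getting from a match to the entry, and to any aliases that name it, takes only basename and readlink. The sketch below rebuilds the tree above in a temporary directory with placeholder contents (not Xorq’s real sidecar format); find_aliases is a hypothetical helper name.

```shell
#!/bin/sh
set -eu

# Rebuild the tree above in a temp dir; contents are placeholders.
cat_dir=$(mktemp -d)
trap 'rm -rf "$cat_dir"' EXIT
mkdir -p "$cat_dir/entries" "$cat_dir/metadata" "$cat_dir/aliases"
: > "$cat_dir/entries/fa2122f6a9e9.zip"
printf 'avg_bill_length\n' > "$cat_dir/metadata/fa2122f6a9e9.zip.metadata.yaml"
ln -s ../entries/fa2122f6a9e9.zip "$cat_dir/aliases/penguins-agg.zip"

# Hypothetical helper: for every sidecar matching $1, print "<hash> <alias>"
# for each alias whose symlink points at that sidecar's entry.
find_aliases() {
  { grep -l -- "$1" "$cat_dir"/metadata/*.yaml || true; } |
    while IFS= read -r sidecar; do
      entry=$(basename "$sidecar" .metadata.yaml)   # e.g. fa2122f6a9e9.zip
      for link in "$cat_dir"/aliases/*.zip; do
        if [ "$(readlink "$link")" = "../entries/$entry" ]; then
          printf '%s %s\n' "${entry%.zip}" "$(basename "$link" .zip)"
        fi
      done
    done
}

matches=$(find_aliases avg_bill_length)
echo "$matches"
```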

Why content-addressing matters

Entries are named by a hash of the computation, not by a timestamp or a user-chosen filename. Two consequences fall out of that:

Reuse is automatic. Building the same pipeline twice produces the same hash, so the second build is a no-op: the entry already exists, and nobody has to check by hand whether someone already computed it. Prose and vector memory have to dedupe explicitly, by comparing embeddings or running similarity heuristics, because two notes describing the same fact look different on disk. With content-addressing, identical computations collapse to one entry by construction.
Identity is honest. If two entries have the same hash they are the same computation; if the hashes differ, something in the graph or its inputs changed. The name can’t lie about what the artifact does, the way a filename like final_v2_fixed.py can.
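The dedupe-by-construction property can be sketched with a hypothetical add_entry helper. It names each artifact by a truncated SHA-256 of its bytes and skips the write when that name already exists. This mimics the idea only. It is not Xorq’s catalog add, which, per this page, names entries by a hash of the computation and also records them in the manifest.

```shell
#!/bin/sh
set -eu

store=$(mktemp -d)
trap 'rm -rf "$store"' EXIT
mkdir "$store/entries"

# Hypothetical content-addressed add: the name is a truncated SHA-256 of the
# artifact's bytes, so a repeat add of identical content finds the name taken.
add_entry() {
  h=$(sha256sum < "$1" | cut -c1-12)
  if [ -e "$store/entries/$h.zip" ]; then
    echo "exists $h"            # same content as before: nothing to do
  else
    cp "$1" "$store/entries/$h.zip"
    echo "added $h"
  fi
}

# A stand-in "build artifact"; its content is made up.
build="$store/build.txt"
printf 'filter: species == "Adelie"\n' > "$build"

first=$(add_entry "$build")
second=$(add_entry "$build")    # same content, same hash
echo "$first"
echo "$second"
```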

Aliases sit on top of this: they give a stable name to “whatever the current best version is” while the underlying hash moves as the pipeline evolves. The hash is identity; the alias is intent.
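Moving an alias means repointing a symlink, and the entry it used to name stays where it is. The sketch below does that by hand in a temporary directory. The second hash (3c9e0d1a7b42) is hypothetical, and in a real catalog you would change the alias through the catalog’s own alias operation so that the move is committed.

```shell
#!/bin/sh
set -eu

# Mock catalog with an older and a newer entry; 3c9e0d1a7b42 is hypothetical.
cat_dir=$(mktemp -d)
trap 'rm -rf "$cat_dir"' EXIT
mkdir -p "$cat_dir/entries" "$cat_dir/aliases"
: > "$cat_dir/entries/fa2122f6a9e9.zip"
: > "$cat_dir/entries/3c9e0d1a7b42.zip"

cd "$cat_dir/aliases"
ln -s ../entries/fa2122f6a9e9.zip penguins-agg.zip   # intent -> old identity
before=$(readlink penguins-agg.zip)

# The pipeline evolved: repoint the alias. -f replaces the existing link.
ln -sf ../entries/3c9e0d1a7b42.zip penguins-agg.zip
after=$(readlink penguins-agg.zip)

echo "before: $before"
echo "after:  $after"
ls ../entries    # both entries are still there
```

Because the old entry is untouched, anything that recorded fa2122f6a9e9 directly still refers to the same computation.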

Memory you can run

A prose note says what someone once did. A catalog entry is the thing, ready to run, with its schema, lineage, and environment attached. That’s the difference between memory you have to trust and memory you can execute—and it’s why the catalog, not a pile of scripts, is where Xorq keeps what an agent learns.


See also