Catalog

Catalog()

A git-backed registry for versioned build artifacts.

A catalog is a git repository containing serialized xorq expressions as content-addressed zip archives. When backed by git-annex, cloning downloads only metadata; artifact content is fetched on demand. A plain-git backend stores archives as regular blobs.

Construct via the classmethods from_name, from_repo_path, from_default, clone_from, or the dispatch helper from_kwargs.

Attributes

Name                Description
remote_config       The resolved remote config, or None.

Methods

Name                Description
add                 Add a build to the catalog.
add_alias           Create an alias pointing at entry name. Overwrites if the alias already exists.
assert_consistency  Verify that catalog.yaml, entries, metadata, and aliases are all in agreement.
bind                Bind a source entry through one or more transform entries.
clone_from          Clone a catalog repo and optionally init git-annex.
contains            Return True if an entry with name exists in the catalog.
embed_readonly      Embed read-only credentials into the git-annex branch.
fetch               Fetch from the configured git remote (no-op if no remote is configured).
fetch_entries       Fetch annex content for the given entries in a single operation.
get_catalog_entry   Look up a CatalogEntry by name. Raises if not found.
get_zip             Export an entry’s archive to dir_path (default: cwd). Returns the output path.
list                Return the list of entry names in the catalog.
list_aliases        Return the list of alias names in the catalog.
load                Return a tagged RemoteTable expression for a catalog entry (by hash or alias).
pull                Fetch and merge from the catalog’s git remote; raise on unmerged paths.
push                Push to the configured git remote after verifying consistency.
remove              Remove an entry (and its aliases) from the catalog by name.
set_remote          Configure the catalog’s git remote.
set_remote_config   Update the git-annex special remote configuration.
sync                Pull then push: shorthand for a full round-trip synchronization.

add

add(obj, sync=True, aliases=(), exist_ok=False, project_path=None)

Add a build to the catalog.

obj may be a Path to a zip archive, a Path to a build directory, or an xorq Expr. Returns the created CatalogEntry.

project_path is the directory containing the pyproject.toml used to build the wheel and requirements sidecars. If omitted, the packager walks upward from the current working directory to find one. Passing it explicitly is required when the caller’s cwd is not inside the project (e.g. Jupyter kernels started from /tmp). Ignored for zip inputs, which are already complete build archives.
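The upward search for pyproject.toml can be pictured with a small sketch. It only illustrates the lookup described above and is not xorq's packager code; find_project_root is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds pyproject.toml.

    Hypothetical helper: returns None when no ancestor contains the file.
    """
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (current, *current.parents):
        if (candidate / "pyproject.toml").is_file():
            return candidate
    return None
```

A search like this cannot succeed when the working directory sits outside the project tree, which is the case where project_path has to be passed to add explicitly.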

add_alias

add_alias(name, alias, sync=True)

Create an alias pointing at entry name. Overwrites if the alias already exists.

assert_consistency

assert_consistency()

Verify that catalog.yaml, entries, metadata, and aliases are all in agreement.

bind

bind(source_entry, *transforms, con=None)

Bind a source entry through one or more transform entries.

clone_from

clone_from(
    url,
    repo_path=None,
    check_consistency=True,
    annex=None,
    git_config=None,
    **remote_kwargs,
)

Clone a catalog repo and optionally init git-annex.

annex controls the backend:

  • None (default) — auto-detect. If the cloned repo has a git-annex branch, git-annex is initialised and the remote is enabled when credentials are available (embedded, env vars, or remote_kwargs). Otherwise falls back to plain git.
  • False — force plain git, even if the repo has a git-annex branch.
  • Any AnnexConfig instance — git-annex is initialised and the remote is enabled if remote.log has a special remote configured.
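
The three annex modes above can be summarised as a small dispatch. This is a sketch of the documented behaviour only: resolve_backend and the has_annex_branch flag are hypothetical, the AnnexConfig class here is an empty stand-in for the real one, and rejecting any other value is an assumption.

```python
from typing import Union


class AnnexConfig:
    """Hypothetical stand-in for xorq's AnnexConfig, used only for the isinstance check."""


def resolve_backend(annex: Union[None, bool, AnnexConfig], has_annex_branch: bool) -> str:
    """Return "git-annex" or "plain-git" following the three documented annex modes."""
    if annex is False:
        # Forced plain git, even when the clone has a git-annex branch.
        return "plain-git"
    if annex is None:
        # Auto-detect from the cloned repository.
        return "git-annex" if has_annex_branch else "plain-git"
    if isinstance(annex, AnnexConfig):
        # An explicit config always initialises git-annex.
        return "git-annex"
    # Other values (including True) are not described above; rejecting them is an assumption.
    raise TypeError(f"unsupported annex value: {annex!r}")
```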

Content is not fetched eagerly; it is retrieved on demand when entry.expr is accessed (via fetch_content). For S3 remotes without embedded credentials, the caller can supply credentials via remote_kwargs or environment variables (XORQ_CATALOG_S3_*).

Use git_config to set repo-local git config before annex init (e.g. {"annex.security.allowed-ip-addresses": "all"}).

contains

contains(name)

Return True if an entry with name exists in the catalog.

embed_readonly

embed_readonly(readonly_config)

Embed read-only credentials into the git-annex branch.

Verifies that readonly_config cannot write to the bucket, then sets embedcreds=yes and writes the config to remote.log.

Raises ValueError if the credentials have write access.

fetch

fetch()

Fetch from the configured git remote (no-op if no remote is configured).

fetch_entries

fetch_entries(*entries)

Fetch annex content for the given entries in a single operation.

Each element can be a CatalogEntry or a string (entry name). No-op for plain-git backends.

get_catalog_entry

get_catalog_entry(name, maybe_alias=False)

Look up a CatalogEntry by name. Raises if not found.

get_zip

get_zip(name, dir_path=None)

Export an entry’s archive to dir_path (default: cwd). Returns the output path.

list

list()

Return the list of entry names in the catalog.

list_aliases

list_aliases()

Return the list of alias names in the catalog.

load

load(name_or_alias, con=None)

Return a tagged RemoteTable expression for a catalog entry (by hash or alias).

pull

pull()

Fetch and merge from the catalog’s git remote; raise on unmerged paths.

Replaces git pull (which inherits the user’s pull.rebase config and bails on divergent branches by default) with explicit git fetch + git merge.

When the merge leaves catalog.yaml conflicted (typical when both sides appended to the entries or aliases lists), a Python 3-way list-merge resolves it: items present in the merge base and removed by one side are propagated as removals; items added by either side survive; duplicates are collapsed.

Anything still unmerged after that (typically alias symlinks at the same path with diverging targets) surfaces as CatalogMergeConflict with the conflicted paths and the remote name. The merge is left in progress so the user can resolve it (see CatalogMergeConflict for recovery recipes).
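
The list-merge rules for catalog.yaml can be illustrated with a short sketch. It is not the resolver xorq ships: three_way_list_merge is a hypothetical name, items are assumed to be hashable (for example entry-name strings), and the output order (ours first, then new items from theirs) is an arbitrary choice.

```python
from typing import Hashable, List, Sequence


def three_way_list_merge(
    base: Sequence[Hashable],
    ours: Sequence[Hashable],
    theirs: Sequence[Hashable],
) -> List[Hashable]:
    """Merge two edited copies of a list against their common ancestor."""
    base_set, ours_set, theirs_set = set(base), set(ours), set(theirs)
    merged: List[Hashable] = []
    seen = set()
    for item in (*ours, *theirs):
        if item in seen:
            # Collapse duplicates, whether within one side or across both.
            continue
        seen.add(item)
        # An item from the base survives only if neither side removed it.
        if item in base_set and not (item in ours_set and item in theirs_set):
            continue
        # Items absent from the base were added by one side and always survive.
        merged.append(item)
    return merged


base = ["a", "b", "c"]
ours = ["a", "b", "c", "d"]   # we added "d"
theirs = ["a", "c", "e"]      # they removed "b" and added "e"
print(three_way_list_merge(base, ours, theirs))  # ['a', 'c', 'd', 'e']
```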

Pre-flights:

  • HEAD must be on a branch (the catalog API never detaches HEAD on its own — this only fails if the repo was put in detached state outside xorq). Raises CatalogPullError.
  • catalog.yaml in both ours (HEAD) and the remote tip must exist, parse, and have the expected dict-or-list shape. The resolver assumes well-formed input on both sides; without this check, a catalog.yaml deleted on the remote tip would be silently treated as “theirs removed every entry” and the 3-way list merge would drop every prior entry, while a malformed or scalar-shaped yaml would leak a bare ValueError / AttributeError from inside the resolver. Raises CatalogPullError naming the corrupt side.
  • A non-conflict git merge failure (e.g. the remote ref doesn’t exist, the working tree is dirty, a hook rejected the merge commit) re-raises the original GitCommandError rather than swallowing it and falling through to a misleading git commit --no-edit.

A catalog has at most one git remote (see ADR on single-remote catalogs). If no remote is configured, pull is a no-op.

push

push()

Push to the configured git remote after verifying consistency.

Pushes main, then git-annex (if present). Both pushes are always attempted; if either fails, a single CatalogPushError is raised listing every rejection or transport failure across both. No-op when no git remote is configured.

Returns () when no git remote is configured, (main_result,) when only main is pushed, or (main_result, annex_result) when git-annex is pushed as well.
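
The attempt-everything, raise-once behaviour can be sketched as follows. This is an assumption-laden illustration rather than xorq's implementation: PushError stands in for CatalogPushError, and push_all with its list of (branch, callable) pairs is hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple


class PushError(Exception):
    """Hypothetical stand-in for CatalogPushError; carries every failure."""

    def __init__(self, failures: Sequence[Tuple[str, str]]):
        self.failures = list(failures)
        super().__init__("; ".join(f"{branch}: {reason}" for branch, reason in self.failures))


def push_all(pushers: Sequence[Tuple[str, Callable[[], Any]]]) -> Tuple[Any, ...]:
    """Run every push, collect failures, and raise once at the end."""
    results: List[Any] = []
    failures: List[Tuple[str, str]] = []
    for branch, push in pushers:
        try:
            results.append(push())
        except Exception as exc:
            # Record the failure and keep going so later pushes still run.
            failures.append((branch, str(exc)))
    if failures:
        raise PushError(failures)
    return tuple(results)
```

With an empty list this returns (), with one pusher a 1-tuple, and with two a 2-tuple, mirroring the three return shapes above.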

remove

remove(name, sync=True)

Remove an entry (and its aliases) from the catalog by name.

set_remote

set_remote(name, url, force=False)

Configure the catalog’s git remote.

The catalog supports at most one git remote (ADR-0011). When the repo has no git remote, set_remote creates one with the given name and url and returns it.

When a git remote is already configured, set_remote raises CatalogConfigurationError unless force=True is passed. The guard exists because silent replacement turns a typo in the remote name into the deletion of the existing remote with no signal — failing by default forces explicit opt-in. With force=True, every existing git remote is deleted and replaced.
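
The guard can be pictured with a dictionary standing in for the repo's remotes. This is a sketch under assumptions, not xorq's code: set_single_remote is a hypothetical function, ConfigurationError stands in for CatalogConfigurationError, and the URLs are placeholders.

```python
from typing import Dict


class ConfigurationError(Exception):
    """Hypothetical stand-in for CatalogConfigurationError."""


def set_single_remote(
    remotes: Dict[str, str], name: str, url: str, force: bool = False
) -> Dict[str, str]:
    """Enforce the single-remote rule on a name -> url mapping."""
    if remotes and not force:
        # Fail by default so a mistyped name cannot silently drop the existing remote.
        existing = ", ".join(sorted(remotes))
        raise ConfigurationError(
            f"remote(s) already configured: {existing}; pass force=True to replace"
        )
    # With force=True (or no remotes yet), every existing remote is dropped first.
    remotes.clear()
    remotes[name] = url
    return remotes


remotes: Dict[str, str] = {}
set_single_remote(remotes, "origin", "https://example.invalid/catalog.git")
set_single_remote(remotes, "upstream", "https://example.invalid/other.git", force=True)
print(remotes)  # {'upstream': 'https://example.invalid/other.git'}
```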

set_remote_config

set_remote_config(remote_config)

Update the git-annex special remote configuration.

Calls enableremote to write the config to remote.log on the git-annex branch. Use catalog.remote_config to read it back.

sync

sync()

Pull then push — shorthand for a full round-trip synchronization.