Catalog
Catalog()
A git-backed registry for versioned build artifacts.
A catalog is a git repository containing serialized xorq expressions as content-addressed zip archives. When backed by git-annex, cloning downloads only metadata, and artifact content is fetched on demand. A plain-git backend stores archives as regular blobs.
Construct via the classmethods from_name, from_repo_path, from_default, clone_from, or the dispatch helper from_kwargs.
Attributes
| Name | Description |
|---|---|
| remote_config | The resolved remote config, or None. |
Methods
| Name | Description |
|---|---|
| add | Add a build to the catalog. |
| add_alias | Create an alias pointing at entry name. Overwrites if the alias already exists. |
| assert_consistency | Verify that catalog.yaml, entries, metadata, and aliases are all in agreement. |
| bind | Bind a source entry through one or more transform entries. |
| clone_from | Clone a catalog repo and optionally init git-annex. |
| contains | Return True if an entry with name exists in the catalog. |
| embed_readonly | Embed read-only credentials into the git-annex branch. |
| fetch | Fetch from the configured git remote (no-op if no remote is configured). |
| fetch_entries | Fetch annex content for the given entries in a single operation. |
| get_catalog_entry | Look up a CatalogEntry by name. Raises if not found. |
| get_zip | Export an entry’s archive to dir_path (default: cwd). Returns the output path. |
| list | Return the list of entry names in the catalog. |
| list_aliases | Return the list of alias names in the catalog. |
| load | Return a tagged RemoteTable expression for a catalog entry (by hash or alias). |
| pull | Fetch and merge from the catalog’s git remote; raise on unmerged paths. |
| push | Push to the configured git remote after verifying consistency. |
| remove | Remove an entry (and its aliases) from the catalog by name. |
| set_remote | Configure the catalog’s git remote. |
| set_remote_config | Update the git-annex special remote configuration. |
| sync | Pull then push — shorthand for a full round-trip synchronization. |
add
add(obj, sync=True, aliases=(), exist_ok=False, project_path=None)
Add a build to the catalog.
obj may be a Path to a zip archive, a Path to a build directory, or an xorq Expr. Returns the created CatalogEntry.
project_path is the directory containing the pyproject.toml used to build the wheel and requirements sidecars. If omitted, the packager walks upward from the current working directory to find one. Passing it explicitly is required when the caller’s cwd is not inside the project (e.g. Jupyter kernels started from /tmp). Ignored for zip inputs, which are already complete build archives.
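The upward search for pyproject.toml described above can be pictured with a small standalone sketch. This is an illustration only, not the packager's actual code: the function name find_project_root is hypothetical, and the real lookup may apply extra rules that this page does not describe.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds pyproject.toml.

    Hypothetical helper: it mirrors the documented "walk upward from cwd"
    behaviour but is not part of the xorq API.
    """
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for candidate in (start, *start.parents):
        if (candidate / "pyproject.toml").is_file():
            return candidate
    return None  # nothing found: the caller would need to pass project_path
```

Under this reading, a kernel started from a directory with no pyproject.toml above it gets None back from the search, which is the situation where passing project_path explicitly becomes necessary.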
add_alias
add_alias(name, alias, sync=True)
Create an alias pointing at entry name. Overwrites if the alias already exists.
assert_consistency
assert_consistency()
Verify that catalog.yaml, entries, metadata, and aliases are all in agreement.
bind
bind(source_entry, *transforms, con=None)
Bind a source entry through one or more transform entries.
clone_from
clone_from(
url,
repo_path=None,
check_consistency=True,
annex=None,
git_config=None,
**remote_kwargs,
)
Clone a catalog repo and optionally init git-annex.
annex controls the backend:
- None (default): auto-detect. If the cloned repo has a git-annex branch, git-annex is initialised and the remote is enabled when credentials are available (embedded, env vars, or remote_kwargs). Otherwise falls back to plain git.
- False: force plain git, even if the repo has a git-annex branch.
- Any AnnexConfig instance: git-annex is initialised and the remote is enabled if remote.log has a special remote configured.
Content is not fetched eagerly; it is retrieved on demand when entry.expr is accessed (via fetch_content). For S3 remotes without embedded credentials, the caller can supply credentials via remote_kwargs or environment variables (XORQ_CATALOG_S3_*).
Use git_config to set repo-local git config before annex init (e.g. {"annex.security.allowed-ip-addresses": "all"}).
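The three-way dispatch on annex can be summarised as a small decision function. The sketch below is an assumption-laden illustration, not xorq code: choose_backend and AnnexConfigStub are hypothetical names, the returned strings are labels invented for the example, and rejecting other values with TypeError is a guess about undocumented inputs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AnnexConfigStub:
    """Hypothetical stand-in for an AnnexConfig instance."""

    remote_name: str = "s3"


def choose_backend(
    annex: Union[None, bool, AnnexConfigStub], has_annex_branch: bool
) -> str:
    """Return 'git-annex' or 'plain-git' following the documented rules."""
    if annex is None:
        # Auto-detect: follow whatever the cloned repo already uses.
        return "git-annex" if has_annex_branch else "plain-git"
    if annex is False:
        # Forced plain git, even when a git-annex branch exists.
        return "plain-git"
    if isinstance(annex, AnnexConfigStub):
        # An explicit config always initialises git-annex.
        return "git-annex"
    # Assumption: any other value is treated as a caller error.
    raise TypeError(f"unsupported annex value: {annex!r}")
```

Whether the special remote is then enabled is a separate question that depends on credentials and remote.log, which this sketch leaves out.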
contains
contains(name)
Return True if an entry with name exists in the catalog.
embed_readonly
embed_readonly(readonly_config)
Embed read-only credentials into the git-annex branch.
Verifies that readonly_config cannot write to the bucket, then sets embedcreds=yes and writes the config to remote.log.
Raises ValueError if the credentials have write access.
fetch
fetch()
Fetch from the configured git remote (no-op if no remote is configured).
fetch_entries
fetch_entries(*entries)
Fetch annex content for the given entries in a single operation.
Each element can be a CatalogEntry or a string (entry name). No-op for plain-git backends.
get_catalog_entry
get_catalog_entry(name, maybe_alias=False)
Look up a CatalogEntry by name. Raises if not found.
get_zip
get_zip(name, dir_path=None)
Export an entry’s archive to dir_path (default: cwd). Returns the output path.
list
list()
Return the list of entry names in the catalog.
list_aliases
list_aliases()
Return the list of alias names in the catalog.
load
load(name_or_alias, con=None)
Return a tagged RemoteTable expression for a catalog entry (by hash or alias).
pull
pull()
Fetch and merge from the catalog’s git remote; raise on unmerged paths.
Replaces git pull (which inherits the user’s pull.rebase config and bails on divergent branches by default) with an explicit git fetch followed by git merge.
When the merge leaves catalog.yaml conflicted (typical when both sides appended to the entries or aliases lists), a three-way list merge implemented in Python resolves it: items present in the merge base and removed by one side are propagated as removals, items added by either side survive, and duplicates are collapsed.
Anything still unmerged after that (typically alias symlinks at the same path with diverging targets) surfaces as CatalogMergeConflict, which carries the conflicted paths and the remote name. The merge is left in progress so the user can resolve it (see CatalogMergeConflict for recovery recipes).
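The list-merge rule for catalog.yaml can be sketched in a few lines. This is a minimal illustration of the stated rule, not xorq's actual resolver: the name three_way_list_merge is hypothetical, items are assumed to be plain hashable strings, and listing our items before theirs is an arbitrary ordering choice.

```python
from typing import List, Sequence


def three_way_list_merge(
    base: Sequence[str], ours: Sequence[str], theirs: Sequence[str]
) -> List[str]:
    """Merge two edited copies of a list against their common ancestor."""
    base_set, ours_set, theirs_set = set(base), set(ours), set(theirs)
    # An item that was in the base and is missing on either side was removed.
    removed = (base_set - ours_set) | (base_set - theirs_set)

    merged: List[str] = []
    seen = set()
    # Walk our items first, then theirs (ordering is an assumption).
    for item in (*ours, *theirs):
        if item in removed or item in seen:
            continue  # propagate removals and collapse duplicates
        seen.add(item)
        merged.append(item)
    return merged


merged = three_way_list_merge(
    base=["a", "b", "c"],
    ours=["a", "b", "c", "d"],  # we appended "d"
    theirs=["a", "c", "e"],     # they removed "b" and appended "e"
)
print(merged)  # ['a', 'c', 'd', 'e']
```

The example shows why concurrent appends need no manual resolution: both additions survive, and the single removal is carried over.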
Pre-flights:
- HEAD must be on a branch (the catalog API never detaches HEAD on its own; this only fails if the repo was put in detached state outside xorq). Raises CatalogPullError.
- catalog.yaml in both ours (HEAD) and the remote tip must exist, parse, and have the expected dict-or-list shape. The resolver assumes well-formed input on both sides; without this check, a catalog.yaml deleted on the remote tip would be silently treated as “theirs removed every entry” and the 3-way list merge would drop every prior entry, while a malformed or scalar-shaped yaml would leak a bare ValueError/AttributeError from inside the resolver. Raises CatalogPullError naming the corrupt side.
- A non-conflict git merge failure (e.g. the remote ref doesn’t exist, the working tree is dirty, a hook rejected the merge commit) re-raises the original GitCommandError rather than swallowing it and falling through to a misleading git commit --no-edit.
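The shape pre-flight on catalog.yaml might look roughly like the following. Everything here is an assumption made for illustration: the helper check_catalog_shape and the error class CatalogShapeError are hypothetical (the documented method raises CatalogPullError), the input is an already-parsed document, and the layout of a mapping with list-valued entries and aliases keys is a guess at the "expected shape".

```python
from typing import Any


class CatalogShapeError(ValueError):
    """Hypothetical stand-in; the documented pull() raises CatalogPullError."""


def check_catalog_shape(doc: Any, side: str) -> None:
    """Reject a missing or malformed parsed catalog.yaml, naming the side."""
    if doc is None:
        # A deleted or empty file must not be read as "every entry removed".
        raise CatalogShapeError(f"catalog.yaml is missing or empty on {side}")
    if not isinstance(doc, dict):
        raise CatalogShapeError(
            f"catalog.yaml on {side} is a {type(doc).__name__}, expected a mapping"
        )
    for key in ("entries", "aliases"):  # assumed top-level keys
        value = doc.get(key, [])
        if not isinstance(value, list):
            raise CatalogShapeError(
                f"catalog.yaml on {side}: '{key}' is a "
                f"{type(value).__name__}, expected a list"
            )
```

Running a check of this kind on both sides before merging is what lets a corrupt side fail with a named error instead of a bare exception from inside the resolver.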
A catalog has at most one git remote (see ADR on single-remote catalogs). When no remote is configured, pull is a no-op.
push
push()
Push to the configured git remote after verifying consistency.
Pushes main, then git-annex (if present). Both pushes are always attempted; if either fails, a single CatalogPushError is raised listing every rejection or transport failure across both. No-op when no git remote is configured.
Returns (), (main_result,), or (main_result, annex_result).
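The "attempt every push, then report all failures at once" behaviour can be sketched independently of git. This is not the library's implementation: push_all and PushFailed are hypothetical names (PushFailed stands in for CatalogPushError), and each push is modelled as a plain callable that returns a result or raises.

```python
from typing import Callable, List, Sequence, Tuple


class PushFailed(Exception):
    """Hypothetical aggregate error standing in for CatalogPushError."""

    def __init__(self, failures: Sequence[Tuple[str, str]]):
        self.failures = tuple(failures)
        details = "; ".join(f"{branch}: {reason}" for branch, reason in failures)
        super().__init__(details)


def push_all(pushers: Sequence[Tuple[str, Callable[[], str]]]) -> Tuple[str, ...]:
    """Run every push, collecting failures instead of stopping at the first."""
    results: List[str] = []
    failures: List[Tuple[str, str]] = []
    for branch, push in pushers:
        try:
            results.append(push())
        except Exception as exc:  # keep going so later branches are still tried
            failures.append((branch, str(exc)))
    if failures:
        raise PushFailed(failures)
    return tuple(results)
```

With no pushers the function returns an empty tuple, with one it returns a 1-tuple, and with two it returns a pair, which matches the three return shapes listed above.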
remove
remove(name, sync=True)
Remove an entry (and its aliases) from the catalog by name.
set_remote
set_remote(name, url, force=False)
Configure the catalog’s git remote.
The catalog supports at most one git remote (ADR-0011). When the repo has no git remote, set_remote creates one with the given name and url and returns it.
When a git remote is already configured, set_remote raises CatalogConfigurationError unless force=True is passed. The guard exists because silent replacement would turn a typo in the remote name into the deletion of the existing remote with no signal; failing by default forces explicit opt-in. With force=True, every existing git remote is deleted and replaced with the new one.
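The fail-by-default guard can be illustrated with a plain dictionary standing in for the repo's remotes. This is a sketch under stated assumptions, not the real method: set_single_remote and RemoteAlreadyConfigured are hypothetical names (the documented error is CatalogConfigurationError), and remotes are modelled as a name-to-URL mapping.

```python
from typing import Dict


class RemoteAlreadyConfigured(Exception):
    """Hypothetical stand-in for CatalogConfigurationError."""


def set_single_remote(
    remotes: Dict[str, str], name: str, url: str, force: bool = False
) -> Dict[str, str]:
    """Return the new remote mapping, enforcing the one-remote rule."""
    if remotes and not force:
        existing = ", ".join(sorted(remotes))
        raise RemoteAlreadyConfigured(
            f"remote(s) already configured: {existing}; pass force=True to replace"
        )
    # With force=True every existing remote is dropped, not just a same-named one.
    return {name: url}
```

The case the guard protects against is a call such as set_single_remote({"origin": "..."}, "orgin", "..."): without force, the misspelled name raises instead of silently replacing origin.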
set_remote_config
set_remote_config(remote_config)
Update the git-annex special remote configuration.
Calls enableremote to write the config to remote.log on the git-annex branch. Use catalog.remote_config to read it back.
sync
sync()
Pull then push; shorthand for a full round-trip synchronization.