import xorq.api as xo
# Fetch an example dataset as a table expression
t = xo.examples.iris.fetch()
Pins: versioned data artifacts
Xorq uses pins to provide named, versioned access to shared datasets, trained models, and code modules. Instead of shipping large files in the repository or requiring manual downloads, artifacts are stored in a public cloud bucket and accessed by name.
Overview
The pins system in Xorq allows you to:
- Access shared datasets by name without managing file paths or URLs
- Load pre-trained ML models for use in pipelines
- Pin specific versions of artifacts for reproducible pipelines
- Cache downloads locally so repeated access is fast
Quick start
xo.examples.<name>.fetch() downloads the artifact via pins, reads it into a table expression on the default backend, and caches the download locally. Subsequent calls use the cached copy.
How it works
Xorq wraps the pins Python library with a preconfigured connection to a public GCS bucket (letsql-pins). No authentication is required.
The configuration lives in xo.options.pins:
| Setting | Default | Description |
|---|---|---|
| protocol | "gcs" | Storage protocol |
| path | "letsql-pins" | GCS bucket name |
| storage_options | {"cache_timeout": 0, "token": "anon"} | Anonymous access; always check remote freshness |
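The three settings above line up with the arguments of the pins library's `pins.board` constructor. A minimal sketch of that mapping, assuming the settings are passed through unchanged; the `PinsSettings` class and `board_kwargs` helper are hypothetical names used for illustration and are not part of Xorq:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PinsSettings:
    """Hypothetical stand-in for the defaults listed in the table above."""

    protocol: str = "gcs"
    path: str = "letsql-pins"
    storage_options: dict = field(
        default_factory=lambda: {"cache_timeout": 0, "token": "anon"}
    )


def board_kwargs(settings: PinsSettings) -> dict:
    """Collect the settings as keyword arguments for pins.board (assumed mapping)."""
    return {
        "protocol": settings.protocol,
        "path": settings.path,
        # Copy so callers can tweak options without mutating the defaults.
        "storage_options": dict(settings.storage_options),
    }


kwargs = board_kwargs(PinsSettings())
print(kwargs)
```

With the pins package and a GCS filesystem driver installed, `pins.board(**kwargs)` would be expected to give a comparable anonymous board. In normal use `xo.options.pins.get_board()` handles this for you.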
get_path(name, board=None, **kwargs)
Downloads the named pin and returns a local path. Extra keyword arguments (such as version) are forwarded to the underlying pins library via board.pin_download.
# Get the latest version
path = xo.options.pins.get_path("diamonds")
# Pin a specific version for reproducibility
path = xo.options.pins.get_path("hackernews_lib", version="20250820T111457Z-1d66a")
Once you have a local path, hand it to any backend reader and execute:
con = xo.connect()
t = con.read_parquet(xo.options.pins.get_path("batting"))
t.execute()
|  | playerID | yearID | stint | teamID | lgID | G | AB | R | H | X2B | ... | RBI | SB | CS | BB | SO | IBB | HBP | SH | SF | GIDP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | abercda01 | 1871 | 1 | TRO | NA | 1 | 4.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
| 1 | addybo01 | 1871 | 1 | RC1 | NA | 25 | 118.0 | 30.0 | 32.0 | 6.0 | ... | 13.0 | 8.0 | 1.0 | 4.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
| 2 | allisar01 | 1871 | 1 | CL1 | NA | 29 | 137.0 | 28.0 | 40.0 | 4.0 | ... | 19.0 | 3.0 | 1.0 | 2.0 | 5.0 | NaN | NaN | NaN | NaN | NaN |
| 3 | allisdo01 | 1871 | 1 | WS3 | NA | 27 | 133.0 | 28.0 | 44.0 | 10.0 | ... | 27.0 | 1.0 | 1.0 | 0.0 | 2.0 | NaN | NaN | NaN | NaN | NaN |
| 4 | ansonca01 | 1871 | 1 | RC1 | NA | 25 | 120.0 | 29.0 | 39.0 | 11.0 | ... | 16.0 | 6.0 | 2.0 | 2.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 101327 | zitoba01 | 2015 | 1 | OAK | AL | 3 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 101328 | zobribe01 | 2015 | 1 | OAK | AL | 67 | 235.0 | 39.0 | 63.0 | 20.0 | ... | 33.0 | 1.0 | 1.0 | 33.0 | 26.0 | 2.0 | 0.0 | 0.0 | 3.0 | 5.0 |
| 101329 | zobribe01 | 2015 | 2 | KCA | AL | 59 | 232.0 | 37.0 | 66.0 | 16.0 | ... | 23.0 | 2.0 | 3.0 | 29.0 | 30.0 | 1.0 | 1.0 | 0.0 | 2.0 | 3.0 |
| 101330 | zuninmi01 | 2015 | 1 | SEA | AL | 112 | 350.0 | 28.0 | 61.0 | 11.0 | ... | 28.0 | 0.0 | 1.0 | 21.0 | 132.0 | 0.0 | 5.0 | 8.0 | 2.0 | 6.0 |
| 101331 | zychto01 | 2015 | 1 | SEA | AL | 13 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
101332 rows × 22 columns
get_board()
Returns the underlying pins.board object for advanced operations like listing available pins or reading metadata.
board = xo.options.pins.get_board()
board.pin_list() # list all available pins
board.pin_meta("iris")  # get metadata for a pin
Meta(title='iris: a pinned 150 x 5 DataFrame', description=None, created='20240716T095120Z', pin_hash='d77c9966e54405d9', file='iris.csv', file_size=3858, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 7, 16, 9, 51, 20), hash='d77c9'), tags=None, name='iris', user={}, local={})
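The `version` string passed to `get_path` earlier and the `Version(created=..., hash=...)` field in the metadata above both look like a UTC timestamp joined to a short hash prefix. A small sketch that splits such a string, assuming that `<timestamp>-<hash>` layout holds; `parse_pin_version` and `PinVersion` are hypothetical helpers, not part of Xorq or pins:

```python
from datetime import datetime
from typing import NamedTuple


class PinVersion(NamedTuple):
    created: datetime
    hash_prefix: str


def parse_pin_version(version: str) -> PinVersion:
    """Split 'YYYYmmddTHHMMSSZ-hash' into a timestamp and a hash prefix."""
    timestamp, _, hash_prefix = version.partition("-")
    if not hash_prefix:
        raise ValueError(f"expected '<timestamp>-<hash>', got {version!r}")
    created = datetime.strptime(timestamp, "%Y%m%dT%H%M%SZ")
    return PinVersion(created=created, hash_prefix=hash_prefix)


parsed = parse_pin_version("20250820T111457Z-1d66a")
print(parsed.created.isoformat(), parsed.hash_prefix)  # 2025-08-20T11:14:57 1d66a
```

Reading the timestamp this way makes it easy to see at a glance which of two pinned versions is newer.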
Available pins
| Pin | Format | Description |
|---|---|---|
| "iris" | CSV | Classic iris dataset |
| "diamonds" | Parquet | Diamonds pricing dataset |
| "penguins" | Parquet | Palmer penguins dataset |
| "batting" | Parquet | Baseball batting statistics |
| "lending-club" | Parquet | Lending Club loan data |
| "bank-marketing" | CSV | Bank marketing dataset |
| "hn-fetcher-input-small.parquet" | Parquet | HackerNews sample data |
| "hn_tfidf_fitted_model" | Binary | Pre-trained TF-IDF model |
| "hn_sentiment_reg" | Binary | Pre-trained XGBoost sentiment model |
| "hackernews_lib" | Python module | HackerNews pipeline code (versioned) |
| "diamonds-model" | JSON | XGBoost model for predicting diamond price |
Common patterns
Loading datasets
The preferred way to load example datasets is xo.examples.<name>.fetch():
import xorq.api as xo
# Fetch with default backend
t = xo.examples.diamonds.fetch()
# Fetch with a specific backend
con = xo.connect()
t = xo.examples.diamonds.fetch(backend=con)
This calls xo.options.pins.get_path("diamonds") under the hood, reads the file with the appropriate method based on format, and returns a table expression.
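The format-based dispatch described above can be pictured as a lookup from file suffix to reader method. This is only a sketch: `READERS` and `reader_name_for` are hypothetical, and the actual mapping inside `fetch()` may differ.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a backend reader method name.
READERS = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
}


def reader_name_for(path: str) -> str:
    """Return the reader method name to use for a downloaded pin file."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no table reader registered for {suffix!r} files") from None


print(reader_name_for("iris.csv"))          # read_csv
print(reader_name_for("diamonds.parquet"))  # read_parquet
```

Given a connection `con` and a local `path`, `getattr(con, reader_name_for(path))(path)` would then call the matching reader, assuming the backend exposes both methods.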
Loading ML models
Pre-trained models are pinned as binary artifacts and loaded by path:
import pathlib
TFIDF_MODEL_PATH = pathlib.Path(
    xo.options.pins.get_path("hn_tfidf_fitted_model")
)
XGB_MODEL_PATH = pathlib.Path(
    xo.options.pins.get_path("hn_sentiment_reg")
)
Loading versioned code modules
Python modules can be pinned and loaded with a specific version, ensuring pipeline reproducibility:
from xorq.common.utils.import_utils import import_python
m = import_python(
    xo.options.pins.get_path("hackernews_lib", version="20250820T111457Z-1d66a")
)
# m is now a module with the functions defined in the pinned file
# Everything the pinned module brought into the namespace:
print([name for name in dir(m) if not name.startswith("_")])
['Path', 'base_api_url', 'curry', 'do_hackernews_fetcher_udxf', 'functools', 'get_hackernews_item', 'get_hackernews_maxitem', 'get_hackernews_stories', 'get_hackernews_stories_batch', 'get_json', 'json', 'pd', 'requests', 'schema_in', 'schema_out', 'simple_disk_cache', 'toolz', 'xo']
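Loading a module from a file path, as `import_python` does here, can also be done with the standard library alone. The sketch below shows that general technique on a throwaway file; it is not necessarily how `import_python` is implemented, and `load_module_from_path` is a hypothetical name.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(path, module_name="pinned_module"):
    """Import a Python source file from an arbitrary location on disk."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Stand-in for a downloaded pin: a small module written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "example_lib.py"
    source.write_text(
        "base_api_url = 'https://example.invalid'\n"
        "\n"
        "def double(x):\n"
        "    return 2 * x\n"
    )
    m_demo = load_module_from_path(source)

public_names = [name for name in dir(m_demo) if not name.startswith("_")]
print(public_names)  # ['base_api_url', 'double']
```

Because the module is built from whatever file the path points to, pinning the version of that file pins the code that runs.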
Caching behavior
The pins library caches downloads automatically in a platform-specific cache directory (typically ~/.cache/pins/ on Linux). The cache_timeout: 0 setting in Xorq’s default configuration means the library always checks whether the remote version has changed, and serves the cached copy when it has not.
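The check-then-reuse behavior can be illustrated with a toy model. Everything below is hypothetical (`FakeRemote`, `CachingClient`, the version comparison) and is not the pins implementation; it only shows why a freshness check on every call does not imply a download on every call.

```python
class FakeRemote:
    """Toy remote store holding one artifact and a version identifier."""

    def __init__(self, content: bytes, version: str):
        self.content = content
        self.version = version
        self.downloads = 0

    def current_version(self) -> str:
        return self.version  # cheap metadata lookup

    def download(self) -> bytes:
        self.downloads += 1  # expensive transfer
        return self.content


class CachingClient:
    """Always asks the remote for its version, downloads only on change."""

    def __init__(self, remote: FakeRemote):
        self.remote = remote
        self.cached_version = None
        self.cached_content = None

    def get(self) -> bytes:
        version = self.remote.current_version()
        if version != self.cached_version:
            self.cached_content = self.remote.download()
            self.cached_version = version
        return self.cached_content


remote = FakeRemote(b"v1 data", version="aaa11")
client = CachingClient(remote)
client.get()                     # first call: downloads
client.get()                     # version unchanged: served from cache
remote.content, remote.version = b"v2 data", "bbb22"
latest = client.get()            # version changed: downloads again
print(remote.downloads, latest)  # 2 b'v2 data'
```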