# xorq

> Data processing library built on top of Ibis and DataFusion to write multi-engine data workflows.

## Docs

### API Reference

#### Core Operations

> APIs for reading and returning data

- [connect](https://docs.xorq.dev/reference/connect.html): Create a xorq backend
- [execute](https://docs.xorq.dev/reference/execute.html): Execute an expression against its backend if one exists
- [memtable](https://docs.xorq.dev/reference/memtable.html): Construct an ibis table expression from in-memory data
- [deferred_read_csv](https://docs.xorq.dev/reference/deferred_read_csv.html): Create a deferred read operation for CSV files that will execute only when needed
- [deferred_read_parquet](https://docs.xorq.dev/reference/deferred_read_parquet.html): Create a deferred read operation for Parquet files that will execute only when needed
- [to_csv](https://docs.xorq.dev/reference/to_csv.html): Write the results of executing the given expression to a CSV file
- [to_json](https://docs.xorq.dev/reference/to_json.html): Write the results of `expr` to an NDJSON file
- [to_parquet](https://docs.xorq.dev/reference/to_parquet.html): Write the results of executing the given expression to a parquet file
- [to_pyarrow](https://docs.xorq.dev/reference/to_pyarrow.html): Execute expression and return results as a pyarrow table
- [to_pyarrow_batches](https://docs.xorq.dev/reference/to_pyarrow_batches.html): Execute expression and return a RecordBatchReader
- [to_sql](https://docs.xorq.dev/reference/to_sql.html): Return the formatted SQL string for an expression
- [register](https://docs.xorq.dev/reference/register.html)
- [get_plans](https://docs.xorq.dev/reference/get_plans.html)

#### Data Operations

- [Table](https://docs.xorq.dev/reference/Table.html): An immutable and lazy dataframe
- [GroupedTable](https://docs.xorq.dev/reference/GroupedTable.html): An intermediate table expression to hold grouping information
- [Value](https://docs.xorq.dev/reference/Value.html): Base class for a data generating expression having a known type
- [Scalar](https://docs.xorq.dev/reference/Scalar.html): Base class for a data generating expression having a known type
- [Column](https://docs.xorq.dev/reference/Column.html): Base class for a data generating expression having a known type
- [NumericColumn](https://docs.xorq.dev/reference/NumericColumn.html): Base class for a data generating expression having a known type
- [IntegerColumn](https://docs.xorq.dev/reference/IntegerColumn.html): Base class for a data generating expression having a known type
- [FloatingColumn](https://docs.xorq.dev/reference/FloatingColumn.html): Base class for a data generating expression having a known type
- [StringValue](https://docs.xorq.dev/reference/StringValue.html): Base class for a data generating expression having a known type
- [TimeValue](https://docs.xorq.dev/reference/TimeValue.html): Temporal expressions that have a time component
- [DateValue](https://docs.xorq.dev/reference/DateValue.html): Base class for a data generating expression having a known type
- [DayOfWeek](https://docs.xorq.dev/reference/DayOfWeek.html): A namespace of methods for extracting day of week information
- [TimestampValue](https://docs.xorq.dev/reference/TimestampValue.html): Temporal expressions that have a date component
- [IntervalValue](https://docs.xorq.dev/reference/IntervalValue.html): Base class for a data generating expression having a known type

#### Caching

> Caching

- [ParquetCache](https://docs.xorq.dev/reference/ParquetCache.html): Cache expression results as Parquet files, re-hashing when source data changes
- [ParquetSnapshotCache](https://docs.xorq.dev/reference/ParquetSnapshotCache.html): Cache expression results as Parquet files with a stable snapshot key
- [SourceCache](https://docs.xorq.dev/reference/SourceCache.html): Cache expression results as a table in a source backend, with automatic invalidation
- [SourceSnapshotCache](https://docs.xorq.dev/reference/SourceSnapshotCache.html): Cache expression results as a table in a source backend, with a stable key

#### Window and Selectors

> Window functions and column selectors

- [window](https://docs.xorq.dev/reference/window.html): Create a window clause for use with window functions
- [selectors](https://docs.xorq.dev/reference/selectors.html): Convenient column selectors

#### Machine Learning Operations

> Machine Learning Functions and Helpers

- [train_test_splits](https://docs.xorq.dev/reference/train_test_splits.html): Generates multiple train/test splits of an Ibis table for different test sizes
- [Step](https://docs.xorq.dev/reference/Step.html): A single step in a machine learning pipeline that wraps a scikit-learn estimator
- [Pipeline](https://docs.xorq.dev/reference/Pipeline.html): A machine learning pipeline that chains multiple processing steps together
- [FittedPipeline](https://docs.xorq.dev/reference/FittedPipeline.html)
- [deferred_fit_predict](https://docs.xorq.dev/reference/deferred_fit_predict.html)
- [deferred_fit_transform](https://docs.xorq.dev/reference/deferred_fit_transform.html)
- [calc_split_column](https://docs.xorq.dev/reference/calc_split_column.html)

#### Lineage

> Data lineage tracking utilities

- [build_column_trees](https://docs.xorq.dev/reference/build_column_trees.html): Builds a lineage tree for each column in the expression
- [build_tree](https://docs.xorq.dev/reference/build_tree.html)

#### Flight Operations

> Apache Arrow Flight server and client operations

- [FlightServer](https://docs.xorq.dev/reference/FlightServer.html)
- [FlightUrl](https://docs.xorq.dev/reference/FlightUrl.html)
- [make_udxf](https://docs.xorq.dev/reference/make_udxf.html)

#### Catalog Operations

> Compute catalog management

- [Catalog](https://docs.xorq.dev/reference/Catalog.html): A git-backed registry for versioned build artifacts

#### Type System

> Data types and schemas

- [Data types](https://docs.xorq.dev/reference/datatypes.html): Scalar and column data types
- [Schemas](https://docs.xorq.dev/reference/schemas.html): Table Schemas

#### UDF System

> Functions for creating UDFs

- [make_pandas_udf](https://docs.xorq.dev/reference/make_pandas_udf.html): Create a scalar User-Defined Function (UDF) that operates on pandas DataFrames
- [make_pandas_expr_udf](https://docs.xorq.dev/reference/make_pandas_expr_udf.html): Create an expression-based scalar UDF that incorporates pre-computed values
- [pyarrow_udwf](https://docs.xorq.dev/reference/pyarrow_udwf.html): Create a User-Defined Window Function (UDWF) using PyArrow
- [agg.pyarrow](https://docs.xorq.dev/reference/agg.pyarrow.html): Decorator for creating PyArrow-based aggregation functions
- [agg.pandas_df](https://docs.xorq.dev/reference/agg.pandas_df.html): Create a pandas DataFrame-based aggregation function
- [flight_udxf](https://docs.xorq.dev/reference/flight_udxf.html): Create a User-Defined Exchange Function (UDXF) that executes a pandas DataFrame

### CLI Reference

- [xorq init](https://docs.xorq.dev/api_reference/cli/init.html): Scaffold a new Xorq project from a template.
- [xorq completion](https://docs.xorq.dev/api_reference/cli/completion.html): Print a shell-completion script to stdout.
- [xorq install-completion](https://docs.xorq.dev/api_reference/cli/install-completion.html): Install the shell-completion script to the standard location.
- [xorq build](https://docs.xorq.dev/api_reference/cli/build.html): Compile a Xorq expression into a reusable build artifact.
- [xorq uv build](https://docs.xorq.dev/api_reference/cli/uv-build.html): Build an expression inside a uv-managed isolated environment.
- [xorq run](https://docs.xorq.dev/api_reference/cli/run.html): Execute a build artifact and write results in your chosen format.
- [xorq uv run](https://docs.xorq.dev/api_reference/cli/uv-run.html): Execute a build inside a uv-managed isolated environment.
- [xorq run-cached](https://docs.xorq.dev/api_reference/cli/run-cached.html): Run a build with a parquet cache wrapping the expression.
- [xorq uv run-cached](https://docs.xorq.dev/api_reference/cli/uv-run-cached.html): Run a build with a parquet cache inside a uv-managed environment.
- [xorq run-unbound](https://docs.xorq.dev/api_reference/cli/run-unbound.html): Run an unbound expression by streaming Arrow IPC input.
- [xorq uv run-unbound](https://docs.xorq.dev/api_reference/cli/uv-run-unbound.html): Run an unbound expression over Arrow IPC inside a uv-managed environment.
- [xorq serve-flight-udxf](https://docs.xorq.dev/api_reference/cli/serve-flight-udxf.html): Serve an expression's UDXF nodes as an Arrow Flight endpoint.
- [xorq serve-unbound](https://docs.xorq.dev/api_reference/cli/serve-unbound.html): Serve an unbound expression as an Arrow Flight endpoint.
- [xorq catalog init](https://docs.xorq.dev/api_reference/cli/catalog/init.html): Create a new catalog repository.
- [xorq catalog clone](https://docs.xorq.dev/api_reference/cli/catalog/clone.html): Clone an existing catalog from a remote URL.
- [xorq catalog info](https://docs.xorq.dev/api_reference/cli/catalog/info.html): Show catalog summary metadata.
- [xorq catalog default](https://docs.xorq.dev/api_reference/cli/catalog/default.html): Show or change the persisted default catalog name.
- [xorq catalog tui](https://docs.xorq.dev/api_reference/cli/catalog/tui.html): Browse the catalog interactively in a terminal UI.
- [xorq catalog add](https://docs.xorq.dev/api_reference/cli/catalog/add.html): Add entries from build directories or archive files.
- [xorq catalog remove](https://docs.xorq.dev/api_reference/cli/catalog/remove.html): Remove entries by name.
- [xorq catalog list](https://docs.xorq.dev/api_reference/cli/catalog/list.html): List all entries.
- [xorq catalog show](https://docs.xorq.dev/api_reference/cli/catalog/show.html): Show full metadata for a catalog entry.
- [xorq catalog schema](https://docs.xorq.dev/api_reference/cli/catalog/schema.html): Show the schemas for a catalog entry.
- [xorq catalog get](https://docs.xorq.dev/api_reference/cli/catalog/get.html): Export an entry's archive to a directory.
- [xorq catalog add-alias](https://docs.xorq.dev/api_reference/cli/catalog/add-alias.html): Attach an alias to an existing entry.
- [xorq catalog remove-alias](https://docs.xorq.dev/api_reference/cli/catalog/remove-alias.html): Remove one or more aliases.
- [xorq catalog list-aliases](https://docs.xorq.dev/api_reference/cli/catalog/list-aliases.html): List all aliases.
- [xorq catalog compose](https://docs.xorq.dev/api_reference/cli/catalog/compose.html): Compose entries into a new expression and persist it to the catalog.
- [xorq catalog run](https://docs.xorq.dev/api_reference/cli/catalog/run.html): Compose and execute catalog entries, writing data to disk or stdout.
- [xorq catalog run-cached](https://docs.xorq.dev/api_reference/cli/catalog/run-cached.html): Compose and execute catalog entries with a parquet cache wrapper.
- [xorq catalog serve-unbound](https://docs.xorq.dev/api_reference/cli/catalog/serve-unbound.html): Resolve a catalog entry, unbind a node, and serve it via Flight.
- [xorq catalog push](https://docs.xorq.dev/api_reference/cli/catalog/push.html): Push the catalog's commits and annex content to its remotes.
- [xorq catalog pull](https://docs.xorq.dev/api_reference/cli/catalog/pull.html): Pull the catalog's commits and annex content from its remotes.
- [xorq catalog sync](https://docs.xorq.dev/api_reference/cli/catalog/sync.html): Pull then push.
- [xorq catalog set-remote](https://docs.xorq.dev/api_reference/cli/catalog/set-remote.html): Configure the catalog's git remote.
- [xorq catalog embed-readonly](https://docs.xorq.dev/api_reference/cli/catalog/embed-readonly.html): Embed read-only S3 credentials into the catalog's git-annex branch.
- [xorq catalog check](https://docs.xorq.dev/api_reference/cli/catalog/check.html): Validate the catalog for consistency.
- [xorq catalog gc](https://docs.xorq.dev/api_reference/cli/catalog/gc.html): Remove orphaned content store objects (pointer backend only).
- [xorq catalog log](https://docs.xorq.dev/api_reference/cli/catalog/log.html): Show the catalog's history as structured operations.
- [xorq catalog replay](https://docs.xorq.dev/api_reference/cli/catalog/replay.html): Replay the catalog's operation log into a target catalog.

### Guides

#### Get started

- [Quickstart](https://docs.xorq.dev/getting_started/quickstart.html): Build and run your first Xorq ML pipeline in under five minutes
- [Defer query execution](https://docs.xorq.dev/getting_started/understand_deferred_execution.html): Learn when Xorq builds expressions versus when it runs computation
- [Cache expression results](https://docs.xorq.dev/getting_started/explore_caching.html): This tutorial shows you how Xorq's caching system works through hands-on examples
- [Switch between backends](https://docs.xorq.dev/getting_started/switch_backends.html): This tutorial shows you how to run the same expression on different execution engines
- [Your first expression](https://docs.xorq.dev/getting_started/your_first_expression.html): This tutorial shows you how to write and run your first Xorq expression
- [Claude Code plugin](https://docs.xorq.dev/claude/index.html): Drive the Xorq catalog and command-line tool from Claude Code in plain language

#### How-to guides

- [Install Xorq](https://docs.xorq.dev/how_to/install_xorq.html): Install Xorq, verify the setup works, and add optional backend extras
- [Connect to a backend](https://docs.xorq.dev/how_to/connect_to_backends.html): Create a connection to the embedded backend, DuckDB, PostgreSQL, Snowflake, Trino, or SQLite
- [Cache results by backend](https://docs.xorq.dev/how_to/cache_by_backend.html): Pick the right cache class for your source and target backends
- [Route a step to a specific backend](https://docs.xorq.dev/how_to/switch_backends.html): Use into_backend() to run each pipeline step on the engine that suits it
- [Compose catalog entries](https://docs.xorq.dev/how_to/compose_catalog_entries.html): Load an existing catalog entry, build a new pipeline on top, and register the result back

#### Tutorials

- [Get started with Xorq](https://docs.xorq.dev/tutorials/core_tutorials/get-started-with-xorq-init.html): Scaffold a baseball analytics pipeline in minutes using init, deferred_read_csv, and build commands
- [Your first build](https://docs.xorq.dev/tutorials/core_tutorials/your_first_build.html): Create portable, versioned artifacts from your Xorq pipelines
- [Build a semantic catalog](https://docs.xorq.dev/tutorials/core_tutorials/build_a_semantic_catalog.html): Use the Boring Semantic Layer (BSL) to define a flights model, query it, and catalog it as a recoverable artifact
- [Working with the catalog](https://docs.xorq.dev/tutorials/core_tutorials/working_with_the_catalog.html): Share a catalog to GitHub, accept changes from a collaborator via pull request, and swap profiles when recovering entries
- [Split data for training](https://docs.xorq.dev/tutorials/ml_tutorials/split_data_for_training.html): Create deterministic train, test, and validation splits for ML workflows
- [Train your first model](https://docs.xorq.dev/tutorials/ml_tutorials/train_your_first_model.html): Build a classifier with the Iris dataset using Xorq's ML workflow
- [Compare model performance](https://docs.xorq.dev/tutorials/ml_tutorials/explore_model_evaluation.html): Compare multiple models to find the best one for your classification task

#### Concepts

- [What is Xorq?](https://docs.xorq.dev/concepts/understanding_xorq/what_is_xorq.html): Understand what Xorq is, why it exists, and how it fits into your ML infrastructure
- [Deferred Execution](https://docs.xorq.dev/concepts/understanding_xorq/why_deferred_execution.html): Deferred execution is the architectural choice that everything else in Xorq builds on
- [Content-addressed artifacts](https://docs.xorq.dev/concepts/understanding_xorq/why_content_addressed_artifacts.html): Most systems name things by *where* or *when*: a path, a timestamp, a version number someone bumped by hand
- [The catalog as executable memory](https://docs.xorq.dev/concepts/understanding_xorq/catalog_as_executable_memory.html): Most "memory" an agent accumulates is prose: Markdown notes, a `MEMORY.md` index, chat history
- [Expression types](https://docs.xorq.dev/concepts/understanding_xorq/expression_types.html): Every operation in Xorq produces an expression: a lazy, immutable description of a computation
- [Multi-Engine](https://docs.xorq.dev/concepts/understanding_xorq/multi_engine_execution.html): Move data between different engines within a single expression using `into_backend()`
- [Caching](https://docs.xorq.dev/concepts/understanding_xorq/intelligent_caching_system.html): Xorq's caching system stores intermediate results so iterative ML pipelines don't recompute work they've already done
- [Pins: versioned data artifacts](https://docs.xorq.dev/concepts/core_concepts/pins.html): Xorq uses pins to provide named, versioned access to shared datasets, trained models, and code modules

#### Overview

- [Overview](https://docs.xorq.dev/api_reference/backends/index.html): Xorq supports multiple execution engines (backends) for running your data and ML workloads
- [Profiles](https://docs.xorq.dev/api_reference/backends/profiles_api.html): The Profiles API provides a secure way to manage database connection parameters through environment variable references, allowing you to create, save, load, and use database connections while keeping sensitive information protected
- [Environment variables](https://docs.xorq.dev/api_reference/backends/env_variables.html): This document provides a comprehensive reference for all environment variables used in the project's `.template` files
- [Supported backends](https://docs.xorq.dev/api_reference/backends/supported_backends.html): Xorq currently supports:
- [Expression Format](https://docs.xorq.dev/api_reference/expression_format.html): Xorq uses a YAML-based serialization format for storing expression artifacts
- [User-Defined Exchange Functions](https://docs.xorq.dev/api_reference/user_defined_exchange_functions.html): Understanding the concept and applications of User-Defined Exchange Functions in Xorq
- [Cache API overview](https://docs.xorq.dev/api_reference/cache_api_overview.html): Strategy/storage matrix, how to build a cache, and backend invalidation signals for the Xorq cache classes