Welcome to Xorq!

Xorq is an opinionated framework for cataloging composable compute expressions that enables you to build portable, multi-engine ML pipelines with deferred execution. Write expressive, pandas-style transformations that seamlessly move between SQL engines and Python, with built-in caching, lineage tracking, and deployment-ready artifacts. Xorq is built on top of Ibis and Apache DataFusion.

Getting Started

10-minute tour of Xorq Learn key concepts in a brief tutorial

Quickstart Get a first taste of xorq

Dive Deeper

Multipart series on how to build an end-to-end ML pipeline using live data from the HackerNews API.

Data Labeling w/ LLMs Learn to use UDXFs for preparing and labeling data with OpenAI.

Transform with TF-IDF Learn how to build fit-transform style deferred pipelines.

XGBoost Model Serving Learn how to serve fit-transform trained models

XGBoost Training Learn how to build fit-predict style deferred pipelines.

Why Xorq?

xorq was developed to solve the frustrating complexities of building reliable ML pipelines across multiple engines and environments. Traditional approaches force you to choose between the expressiveness of pandas and the scalability of SQL engines, leading to SQL-pandas impedance mismatches, wasteful recomputation, and pipelines that work in notebooks but fail in production. The xorq computational framework provides a quantum leap in ML development by:

Unifying multi-engine workflows - seamlessly combine Snowflake, DuckDB, and Python within a single declarative pipeline, eliminating engine-specific rewrites.
Enabling true portability - write UDFs once and run them consistently across any supported engine, with automatic serialization to diff-able YAML artifacts for reproducibility.
Accelerating iteration - intelligent caching of intermediate results means no more waiting for expensive joins or full pipeline re-runs after every change.
Making deployment seamless - moving a working pipeline from local development to production requires no rewriting, with built-in compile-time validation and lineage tracking.
Providing observability - automatic column-level lineage tracking and fail-fast pipelines give you the visibility and confidence needed for production ML systems.