Install Xorq

Set up Xorq in your environment with Python 3.10 or higher

This guide shows you how to install Xorq and configure it for your environment.

You’ll need Python 3.10 or higher to run Xorq. To check your current version, run this command:

python --version
Tip

If the version is below 3.10, download the latest release from the Python downloads page.

Install Xorq

Install Xorq using pip. This gives you the core library, an embedded DataFusion backend, and Pandas support.

pip install xorq

To include example datasets and ML libraries such as scikit-learn, XGBoost, and the OpenAI SDK, install the examples extra.

pip install "xorq[examples]"

Install the latest development version directly from GitHub.

pip install git+https://github.com/xorq-labs/xorq.git

For local development, clone the repository and install in editable mode:

git clone https://github.com/xorq-labs/xorq.git
cd xorq
pip install -e ".[examples]"

Xorq uses uv internally for dependency management. You can use it to install Xorq as well.

# Install uv first if you don't have it
pip install uv

# Install Xorq with uv
uv pip install xorq

For project-based installation with locked dependencies:

uv init my-xorq-project
cd my-xorq-project
uv add xorq
You can also run Xorq directly with Nix:

nix run github:xorq-labs/xorq
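
Whichever install method you choose, you can verify the result from Python. A minimal check using only the standard library:

# Confirm the package imports and report the installed version
import xorq
from importlib.metadata import version

print(version("xorq"))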

Install backend support

Xorq runs on multiple execution engines. Choose the backends that match your infrastructure.

Tip

If you’re just getting started, install all backends. This lets you experiment with different engines and find what works best for your use case.

DuckDB works well for analytical workloads on local or moderate-sized datasets. It excels at AsOf joins and working with Parquet files.

pip install "xorq[duckdb]"

Snowflake works well for cloud data warehouse operations with managed infrastructure and scalability.

pip install "xorq[snowflake]"

PostgreSQL works well for production workloads with existing PostgreSQL databases.

pip install "xorq[postgres]"

PyIceberg adds support for Apache Iceberg tables in data lakes.

pip install "xorq[pyiceberg]"

DataFusion works well for in-memory analytical processing and custom UDFs. Note that an embedded DataFusion backend is included in the base installation.

pip install "xorq[datafusion]"

SQLite works well for lightweight, serverless databases and local development.

pip install "xorq[sqlite]"

Trino works well for distributed queries, data federation, and enterprise security. Install the Trino Python client separately.

pip install trino

To install support for all backends at once, combine the extras in a single command. It’s the fastest way to explore what Xorq can do. (The Trino client still installs separately, as shown above.)

pip install "xorq[examples,duckdb,snowflake,postgres,pyiceberg,datafusion,sqlite]"

Connect to backends

Once you’ve installed Xorq, you’ll need to connect to a backend before you can work with data.

The embedded backend is the default option. It uses a modified DataFusion engine that’s optimized for Arrow UDF execution.

import xorq.api as xo


con = xo.connect()
print(f"Connected to: {con}")
The connect function creates a connection to the embedded backend.

Pandas works well for local development and small datasets. Here’s how to create a connection and load data into it.

import xorq.api as xo
import pandas as pd


pandas_con = xo.pandas.connect()


df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 3, 4, 5, 6]
})


table = pandas_con.create_table("my_table", df)
1. Connect to the Pandas backend.
2. Create a Pandas DataFrame with sample data.
3. Load the DataFrame into a table in the Pandas backend.
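
With the table registered, you can query it using the same expression API shown later in this guide. A minimal sketch, continuing from the snippet above:

import xorq.api as xo

# Filter the sample table and execute the expression to get results
expr = table.filter(xo._.a > 2)
print(expr.execute())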

PostgreSQL connections require database credentials. You can provide them directly or load them from environment variables.

Using environment variables:

Set these environment variables before running your code:

  • POSTGRES_HOST
  • POSTGRES_PORT
  • POSTGRES_DATABASE
  • POSTGRES_USER
  • POSTGRES_PASSWORD

Then connect using connect_env:

import xorq.api as xo


pg_con = xo.postgres.connect_env()


batting_table = pg_con.table("batting")
1. The connect_env function reads credentials from environment variables.
2. Access an existing table in your PostgreSQL database.
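
For quick local testing, you can also set these variables in-process before calling connect_env. This is a development-only sketch; in production, set them in your shell or a secrets manager:

import os

# Development only: in production, configure these outside your code
os.environ["POSTGRES_HOST"] = "localhost"
os.environ["POSTGRES_PORT"] = "5432"
os.environ["POSTGRES_DATABASE"] = "your_database"
os.environ["POSTGRES_USER"] = "your_user"
os.environ["POSTGRES_PASSWORD"] = "your_password"

import xorq.api as xo

pg_con = xo.postgres.connect_env()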

Using direct credentials:

You can also provide credentials directly in your code:

import xorq.api as xo

pg_con = xo.postgres.connect(
    host="localhost",
    port=5432,
    database="your_database",
    user="your_user",
    password="your_password"
)

batting_table = pg_con.table("batting")
Replace these values with your actual PostgreSQL connection details.
Warning

Don’t hardcode credentials in production code. Use environment variables or a secrets management system instead.

DuckDB connections can be in-memory or persistent. Here’s how both options work.

In-memory database:

import xorq.api as xo


duck_con = xo.duckdb.connect()
This creates an in-memory DuckDB database.

Persistent database:

For a persistent database, provide the file path:

import xorq.api as xo


duck_con = xo.duckdb.connect(database="my_database.duckdb")
This creates or opens a DuckDB database file.
Tip

Use persistent databases when you want your data to survive between sessions. In-memory databases are faster but lose data when your program exits.
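
Because DuckDB handles Parquet files well, a common next step is registering one as a table. A sketch, assuming the backend exposes the Ibis-style read_parquet method and that data.parquet is a placeholder path:

import xorq.api as xo

duck_con = xo.duckdb.connect()

# Register a local Parquet file as a queryable table
trips = duck_con.read_parquet("data.parquet")
print(trips.limit(5).execute())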

Snowflake connections require your account credentials and resource identifiers.

import xorq.api as xo

snow_con = xo.snowflake.connect(
    user="your_user",
    password="your_password",
    account="your_account",
    role="your_role",
    warehouse="your_warehouse",
    database="your_database",
    schema="your_schema"
)
Replace these values with your actual Snowflake credentials.

Trino connections let you query federated data sources across your infrastructure.

import xorq.api as xo

trino_con = xo.trino.connect(
    host="localhost",
    port=8080,
    user="your_user",
    database="your_catalog",
    schema="your_schema"
)
Replace these values with your Trino server details.

SQLite works well for lightweight databases and local development.

import xorq.api as xo


sqlite_con = xo.sqlite.connect(database="my_data.db")
This creates or opens an SQLite database file.
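
To confirm the connection works, you can list the tables in the database file. A sketch, assuming the backend follows the usual Ibis-style list_tables method:

import xorq.api as xo

sqlite_con = xo.sqlite.connect(database="my_data.db")

# An empty list is expected for a brand-new database file
print(sqlite_con.list_tables())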

Run your first query

Let’s verify your installation by running a simple query. This example loads the iris dataset and filters it.

Note

This query uses the embedded backend, so you don’t need any additional setup. It’s a good way to confirm everything is working before connecting to external databases.

import xorq.api as xo


con = xo.connect()


iris = xo.examples.iris.fetch(backend=con)


filtered = iris.filter(xo._.sepal_length > 5)


grouped = filtered.group_by("species").agg(
    xo._.sepal_width.sum()
)


result = grouped.execute()
print(result)
1. Connect to the embedded backend.
2. Load the iris dataset into your backend.
3. Filter rows where sepal length is greater than 5.
4. Group by species and sum the sepal widths.
5. Execute the query and print results.

If everything works correctly, you’ll see output showing the aggregated sepal widths grouped by species. The result is a PyArrow Table with two columns: species (Versicolor, Setosa, Virginica) and the summed sepal widths for each species. This confirms Xorq can load data, apply transformations, and execute queries on your system.

Next steps

Now that you’ve installed Xorq and connected to a backend, you can explore these other resources: