This guide shows you how to install Xorq and configure it for your environment.
You’ll need Python 3.10 or higher to run Xorq. To check your current version, run this command:
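The interpreter’s version flag works on any platform:

```shell
python --version
```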
If the version is below 3.10, download the latest release from the Python downloads page.
Install Xorq using pip. This gives you the core library, an embedded DataFusion backend, and Pandas support.
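The package is published on PyPI:

```shell
pip install xorq
```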
This option includes example datasets and ML libraries like scikit-learn, XGBoost, and the OpenAI SDK.
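A sketch of the command, assuming the extras bundle is named `examples` (check PyPI for the exact extras Xorq publishes):

```shell
pip install "xorq[examples]"
```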
Install the latest development version directly from GitHub.
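Assuming the repository lives at `xorq-labs/xorq` on GitHub:

```shell
pip install git+https://github.com/xorq-labs/xorq.git
```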
For local development, clone the repository and install in editable mode:
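A typical editable install, under the same repository assumption:

```shell
git clone https://github.com/xorq-labs/xorq.git
cd xorq
pip install -e .
```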
Xorq uses uv internally for dependency management. You can use it to install Xorq as well.
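For a one-off install into the current environment, uv’s pip-compatible interface works:

```shell
uv pip install xorq
```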
For project-based installation with locked dependencies:
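A minimal sketch of uv’s project workflow, with a hypothetical project name:

```shell
uv init my-xorq-project
cd my-xorq-project
uv add xorq  # resolves the dependency and pins it in uv.lock
```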
Xorq runs on multiple execution engines. Choose the backends that match your infrastructure.
If you’re just getting started, install all backends. This lets you experiment with different engines and find what works best for your use case.
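A sketch of an all-backends install; the extras names below are assumptions, so check the package metadata for the exact spelling:

```shell
pip install "xorq[duckdb,snowflake,postgres,pyiceberg,sqlite,trino]"
```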
DuckDB works well for analytical workloads on local or moderate-sized datasets. It excels at AsOf joins and working with Parquet files.
Snowflake works well for cloud data warehouse operations with managed infrastructure and scalability.
PostgreSQL works well for production workloads with existing PostgreSQL databases.
PyIceberg works well for working with Apache Iceberg tables in data lakes.
DataFusion works well for in-memory analytical processing and custom UDFs. Note that an embedded DataFusion backend is included in the base installation.
SQLite works well for lightweight, serverless databases and local development.
Trino works well for distributed queries, data federation, and enterprise security. Install the Trino Python client separately.
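The client is published on PyPI as `trino`:

```shell
pip install trino
```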
Once you’ve installed Xorq, you’ll need to connect to a backend before you can work with data.
The embedded backend is the default option. It uses a modified DataFusion engine that’s optimized for Arrow UDF execution.
```python
import xorq.api as xo

con = xo.connect()
print(f"Connected to: {con}")
```

The `connect` function creates a connection to the embedded backend.
Pandas works well for local development and small datasets. Here’s how to create a connection and load data into it.
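A minimal sketch, assuming Xorq exposes an Ibis-style Pandas backend under `xo.pandas` that accepts a mapping of table names to DataFrames:

```python
import pandas as pd
import xorq.api as xo

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Register the DataFrame as a named table on the backend
con = xo.pandas.connect({"example": df})
t = con.table("example")
```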
PostgreSQL connections require database credentials. You can provide them directly or load them from environment variables.
Using environment variables:
Set these environment variables before running your code:
- POSTGRES_HOST
- POSTGRES_PORT
- POSTGRES_DATABASE
- POSTGRES_USER
- POSTGRES_PASSWORD

Then connect using `connect_env`:
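For example, in your shell (placeholder values):

```shell
export POSTGRES_HOST=localhost
export POSTGRES_PORT=5432
export POSTGRES_DATABASE=mydb
export POSTGRES_USER=myuser
export POSTGRES_PASSWORD=mypassword
```

Then, in Python, assuming the backend module is exposed as `xo.postgres`:

```python
import xorq.api as xo

con = xo.postgres.connect_env()
```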
The `connect_env` function reads the credentials from these environment variables.
Using direct credentials:
You can also provide credentials directly in your code:
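A sketch with placeholder credentials, assuming an Ibis-style `connect` signature:

```python
import xorq.api as xo

con = xo.postgres.connect(
    host="localhost",
    port=5432,
    database="mydb",
    user="myuser",
    password="mypassword",
)
```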
Don’t hardcode credentials in production code. Use environment variables or a secrets management system instead.
DuckDB connections can be in-memory or persistent. Here’s how both options work.
In-memory database:
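Assuming Ibis-style defaults, calling `connect` with no arguments creates an in-memory database:

```python
import xorq.api as xo

con = xo.duckdb.connect()  # no path given, so data lives in memory
```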
Persistent database:
For a persistent database, provide the file path:
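For example, with a hypothetical `mydata.ddb` file:

```python
import xorq.api as xo

con = xo.duckdb.connect("mydata.ddb")  # created if it doesn't already exist
```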
Use persistent databases when you want your data to survive between sessions. In-memory databases are faster but lose data when your program exits.
Snowflake connections require your account credentials and resource identifiers.
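A sketch with placeholder identifiers, assuming an Ibis-style Snowflake `connect` signature; the exact parameter names may differ:

```python
import xorq.api as xo

con = xo.snowflake.connect(
    account="myaccount",
    user="myuser",
    password="mypassword",
    database="MYDB",
    schema="PUBLIC",
    warehouse="MYWAREHOUSE",
)
```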
Trino connections let you query federated data sources across your infrastructure.
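A sketch with placeholder values, assuming an Ibis-style Trino `connect` signature:

```python
import xorq.api as xo

con = xo.trino.connect(
    host="localhost",
    port=8080,
    user="myuser",
    database="hive",   # the Trino catalog to query
    schema="default",
)
```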
Let’s verify your installation by running a simple query. This example loads the iris dataset, filters it, and aggregates the result.
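A sketch of such a query; the `xo.examples` dataset accessor and the exact column names are assumptions modeled on Ibis conventions:

```python
import xorq.api as xo

con = xo.connect()  # embedded backend, no extra setup required

# Hypothetical accessor for the bundled example datasets
iris = xo.examples.iris.fetch(backend=con)

# Filter, then sum sepal widths per species
filtered = iris.filter(iris.sepal_length > 5.0)
expr = filtered.group_by("species").agg(
    total_sepal_width=filtered.sepal_width.sum()
)

print(expr.to_pyarrow())  # executes the query and returns a PyArrow Table
```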
This query uses the embedded backend, so you don’t need any additional setup. It’s a good way to confirm everything is working before connecting to external databases.
If everything works correctly, you’ll see output showing the aggregated sepal widths grouped by species. The result is a PyArrow Table with two columns: species (Versicolor, Setosa, Virginica) and the summed sepal widths for each species. This confirms Xorq can load data, apply transformations, and execute queries on your system.
Now that you’ve learned how to install Xorq and connect to a backend, you can explore these other resources: