Connect to a backend

Create a connection to the embedded backend, DuckDB, PostgreSQL, Snowflake, Trino, SQLite, or pandas

This guide shows you how to create a connection for each backend Xorq supports. Every connection object exposes the same expression API, so the code you write after connecting doesn’t change between engines.

For picking which backend a pipeline step should run on, see Route a step to a specific backend.

Prerequisites

  • Xorq installed (Install Xorq)
  • The extra for your backend, for example pip install "xorq[postgres]"
  • Credentials, for remote backends like PostgreSQL or Snowflake
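
Before connecting, it can help to confirm that the package is importable in the current environment. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical, and it checks only the top-level xorq package because this guide does not list which driver modules each extra installs.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Only the top-level package is checked here; driver modules pulled in by
# extras such as "xorq[postgres]" are not named in this guide.
if is_installed("xorq"):
    print("xorq is importable")
else:
    print("xorq is not importable; see Install Xorq")
```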

Steps

1. Create the connection

The embedded backend is the default: a modified DataFusion engine included in the base package. It needs no setup.

import xorq.api as xo

con = xo.connect()
print(f"Connected to: {con}")
Connected to: <xorq.backends.xorq_datafusion.Backend object at 0x7faffbbf7a40>

DuckDB connections are in-memory by default:

import xorq.api as xo

duck_con = xo.duckdb.connect()

For a database that survives between sessions, pass a path:

duck_con = xo.duckdb.connect(database="my_database.duckdb")

For PostgreSQL, set these environment variables, then connect with connect_env:

  • POSTGRES_HOST
  • POSTGRES_PORT
  • POSTGRES_DATABASE
  • POSTGRES_USER
  • POSTGRES_PASSWORD

import xorq.api as xo

pg_con = xo.postgres.connect_env()
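
A missing variable is easier to diagnose before the connection attempt than after it. As a minimal sketch, the check below reports which of the five variables listed above are unset or empty. The helper missing_postgres_vars is hypothetical and not part of Xorq; it reads os.environ by default and accepts any mapping so it can be tested.

```python
import os
from typing import Mapping

# The five variables this guide lists for connect_env.
REQUIRED_POSTGRES_VARS = (
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_DATABASE",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
)


def missing_postgres_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty, in listed order."""
    return [name for name in REQUIRED_POSTGRES_VARS if not env.get(name)]


missing = missing_postgres_vars()
if missing:
    print("Set these before calling connect_env:", ", ".join(missing))
else:
    print("All PostgreSQL variables are set")
```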

You can also pass credentials directly:

pg_con = xo.postgres.connect(
    host="localhost",
    port=5432,
    database="your_database",
    user="your_user",
    password="your_password",
)

Warning

Don’t hardcode credentials in production code. Use environment variables or the Profiles API, which stores environment-variable references instead of values.

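For Snowflake, pass the account, role, warehouse, and credentials as keyword arguments (placeholder values shown):

import xorq.api as xo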

snow_con = xo.snowflake.connect(
    user="your_user",
    password="your_password",
    account="your_account",
    role="your_role",
    warehouse="your_warehouse",
    database="your_database",
    schema="your_schema",
)

For Trino, pass the host, port, user, catalog, and schema:

import xorq.api as xo

trino_con = xo.trino.connect(
    host="localhost",
    port=8080,
    user="your_user",
    database="your_catalog",
    schema="your_schema",
)

For SQLite, pass the path to a database file:

import tempfile
from pathlib import Path

import xorq.api as xo

db_path = Path(tempfile.mkdtemp()) / "my_data.db"
sqlite_con = xo.sqlite.connect(database=str(db_path))

connect creates the database file if it doesn’t exist. Point database at any path you like; this example uses a temporary one.
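
Creating the file on first use is how SQLite itself behaves. The sketch below illustrates this with the standard library's sqlite3 module rather than Xorq, so it runs without any backend extra installed; the table name t is an arbitrary placeholder.

```python
import sqlite3
import tempfile
from pathlib import Path

# Same temporary-path pattern as the example above.
db_path = Path(tempfile.mkdtemp()) / "my_data.db"
existed_before = db_path.exists()

# Opening the database and writing to it creates the file on disk.
conn = sqlite3.connect(str(db_path))
conn.execute("CREATE TABLE t (a INTEGER)")
conn.commit()
conn.close()

print(existed_before, db_path.exists())  # prints: False True
```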

The pandas backend is useful for small local datasets and tests:

import pandas as pd
import xorq.api as xo

pandas_con = xo.pandas.connect()

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
table = pandas_con.create_table("my_table", df)

2. Verify the connection

Whichever backend you connected to, give its variable a common name so the rest of this guide can refer to it. The examples below use the embedded backend, since it needs no setup; swap in duck_con, pg_con, or any other connection you created.

import xorq.api as xo

con = xo.connect()

List the tables the backend can see. An empty list is fine: the connection works, there just aren’t any tables yet.

print(con.list_tables())
[]
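
The same check can be wrapped in a small helper that treats an empty list as success and reports a failure as a message instead of a traceback. This is a sketch under assumptions: check_connection and StubConnection are hypothetical names, and the helper relies only on the list_tables method shown above. The stub stands in for a real connection so the example runs on its own.

```python
from typing import Any


def check_connection(con: Any) -> str:
    """Summarize whether `con` responds to list_tables()."""
    try:
        tables = con.list_tables()
    except Exception as exc:  # any failure means the backend is not usable
        return f"connection failed: {exc}"
    if not tables:
        return "connected, no tables yet"
    return f"connected, {len(tables)} table(s): {', '.join(tables)}"


class StubConnection:
    """Hypothetical stand-in for a backend connection (not part of Xorq)."""

    def __init__(self, tables):
        self._tables = list(tables)

    def list_tables(self):
        return list(self._tables)


print(check_connection(StubConnection([])))
print(check_connection(StubConnection(["penguins"])))
```

With a real connection, pass con, duck_con, or pg_con in place of the stub.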

For a deeper check, run a small expression end to end:

penguins = xo.examples.penguins.fetch(backend=con)
print(penguins.count().execute())
344

3. Reuse the connection across your pipeline

A connection is the entry point for everything that follows: reading tables, registering data, and receiving data from other backends. Create it once and pass it around.

expr = penguins.filter(xo._.body_mass_g > 4000)
print(expr.ls.backends)
(<xorq.backends.xorq_datafusion.Backend object at 0x7fafeefb2db0>,)

The .ls accessor tells you which backends an expression touches, which becomes useful once a pipeline spans more than one engine.
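
One way to follow the "create it once and pass it around" advice from this step is a cached factory plus functions that take the connection as an argument. The sketch below is illustrative only: FakeBackend, get_connection, and count_tables are hypothetical names, and the stub replaces a real call such as xo.connect() so the example runs without a backend.

```python
from functools import lru_cache


class FakeBackend:
    """Hypothetical stand-in for a backend connection (not part of Xorq)."""

    def list_tables(self):
        return []


@lru_cache(maxsize=None)
def get_connection():
    # In a real pipeline this would return xo.connect() or another backend.
    # lru_cache ensures the body runs once; later calls reuse the same object.
    return FakeBackend()


def count_tables(con) -> int:
    # Pipeline steps receive the connection instead of creating their own.
    return len(con.list_tables())


con = get_connection()
print(get_connection() is con)
print(count_tables(con))
```

Passing the connection explicitly keeps each step easy to test with a stub and makes it clear which backend a step depends on.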

See also