Multi-Engine

The core concepts behind xorq’s multi-engine system

xorq’s multi-engine system enables seamless data movement between different query engines, allowing you to leverage the strengths of each engine while maintaining a unified workflow.

The `into_backend` Operator

The core of xorq’s multi-engine capability is the `into_backend` operator, which enables:

- Transparent data movement between engines
- Zero-copy data transfer using Apache Arrow
- Automatic optimization of data placement

import xorq as xo
from xorq.expr.relations import into_backend

# Connect to different engines
pg = xo.postgres.connect_env()
db = xo.duckdb.connect()

# Get tables from different sources
batting = pg.table("batting")

# Load awards_players into DuckDB
awards_players = xo.examples.awards_players.fetch(backend=db)

# Filter data in respective engines
left = batting.filter(batting.yearID == 2015)
right = awards_players.filter(awards_players.lgID == "NL").drop("yearID", "lgID")

# Move right table into postgres for efficient join
expr = left.join(
    into_backend(right, pg),
    ["playerID"],
    how="semi"
)[["yearID", "stint"]]

# Execute the multi-engine query
result = expr.execute()

Supported Engines

xorq currently supports:

In-Process Engines

- DuckDB
- DataFusion
- Pandas

Distributed Engines

- Trino
- Snowflake
- BigQuery

Engine Selection Guidelines

Choose engines based on their strengths:

- DuckDB: Local processing, AsOf joins, efficient file formats
- DataFusion: Custom UDFs, streaming processing
- Trino: Distributed queries, federation, security
- Snowflake/BigQuery: Managed infrastructure, scalability
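
The guidelines above can be read as a lookup from workload needs to engines. The sketch below encodes them as a plain dictionary. Both `ENGINE_STRENGTHS` and `suggest_engines` are hypothetical names made up for illustration; they are not part of xorq, and the strengths are copied from the list above rather than from any engine's documentation.

```python
# Hypothetical helper, not part of xorq: maps each engine to the
# strengths listed in the guidelines above.
ENGINE_STRENGTHS = {
    "DuckDB": {"local processing", "asof joins", "efficient file formats"},
    "DataFusion": {"custom udfs", "streaming processing"},
    "Trino": {"distributed queries", "federation", "security"},
    "Snowflake": {"managed infrastructure", "scalability"},
    "BigQuery": {"managed infrastructure", "scalability"},
}


def suggest_engines(*needs: str) -> list[str]:
    """Return the engines whose listed strengths cover every requested need."""
    wanted = {need.lower() for need in needs}
    return [
        name
        for name, strengths in ENGINE_STRENGTHS.items()
        if wanted <= strengths  # subset test: all needs must be covered
    ]


print(suggest_engines("AsOf joins"))
print(suggest_engines("scalability"))
```

An empty result, for example when asking for both "federation" and "custom udfs", suggests that no single engine covers the workload. That is the case where splitting the work across engines with `into_backend` may be worth considering.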

Data Transfer

Data movement between engines is handled through:

- Arrow Flight: Zero-copy data transfer protocol
- Memory Management: Automatic spilling to disk
- Batching: Efficient chunk-based processing
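
To make the batching idea concrete, here is a minimal standard-library sketch of chunk-based processing. It is not xorq's implementation and does not use Arrow Flight. `iter_batches` and `transfer` are hypothetical names, and a Python list stands in for the destination engine. The point is only that the source is consumed in fixed-size chunks, so the full dataset is never held as one intermediate object.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_batches(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size rows, reading the input lazily."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


def transfer(rows: Iterable, sink: list, batch_size: int = 2) -> int:
    """Hypothetical stand-in for a transfer: append each batch to sink.

    Returns the number of batches that were moved.
    """
    batches_sent = 0
    for batch in iter_batches(rows, batch_size):
        sink.extend(batch)
        batches_sent += 1
    return batches_sent


# Usage: a generator as the source, so rows are produced on demand.
source = ({"playerID": i} for i in range(5))
destination = []
print(transfer(source, destination, batch_size=2))
```

In a real transfer the batch size trades per-batch overhead against peak memory use. The value used here is arbitrary.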
