Table

Table(arg)

An immutable and lazy dataframe.

Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns, each with a single known type. Unless explicitly ordered with an .order_by(), the order of rows is undefined.

Table immutability means that the data underlying an Ibis Table cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.

You will not create Table objects directly. Instead, you will create one

from a pandas DataFrame, pyarrow table, Polars table, or raw python dicts/lists with xorq.memtable(df)
from an existing table in a data platform with connection.table("name")
from a file or URL, into a specific backend with connection.read_csv/parquet/json("path/to/file") (only some backends, typically local ones, support this)
from a file or URL, into the default backend with ibis.read_csv/read_json/read_parquet("path/to/file")

Attributes

Name	Description
columns	The list of column names in this table.

Methods

Name	Description
aggregate	Aggregate a table with a given set of reductions grouping by `by`.
alias	Create a table expression with a specific name `alias`.
as_scalar	Inform ibis that the table expression should be treated as a scalar.
as_table	Promote the expression to a table.
asof_join	Perform an “as-of” join between `left` and `right`.
bind	Bind column values to a table expression.
cache	Cache the results of a computation to improve performance on subsequent executions.
cast	Cast the columns of a table.
compile	Compile to an execution target.
count	Compute the number of rows in the table.
cross_join	Compute the cross join of a sequence of tables.
describe	Return summary information about a table.
difference	Compute the set difference of multiple table expressions.
distinct	Return a Table with duplicate rows removed.
drop	Remove fields from a table.
drop_null	Remove rows with null values from the table.
dropna	Deprecated - use `drop_null` instead.
equals	Return whether this expression is structurally equivalent to `other`.
execute	Execute an expression against its backend if one exists.
fill_null	Fill null values in a table expression.
fillna	Deprecated - use `fill_null` instead.
filter	Select rows from `table` based on `predicates`.
get_name	Return the fully qualified name of the table.
group_by	Create a grouped table expression.
has_name	Check whether this expression has an explicit name.
head	Select the first `n` rows of a table.
info	Return summary information about a table.
intersect	Compute the set intersection of multiple table expressions.
into_backend	Converts the Expr to a table in the given backend `con` with an optional table name `name`.
join	Perform a join between two tables.
limit	Select `n` rows from `self` starting at `offset`.
mutate	Add columns to a table expression.
nunique	Compute the number of unique rows in the table.
order_by	Sort a table by one or more expressions.
pipe	Compose `f` with `self`.
pivot_longer	Transform a table from wider to longer.
pivot_wider	Pivot a table to a wider format.
preview	Return a subset as a Rich Table.
relabel	Deprecated in favor of `Table.rename`.
relocate	Relocate `columns` before or after other specified columns.
rename	Rename columns in the table.
rowid	A unique integer per row.
sample	Sample a fraction of rows from a table.
schema	Return the Schema for this table.
select	Compute a new table expression using `exprs` and `named_exprs`.
sql	Run a SQL query against a table expression.
to_array	View a single column table as an array.
to_csv	Write the results of executing the given expression to a CSV file.
to_json	Write the results of `expr` to a NDJSON file.
to_pandas	Convert a table expression to a pandas DataFrame.
to_parquet	Write the results of executing the given expression to a parquet file.
to_pyarrow	Execute expression and return results in as a pyarrow table.
to_pyarrow_batches	Execute expression and return a RecordBatchReader.
try_cast	Cast the columns of a table.
unbind	Return an expression built on `UnboundTable` instead of backend-specific objects.
union	Compute the set union of multiple table expressions.
unnest	Unnest an array `column` from a table.
unpack	Project the struct fields of each of `columns` into `self`.
value_counts	Compute a frequency table of this table’s values.
view	Create a new table expression distinct from the current one.
visualize	Visualize an expression as a GraphViz graph in the browser.

aggregate

aggregate(metrics=(), by=(), having=(), **kwargs)

Aggregate a table with a given set of reductions grouping by by.

Parameters

Name	Type	Description	Default
metrics	Sequence[ir.Scalar] \| None	Aggregate expressions. These can be any scalar-producing expression, including aggregation functions like `sum` or literal values like `ibis.literal(1)`.	`()`
by	Sequence[ir.Value] \| None	Grouping expressions.	`()`
having	Sequence[ir.BooleanValue] \| None	Post-aggregation filters. The shape requirements are the same `metrics`, but the output type for `having` is `boolean`. ::: {.callout-warning} ## Expressions like `x is None` return `bool` and will not generate a SQL comparison to `NULL` :::	`()`
kwargs	ir.Value	Named aggregate expressions	`{}`

Returns

Name	Type	Description
	Table	An aggregate table expression

Examples

>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t

┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit  ┃ price   ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple  │    0.50 │
│ apple  │    0.50 │
│ banana │    0.25 │
│ orange │    0.33 │
└────────┴─────────┘

>>> t.aggregate(
...     by=["fruit"],
...     total_cost=_.price.sum(),
...     avg_cost=_.price.mean(),
...     having=_.price.sum() < 0.5,
... )

┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit  ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64    │ float64  │
├────────┼────────────┼──────────┤
│ banana │       0.25 │     0.25 │
│ orange │       0.33 │     0.33 │
└────────┴────────────┴──────────┘

alias

alias(alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters

Name	Type	Description	Default
alias	str	Name of the child expression	required

Returns

Name	Type	Description
	Table	An table expression

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr # quartodoc: +SKIP

as_scalar

as_scalar()

Inform ibis that the table expression should be treated as a scalar.

Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.

Returns

Name	Type	Description
	Scalar	A scalar subquery

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.species.value_counts().order_by("species")

┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string  │ int64         │
├─────────┼───────────────┤
│ Adelie  │            44 │
│ Gentoo  │           124 │
└─────────┴───────────────┘

as_table

as_table()

Promote the expression to a table.

This method is a no-op for table expressions.

Returns

Name	Type	Description
	Table	A table expression

Examples

>>> import xorq.api as xo
>>> t = xo.table(dict(a="int"), name="t")
>>> s = t.as_table()
>>> t is s

True

asof_join

asof_join(
    left,
    right,
    on,
    predicates=(),
    tolerance=None,
    *,
    lname='',
    rname='{name}_right',
)

Perform an “as-of” join between left and right.

Similar to a left join except that the match is done on nearest key rather than equal keys.

Parameters

Name	Type	Description	Default
left	Table	Table expression	required
right	Table	Table expression	required
on	str \| ir.BooleanColumn	Closest match inequality condition	required
predicates	str \| ir.Column \| Sequence[str \| ir.Column]	Additional join predicates	`()`
tolerance	str \| ir.IntervalScalar \| None	Amount of time to look behind when joining	`None`
lname	str	A format string to use to rename overlapping columns in the left table (e.g. `"left_{name}"`).	`''`
rname	str	A format string to use to rename overlapping columns in the right table (e.g. `"right_{name}"`).	`'{name}_right'`

Returns

Name	Type	Description
	Table	Table expression

bind

bind(*args, **kwargs)

Bind column values to a table expression.

This method handles the binding of every kind of column-like value that Ibis handles, including strings, integers, deferred expressions and selectors, to a table expression.

Parameters

Name	Type	Description	Default
args	Any	Column-like values to bind.	`()`
kwargs	Any	Column-like values to bind, with names.	`{}`

Returns

Name	Type	Description
	tuple[Value, …]	A tuple of bound values

cache

cache(storage=None)

Cache the results of a computation to improve performance on subsequent executions. This method allows you to cache the results of a computation either in memory, on disk using Parquet files, or in a database table. The caching strategy and storage location are determined by the storage parameter.

Parameters

Name	Type	Description	Default
storage	CacheStorage	The storage strategy to use for caching. Can be one of: - ParquetStorage: Caches results as Parquet files on disk - SourceStorage: Caches results in the source database - ParquetSnapshotStorage: Creates a snapshot of data in Parquet format - SourceSnapshotStorage: Creates a snapshot in the source database If None, uses the default storage configuration.	`None`

Returns

Name	Type	Description
	Expr	A new expression that represents the cached computation.

Notes

The cache method supports two main strategies: 1. ModificationTimeStrategy: Tracks changes based on modification time 2. SnapshotStrategy: Creates point-in-time snapshots of the data

Each strategy can be combined with either Parquet or database storage.

Examples

Using ParquetStorage:

>>> import xorq.api as xo
>>> from xorq.caching import ParquetStorage
>>> from pathlib import Path
>>> pg = xo.postgres.connect_examples()
>>> con = xo.connect()
>>> storage = ParquetStorage(source=con, relative_path=Path.cwd())
>>> alltypes = pg.table("functional_alltypes")
>>> cached = (alltypes
...     .select(alltypes.smallint_col, alltypes.int_col, alltypes.float_col)
...     .cache(storage=storage))

Using SourceStorage with PostgreSQL:

>>> from xorq.caching import SourceStorage
>>> from xorq.api import _
>>> ddb = xo.duckdb.connect()
>>> path = xo.config.options.pins.get_path("batting")
>>> right = (ddb.read_parquet(path, table_name="batting")
...          .filter(_.yearID == 2014)
...          .pipe(con.register, table_name="ddb-batting"))
>>> left = (pg.table("batting")
...         .filter(_.yearID == 2015)
...         .pipe(con.register, table_name="pg-batting"))
>>> # Cache the joined result
>>> expr = left.join(right, "playerID").cache(SourceStorage(source=pg))

Using cache with filtering:

>>> cached = alltypes.cache(storage=storage)
>>> expr = cached.filter([
...     cached.float_col > 0,
...     cached.smallint_col > 4,
...     cached.int_col < cached.float_col * 2
... ])

Notes

The cache is identified by a unique key based on the computation and strategy
Cache invalidation is handled automatically based on the chosen strategy
Cross-source caching (e.g., from PostgreSQL to DuckDB) is supported
Cache locations can be configured globally through xorq.config.options

cast

cast(schema)

Cast the columns of a table.

Similar to pandas.DataFrame.astype.

If you need to cast columns to a single type, use selectors.

Parameters

Name	Type	Description	Default
schema	SchemaLike	Mapping, schema or iterable of pairs to use for casting	required

Returns

Name	Type	Description
	Table	Cast table

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.schema()

ibis.Schema {
  species            string
  island             string
  bill_length_mm     float64
  bill_depth_mm      float64
  flipper_length_mm  float64
  body_mass_g        float64
  sex                string
  year               int64
}

>>> cols = ["body_mass_g", "bill_length_mm"]
>>> t[cols].head()

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64     │ float64        │
├─────────────┼────────────────┤
│      3750.0 │           39.1 │
│      3800.0 │           39.5 │
│      3250.0 │           40.3 │
│        NULL │           NULL │
│      3450.0 │           36.7 │
└─────────────┴────────────────┘

Columns not present in the input schema will be passed through unchanged

>>> t.columns

['species',
 'island',
 'bill_length_mm',
 'bill_depth_mm',
 'flipper_length_mm',
 'body_mass_g',
 'sex',
 'year']

>>> expr = t.cast({"body_mass_g": "float64", "bill_length_mm": "int"})
>>> expr.select(*cols).head()

┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64     │ int64          │
├─────────────┼────────────────┤
│      3750.0 │             39 │
│      3800.0 │             39 │
│      3250.0 │             40 │
│        NULL │           NULL │
│      3450.0 │             36 │
└─────────────┴────────────────┘

Columns that are in the input schema but not in the table raise an error

>>> t.cast({"foo": "string"})

---------------------------------------------------------------------------
XorqError                                 Traceback (most recent call last)
Cell In[35], line 1
----> 1 t.cast({"foo": "string"})  

File ~/work/xorq/xorq/python/xorq/vendor/ibis/expr/types/relations.py:442, in Table.cast(self, schema)
    368 def cast(self, schema: SchemaLike) -> Table:
    369     """Cast the columns of a table.
    370 
    371     Similar to `pandas.DataFrame.astype`.
   (...)    440     xorq.vendor.ibis.common.exceptions.XorqError: Cast schema has fields that are not in the table: ['foo']
    441     """
--> 442     return self._cast(schema, cast_method="cast")

File ~/work/xorq/xorq/python/xorq/vendor/ibis/expr/types/relations.py:472, in Table._cast(self, schema, cast_method)
    470 columns = self.columns
    471 if missing_fields := frozenset(schema.names).difference(columns):
--> 472     raise com.XorqError(
    473         f"Cast schema has fields that are not in the table: {sorted(missing_fields)}"
    474     )
    476 for col in columns:
    477     if (new_type := schema.get(col)) is not None:

XorqError: Cast schema has fields that are not in the table: ['foo']

compile

compile(limit=None, params=None, pretty=False)

Compile to an execution target.

Parameters

Name	Type	Description	Default
limit	int \| None	An integer to effect a specific row limit. A value of `None` means “no limit”. The default is in `ibis/config.py`.	`None`
params	Mapping[ir.Value, Any] \| None	Mapping of scalar parameter expressions to value	`None`
pretty	bool	In case of SQL backends, return a pretty formatted SQL query.	`False`

count

count(where=None)

Compute the number of rows in the table.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Optional boolean expression to filter rows when counting.	`None`

Returns

Name	Type	Description
	IntegerScalar	Number of rows in the table

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": ["foo", "bar", "baz"]})
>>> t

┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ baz    │
└────────┘

>>> t.count()

┌───┐
│ 3 │
└───┘

>>> t.count(t.a != "foo")

┌───┐
│ 2 │
└───┘

>>> type(t.count())

xorq.vendor.ibis.expr.types.numeric.IntegerScalar

cross_join

cross_join(left, right, *rest, lname='', rname='{name}_right')

Compute the cross join of a sequence of tables.

Parameters

Name	Type	Description	Default
left	Table	Left table	required
right	Table	Right table	required
rest	Table	Additional tables to cross join	`()`
lname	str	A format string to use to rename overlapping columns in the left table (e.g. `"left_{name}"`).	`''`
rname	str	A format string to use to rename overlapping columns in the right table (e.g. `"right_{name}"`).	`'{name}_right'`

Returns

Name	Type	Description
	Table	Cross join of `left`, `right` and `rest`

Examples

>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.count()

┌─────┐
│ 344 │
└─────┘

>>> agg = t.drop("year").agg(s.across(s.numeric(), _.mean()))
>>> expr = t.cross_join(agg)
>>> expr

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_mm_right ┃ bill_depth_mm_right ┃ flipper_length_mm_right ┃ body_mass_g_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │ float64              │ float64             │ float64                 │ float64           │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼──────────────────────┼─────────────────────┼─────────────────────────┼───────────────────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │             43.92193 │            17.15117 │              200.915205 │       4201.754386 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │                    … │                   … │                       … │                 … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴──────────────────────┴─────────────────────┴─────────────────────────┴───────────────────┘

>>> expr.columns

['species',
 'island',
 'bill_length_mm',
 'bill_depth_mm',
 'flipper_length_mm',
 'body_mass_g',
 'sex',
 'year',
 'bill_length_mm_right',
 'bill_depth_mm_right',
 'flipper_length_mm_right',
 'body_mass_g_right']

>>> expr.count()

┌─────┐
│ 344 │
└─────┘

describe

describe(quantile=(0.25, 0.5, 0.75))

Return summary information about a table.

Parameters

Name	Type	Description	Default
quantile	Sequence[ir.NumericValue \| float]	The quantiles to compute for numerical columns. Defaults to (0.25, 0.5, 0.75).	`(0.25, 0.5, 0.75)`

Returns

Name	Type	Description
	Table	A table containing summary information about the columns of self.

Notes

This function computes summary statistics for each column in the table. For numerical columns, it computes statistics such as minimum, maximum, mean, standard deviation, and quantiles. For string columns, it computes the mode and the number of unique values.

Examples

>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> xo.options.interactive = True
>>> p = xo.examples.penguins.fetch(deferred=False)
>>> p.describe()

Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'Mode' operation is not defined")
Expression repr follows:
r0 := DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm float64
  body_mass_g       float64
  sex               string
  year              int64

r1 := Aggregate[r0]
  metrics:
    name:   'species'
    pos:    0
    type:   'string'
    count:  Count(IsNull(r0.species))
    nulls:  Sum(IsNull(r0.species))
    unique: CountDistinct(r0.species)
    mode:   Mode(r0.species)
    mean:   Cast(None, to=float64)
    std:    Cast(None, to=float64)
    min:    Cast(None, to=float64)
    p25:    Cast(None, to=float64)
    p50:    Cast(None, to=float64)
    p75:    Cast(None, to=float64)
    max:    Cast(None, to=float64)

r2 := Aggregate[r0]
  metrics:
    name:   'island'
    pos:    1
    type:   'string'
    count:  Count(IsNull(r0.island))
    nulls:  Sum(IsNull(r0.island))
    unique: CountDistinct(r0.island)
    mode:   Mode(r0.island)
    mean:   Cast(None, to=float64)
    std:    Cast(None, to=float64)
    min:    Cast(None, to=float64)
    p25:    Cast(None, to=float64)
    p50:    Cast(None, to=float64)
    p75:    Cast(None, to=float64)
    max:    Cast(None, to=float64)

r3 := Aggregate[r0]
  metrics:
    name:   'bill_length_mm'
    pos:    2
    type:   'float64'
    count:  Count(IsNull(r0.bill_length_mm))
    nulls:  Sum(IsNull(r0.bill_length_mm))
    unique: CountDistinct(r0.bill_length_mm)
    mode:   Cast(None, to=string)
    mean:   Mean(r0.bill_length_mm)
    std:    StandardDev(r0.bill_length_mm, how='sample')
    min:    Min(r0.bill_length_mm)
    p25:    Quantile(r0.bill_length_mm, quantile=0.25)
    p50:    Quantile(r0.bill_length_mm, quantile=0.5)
    p75:    Quantile(r0.bill_length_mm, quantile=0.75)
    max:    Max(r0.bill_length_mm)

r4 := Aggregate[r0]
  metrics:
    name:   'bill_depth_mm'
    pos:    3
    type:   'float64'
    count:  Count(IsNull(r0.bill_depth_mm))
    nulls:  Sum(IsNull(r0.bill_depth_mm))
    unique: CountDistinct(r0.bill_depth_mm)
    mode:   Cast(None, to=string)
    mean:   Mean(r0.bill_depth_mm)
    std:    StandardDev(r0.bill_depth_mm, how='sample')
    min:    Min(r0.bill_depth_mm)
    p25:    Quantile(r0.bill_depth_mm, quantile=0.25)
    p50:    Quantile(r0.bill_depth_mm, quantile=0.5)
    p75:    Quantile(r0.bill_depth_mm, quantile=0.75)
    max:    Max(r0.bill_depth_mm)

r5 := Aggregate[r0]
  metrics:
    name:   'flipper_length_mm'
    pos:    4
    type:   'float64'
    count:  Count(IsNull(r0.flipper_length_mm))
    nulls:  Sum(IsNull(r0.flipper_length_mm))
    unique: CountDistinct(r0.flipper_length_mm)
    mode:   Cast(None, to=string)
    mean:   Mean(r0.flipper_length_mm)
    std:    StandardDev(r0.flipper_length_mm, how='sample')
    min:    Min(r0.flipper_length_mm)
    p25:    Quantile(r0.flipper_length_mm, quantile=0.25)
    p50:    Quantile(r0.flipper_length_mm, quantile=0.5)
    p75:    Quantile(r0.flipper_length_mm, quantile=0.75)
    max:    Max(r0.flipper_length_mm)

r6 := Aggregate[r0]
  metrics:
    name:   'body_mass_g'
    pos:    5
    type:   'float64'
    count:  Count(IsNull(r0.body_mass_g))
    nulls:  Sum(IsNull(r0.body_mass_g))
    unique: CountDistinct(r0.body_mass_g)
    mode:   Cast(None, to=string)
    mean:   Mean(r0.body_mass_g)
    std:    StandardDev(r0.body_mass_g, how='sample')
    min:    Min(r0.body_mass_g)
    p25:    Quantile(r0.body_mass_g, quantile=0.25)
    p50:    Quantile(r0.body_mass_g, quantile=0.5)
    p75:    Quantile(r0.body_mass_g, quantile=0.75)
    max:    Max(r0.body_mass_g)

r7 := Aggregate[r0]
  metrics:
    name:   'sex'
    pos:    6
    type:   'string'
    count:  Count(IsNull(r0.sex))
    nulls:  Sum(IsNull(r0.sex))
    unique: CountDistinct(r0.sex)
    mode:   Mode(r0.sex)
    mean:   Cast(None, to=float64)
    std:    Cast(None, to=float64)
    min:    Cast(None, to=float64)
    p25:    Cast(None, to=float64)
    p50:    Cast(None, to=float64)
    p75:    Cast(None, to=float64)
    max:    Cast(None, to=float64)

r8 := Aggregate[r0]
  metrics:
    name:   'year'
    pos:    7
    type:   'int64'
    count:  Count(IsNull(r0.year))
    nulls:  Sum(IsNull(r0.year))
    unique: CountDistinct(r0.year)
    mode:   Cast(None, to=string)
    mean:   Mean(r0.year)
    std:    StandardDev(r0.year, how='sample')
    min:    Cast(Min(r0.year), to=float64)
    p25:    Quantile(r0.year, quantile=0.25)
    p50:    Quantile(r0.year, quantile=0.5)
    p75:    Quantile(r0.year, quantile=0.75)
    max:    Cast(Max(r0.year), to=float64)

r9 := Union[r1, r2, distinct=False]

r10 := Union[r3, r4, distinct=False]

r11 := Union[r5, r6, distinct=False]

r12 := Union[r7, r8, distinct=False]

r13 := Union[r9, r10, distinct=False]

r14 := Union[r11, r12, distinct=False]

Union[r13, r14, distinct=False]

>>> p.select(s.of_type("numeric")).describe()

Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'Quantile' operation is not defined")
Expression repr follows:
r0 := DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm float64
  body_mass_g       float64
  sex               string
  year              int64

r1 := Project[r0]
  bill_length_mm:    r0.bill_length_mm
  bill_depth_mm:     r0.bill_depth_mm
  flipper_length_mm: r0.flipper_length_mm
  body_mass_g:       r0.body_mass_g
  year:              r0.year

r2 := Aggregate[r1]
  metrics:
    name:   'flipper_length_mm'
    pos:    2
    type:   'float64'
    count:  Count(IsNull(r1.flipper_length_mm))
    nulls:  Sum(IsNull(r1.flipper_length_mm))
    unique: CountDistinct(r1.flipper_length_mm)
    mode:   Cast(None, to=string)
    mean:   Mean(r1.flipper_length_mm)
    std:    StandardDev(r1.flipper_length_mm, how='sample')
    min:    Min(r1.flipper_length_mm)
    p25:    Quantile(r1.flipper_length_mm, quantile=0.25)
    p50:    Quantile(r1.flipper_length_mm, quantile=0.5)
    p75:    Quantile(r1.flipper_length_mm, quantile=0.75)
    max:    Max(r1.flipper_length_mm)

r3 := Aggregate[r1]
  metrics:
    name:   'body_mass_g'
    pos:    3
    type:   'float64'
    count:  Count(IsNull(r1.body_mass_g))
    nulls:  Sum(IsNull(r1.body_mass_g))
    unique: CountDistinct(r1.body_mass_g)
    mode:   Cast(None, to=string)
    mean:   Mean(r1.body_mass_g)
    std:    StandardDev(r1.body_mass_g, how='sample')
    min:    Min(r1.body_mass_g)
    p25:    Quantile(r1.body_mass_g, quantile=0.25)
    p50:    Quantile(r1.body_mass_g, quantile=0.5)
    p75:    Quantile(r1.body_mass_g, quantile=0.75)
    max:    Max(r1.body_mass_g)

r4 := Aggregate[r1]
  metrics:
    name:   'year'
    pos:    4
    type:   'int64'
    count:  Count(IsNull(r1.year))
    nulls:  Sum(IsNull(r1.year))
    unique: CountDistinct(r1.year)
    mode:   Cast(None, to=string)
    mean:   Mean(r1.year)
    std:    StandardDev(r1.year, how='sample')
    min:    Cast(Min(r1.year), to=float64)
    p25:    Quantile(r1.year, quantile=0.25)
    p50:    Quantile(r1.year, quantile=0.5)
    p75:    Quantile(r1.year, quantile=0.75)
    max:    Cast(Max(r1.year), to=float64)

r5 := Aggregate[r1]
  metrics:
    name:   'bill_length_mm'
    pos:    0
    type:   'float64'
    count:  Count(IsNull(r1.bill_length_mm))
    nulls:  Sum(IsNull(r1.bill_length_mm))
    unique: CountDistinct(r1.bill_length_mm)
    mode:   Cast(None, to=string)
    mean:   Mean(r1.bill_length_mm)
    std:    StandardDev(r1.bill_length_mm, how='sample')
    min:    Min(r1.bill_length_mm)
    p25:    Quantile(r1.bill_length_mm, quantile=0.25)
    p50:    Quantile(r1.bill_length_mm, quantile=0.5)
    p75:    Quantile(r1.bill_length_mm, quantile=0.75)
    max:    Max(r1.bill_length_mm)

r6 := Aggregate[r1]
  metrics:
    name:   'bill_depth_mm'
    pos:    1
    type:   'float64'
    count:  Count(IsNull(r1.bill_depth_mm))
    nulls:  Sum(IsNull(r1.bill_depth_mm))
    unique: CountDistinct(r1.bill_depth_mm)
    mode:   Cast(None, to=string)
    mean:   Mean(r1.bill_depth_mm)
    std:    StandardDev(r1.bill_depth_mm, how='sample')
    min:    Min(r1.bill_depth_mm)
    p25:    Quantile(r1.bill_depth_mm, quantile=0.25)
    p50:    Quantile(r1.bill_depth_mm, quantile=0.5)
    p75:    Quantile(r1.bill_depth_mm, quantile=0.75)
    max:    Max(r1.bill_depth_mm)

r7 := Union[r2, r3, distinct=False]

r8 := Union[r5, r6, distinct=False]

r9 := Union[r4, r8, distinct=False]

r10 := Union[r7, r9, distinct=False]

DropColumns[r10]
  columns_to_drop:
    frozenset({'mode'})
  schema:
    name   string
    pos    int16
    type   string
    count  int64
    nulls  int64
    unique int64
    mean   float64
    std    float64
    min    float64
    p25    float64
    p50    float64
    p75    float64
    max    float64

>>> p.select(s.of_type("string")).describe()

Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'Mode' operation is not defined")
Expression repr follows:
r0 := DatabaseTable: penguins
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm float64
  body_mass_g       float64
  sex               string
  year              int64

r1 := Project[r0]
  species: r0.species
  island:  r0.island
  sex:     r0.sex

r2 := Aggregate[r1]
  metrics:
    name:   'sex'
    pos:    2
    type:   'string'
    count:  Count(IsNull(r1.sex))
    nulls:  Sum(IsNull(r1.sex))
    unique: CountDistinct(r1.sex)
    mode:   Mode(r1.sex)
    mean:   Cast(None, to=float64)
    std:    Cast(None, to=float64)
    min:    Cast(None, to=float64)
    p25:    Cast(None, to=float64)
    p50:    Cast(None, to=float64)
    p75:    Cast(None, to=float64)
    max:    Cast(None, to=float64)

r3 := Aggregate[r1]
  metrics:
    name:   'species'
    pos:    0
    type:   'string'
    count:  Count(IsNull(r1.species))
    nulls:  Sum(IsNull(r1.species))
    unique: CountDistinct(r1.species)
    mode:   Mode(r1.species)
    mean:   Cast(None, to=float64)
    std:    Cast(None, to=float64)
    min:    Cast(None, to=float64)
    p25:    Cast(None, to=float64)
    p50:    Cast(None, to=float64)
    p75:    Cast(None, to=float64)
    max:    Cast(None, to=float64)

r4 := Aggregate[r1]
  metrics:
    name:   'island'
    pos:    1
    type:   'string'
    count:  Count(IsNull(r1.island))
    nulls:  Sum(IsNull(r1.island))
    unique: CountDistinct(r1.island)
    mode:   Mode(r1.island)
    mean:   Cast(None, to=float64)
    std:    Cast(None, to=float64)
    min:    Cast(None, to=float64)
    p25:    Cast(None, to=float64)
    p50:    Cast(None, to=float64)
    p75:    Cast(None, to=float64)
    max:    Cast(None, to=float64)

r5 := Union[r3, r4, distinct=False]

r6 := Union[r2, r5, distinct=False]

Project[r6]
  name:   r6.name
  pos:    r6.pos
  type:   r6.type
  count:  r6.count
  nulls:  r6.nulls
  unique: r6.unique
  mode:   r6.mode

difference

difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name	Type	Description	Default
table	Table	A table expression	required
*rest	Table	Additional table expressions	`()`
distinct	bool	Only diff distinct rows not occurring in the calling table	`True`

Returns

Name	Type	Description
	Table	The rows present in `self` that are not present in `tables`.

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘

>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘

>>> t1.difference(t2)

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
└───────┘

distinct

distinct(on=None, keep='first')

Return a Table with duplicate rows removed.

Similar to pandas.DataFrame.drop_duplicates().

Some backends do not support keep='last'

Parameters

Name	Type	Description	Default
on	str \| Iterable[str] \| s.Selector \| None	Only consider certain columns for identifying duplicates. By default, deduplicate all of the columns.	`None`
keep	Literal['first', 'last'] \| None	Determines which duplicates to keep. - `"first"`: Drop duplicates except for the first occurrence. - `"last"`: Drop duplicates except for the last occurrence. - `None`: Drop all duplicates	`'first'`

Examples

>>> import xorq.api as xo
>>> import xorq.examples as ex
>>> import xorq.expr.selectors as s
>>> xo.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Compute the distinct rows of a subset of columns

>>> t[["species", "island"]].distinct().order_by(s.all())

┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species   ┃ island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ string    │
├───────────┼───────────┤
│ Adelie    │ Biscoe    │
│ Adelie    │ Dream     │
│ Adelie    │ Torgersen │
│ Chinstrap │ Dream     │
│ Gentoo    │ Biscoe    │
└───────────┴───────────┘

Drop all duplicate rows except the first

>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie    │ Biscoe    │           37.8 │          18.3 │             174.0 │      3400.0 │ female │  2007 │
│ Adelie    │ Dream     │           39.5 │          16.7 │             178.0 │      3250.0 │ female │  2007 │
│ Adelie    │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Chinstrap │ Dream     │           46.5 │          17.9 │             192.0 │      3500.0 │ female │  2007 │
│ Gentoo    │ Biscoe    │           46.1 │          13.2 │             211.0 │      4500.0 │ female │  2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicate rows except the last

>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie    │ Biscoe    │           42.7 │          18.3 │             196.0 │      4075.0 │ male   │  2009 │
│ Adelie    │ Dream     │           41.5 │          18.5 │             201.0 │      4000.0 │ male   │  2009 │
│ Adelie    │ Torgersen │           43.1 │          19.2 │             197.0 │      3500.0 │ male   │  2009 │
│ Chinstrap │ Dream     │           50.2 │          18.7 │             198.0 │      3775.0 │ female │  2009 │
│ Gentoo    │ Biscoe    │           49.9 │          16.1 │             213.0 │      5400.0 │ male   │  2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicated rows

>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()

┌─────┐
│ 273 │
└─────┘

>>> t.count()

┌─────┐
│ 344 │
└─────┘

You can pass selectors to on

>>> t.distinct(on=~s.numeric())

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Chinstrap │ Dream     │           46.5 │          17.9 │             192.0 │      3500.0 │ female │  2007 │
│ Chinstrap │ Dream     │           50.0 │          19.5 │             196.0 │      3900.0 │ male   │  2007 │
│ Adelie    │ Biscoe    │           37.8 │          18.3 │             174.0 │      3400.0 │ female │  2007 │
│ Adelie    │ Biscoe    │           37.7 │          18.7 │             180.0 │      3600.0 │ male   │  2007 │
│ Adelie    │ Dream     │           39.5 │          16.7 │             178.0 │      3250.0 │ female │  2007 │
│ Adelie    │ Dream     │           37.2 │          18.1 │             178.0 │      3900.0 │ male   │  2007 │
│ Adelie    │ Dream     │           37.5 │          18.9 │             179.0 │      2975.0 │ NULL   │  2007 │
│ Adelie    │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie    │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie    │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ …         │ …         │              … │             … │                 … │           … │ …      │     … │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

The only valid values of keep are "first", "last" and .

>>> t.distinct(on="species", keep="second")

---------------------------------------------------------------------------
XorqError                                 Traceback (most recent call last)
Cell In[101], line 1
----> 1 t.distinct(on="species", keep="second")  

File ~/work/xorq/xorq/python/xorq/vendor/ibis/expr/types/relations.py:1165, in Table.distinct(self, on, keep)
   1163     method = keep
   1164 else:
-> 1165     raise com.XorqError(
   1166         f"Invalid value for `keep`: {keep!r}, must be 'first', 'last' or None"
   1167     )
   1169 aggs = {col.get_name(): getattr(col, method)() for col in (~on).expand(self)}
   1170 res = self.aggregate(aggs, by=on, having=having)

XorqError: Invalid value for `keep`: 'second', must be 'first', 'last' or None

drop

drop(*fields)

Remove fields from a table.

Parameters

Name	Type	Description	Default
fields	str \| Selector	Fields to drop. Strings and selectors are accepted.	`()`

Returns

Name	Type	Description
	Table	A table with all columns matching `fields` removed.

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop one or more columns

>>> t.drop("species").head()

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

>>> t.drop("species", "bill_length_mm").head()

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island    ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Torgersen │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Torgersen │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Torgersen │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Torgersen │          19.3 │             193.0 │      3450.0 │ female │  2007 │
└───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop with selectors, mix and match

>>> import xorq.expr.selectors as s
>>> t.drop("species", s.startswith("bill_")).head()

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island    ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ float64           │ float64     │ string │ int64 │
├───────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │             181.0 │      3750.0 │ male   │  2007 │
│ Torgersen │             186.0 │      3800.0 │ female │  2007 │
│ Torgersen │             195.0 │      3250.0 │ female │  2007 │
│ Torgersen │              NULL │        NULL │ NULL   │  2007 │
│ Torgersen │             193.0 │      3450.0 │ female │  2007 │
└───────────┴───────────────────┴─────────────┴────────┴───────┘

drop_null

drop_null(subset=None, how='any')

Remove rows with null values from the table.

Parameters

Name	Type	Description	Default
subset	Sequence[str] \| str \| None	Columns names to consider when dropping nulls. By default all columns are considered.	`None`
how	Literal['any', 'all']	Determine whether a row is removed if there is at least one null value in the row (`'any'`), or if all row values are null (`'all'`).	`'any'`

Returns

Name	Type	Description
	Table	Table expression

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

>>> t.count()

┌─────┐
│ 344 │
└─────┘

>>> t.drop_null(["bill_length_mm", "body_mass_g"]).count()

┌─────┐
│ 342 │
└─────┘

>>> t.drop_null(how="all").count()  # no rows where all columns are null

┌─────┐
│ 344 │
└─────┘

dropna

dropna(subset=None, how='any')

Deprecated - use drop_null instead.

equals

equals(other)

Return whether this expression is structurally equivalent to other.

If you want to produce an equality expression, use == syntax.

Parameters

Name	Type	Description	Default
other		Another expression	required

Examples

>>> import xorq.api as xo
>>> t1 = xo.table(dict(a="int"), name="t")
>>> t2 = xo.table(dict(a="int"), name="t")
>>> t1.equals(t2)

True

>>> v = xo.table(dict(a="string"), name="v")
>>> t1.equals(v)

False

execute

execute(**kwargs)

Execute an expression against its backend if one exists.

Parameters

Name	Type	Description	Default
kwargs	Any	Keyword arguments	`{}`

Examples

>>> import xorq.api as xo
>>> t = xo.examples.penguins.fetch()
>>> t.execute()

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	male	2007
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	female	2007
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	female	2007
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	None	2007
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	female	2007
...	...	...	...	...	...	...	...	...
339	Chinstrap	Dream	55.8	19.8	207.0	4000.0	male	2009
340	Chinstrap	Dream	43.5	18.1	202.0	3400.0	female	2009
341	Chinstrap	Dream	49.6	18.2	193.0	3775.0	male	2009
342	Chinstrap	Dream	50.8	19.0	210.0	4100.0	male	2009
343	Chinstrap	Dream	50.2	18.7	198.0	3775.0	female	2009

344 rows × 8 columns

Scalar parameters can be supplied dynamically during execution.

>>> species = xo.param("string")
>>> expr = t.filter(t.species == species).order_by(t.bill_length_mm)
>>> expr.execute(limit=3, params={species: "Gentoo"})

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
0	Gentoo	Biscoe	40.9	13.7	214.0	4650.0	female	2007
1	Gentoo	Biscoe	41.7	14.7	210.0	4700.0	female	2009
2	Gentoo	Biscoe	42.0	13.5	210.0	4150.0	female	2007

fill_null

fill_null(replacements)

Fill null values in a table expression.

There is potential lack of type stability with the fill_null API

For example, different library versions may impact whether a given backend promotes integer replacement values to floats.

Parameters

Name	Type	Description	Default
replacements	ir.Scalar \| Mapping[str, ir.Scalar]	Value with which to fill nulls. If `replacements` is a mapping, the keys are column names that map to their replacement value. If passed as a scalar all columns are filled with that value.	required

Returns

Name	Type	Description
	Table	Table expression

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.sex

┏━━━━━━━━┓
┃ sex    ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male   │
│ female │
│ female │
│ NULL   │
│ female │
│ male   │
│ female │
│ male   │
│ NULL   │
│ NULL   │
│ …      │
└────────┘

>>> t.fill_null({"sex": "unrecorded"}).sex

┏━━━━━━━━━━━━┓
┃ sex        ┃
┡━━━━━━━━━━━━┩
│ string     │
├────────────┤
│ male       │
│ female     │
│ female     │
│ unrecorded │
│ female     │
│ male       │
│ female     │
│ male       │
│ unrecorded │
│ unrecorded │
│ …          │
└────────────┘

fillna

fillna(replacements)

Deprecated - use fill_null instead.

filter

filter(*predicates)

Select rows from table based on predicates.

Parameters

Name	Type	Description	Default
predicates	ir.BooleanValue \| Sequence[ir.BooleanValue] \| IfAnyAll	Boolean value expressions used to select rows in `table`.	`()`

Returns

Name	Type	Description
	Table	Filtered table expression

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().drop_null(
...     "sex"
... ).order_by("sex")

┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex    ┃ sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64     │
├────────┼───────────┤
│ female │        22 │
│ male   │        68 │
└────────┴───────────┘

get_name

get_name()

Return the fully qualified name of the table.

group_by

group_by(*by, **key_exprs)

Create a grouped table expression.

Similar to SQL’s GROUP BY statement, or pandas .groupby() method.

Parameters

Name	Type	Description	Default
by	str \| ir.Value \| Iterable[str] \| Iterable[ir.Value] \| None	Grouping expressions	`()`
key_exprs	str \| ir.Value \| Iterable[str] \| Iterable[ir.Value]	Named grouping expressions	`{}`

Returns

Name	Type	Description
	GroupedTable	A grouped table expression

Examples

>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t

┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit  ┃ price   ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple  │    0.50 │
│ apple  │    0.50 │
│ banana │    0.25 │
│ orange │    0.33 │
└────────┴─────────┘

>>> t.group_by("fruit").agg(total_cost=_.price.sum(), avg_cost=_.price.mean()).order_by(
...     "fruit"
... )

┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit  ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64    │ float64  │
├────────┼────────────┼──────────┤
│ apple  │       1.00 │     0.50 │
│ banana │       0.25 │     0.25 │
│ orange │       0.33 │     0.33 │
└────────┴────────────┴──────────┘

has_name

has_name()

Check whether this expression has an explicit name.

head

head(n=5)

Select the first n rows of a table.

The result set is not deterministic without a call to order_by.

Parameters

Name	Type	Description	Default
n	int	Number of rows to include	`5`

Returns

Name	Type	Description
	Table	`self` limited to `n` rows

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ c      │
│     1 │ a      │
│     2 │ a      │
└───────┴────────┘

>>> t.head(2)

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ c      │
│     1 │ a      │
└───────┴────────┘

info

info()

Return summary information about a table.

Returns

Name	Type	Description
	Table	Summary of `self`

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.info()

┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ name              ┃ type    ┃ nullable ┃ nulls ┃ non_nulls ┃ null_frac ┃ pos   ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string            │ string  │ boolean  │ int64 │ int64     │ float64   │ int16 │
├───────────────────┼─────────┼──────────┼───────┼───────────┼───────────┼───────┤
│ species           │ string  │ True     │     0 │       344 │  0.000000 │     0 │
│ island            │ string  │ True     │     0 │       344 │  0.000000 │     1 │
│ bill_length_mm    │ float64 │ True     │     2 │       342 │  0.005814 │     2 │
│ bill_depth_mm     │ float64 │ True     │     2 │       342 │  0.005814 │     3 │
│ flipper_length_mm │ float64 │ True     │     2 │       342 │  0.005814 │     4 │
│ body_mass_g       │ float64 │ True     │     2 │       342 │  0.005814 │     5 │
│ sex               │ string  │ True     │    11 │       333 │  0.031977 │     6 │
│ year              │ int64   │ True     │     0 │       344 │  0.000000 │     7 │
└───────────────────┴─────────┴──────────┴───────┴───────────┴───────────┴───────┘

intersect

intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name	Type	Description	Default
table	Table	A table expression	required
*rest	Table	Additional table expressions	`()`
distinct	bool	Only return distinct rows	`True`

Returns

Name	Type	Description
	Table	A new table containing the intersection of all input tables.

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘

>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘

>>> t1.intersect(t2)

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
└───────┘

into_backend

into_backend(con, name=None)

Converts the Expr to a table in the given backend con with an optional table name name.

The table is backed by a PyArrow RecordBatchReader, the RecordBatchReader is teed so it can safely be reaused without spilling to disk.

Parameters

Name	Type	Description	Default
con		The backend where the table should be created	required
name		The name of the table	`None`

Examples

>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> ls_con = xo.connect()
>>> pg_con = xo.postgres.connect_examples()
>>> t = pg_con.table("batting").into_backend(ls_con, "ls_batting")
>>> expr = (
...     t.join(t, "playerID")
...     .order_by("playerID", "yearID")
...     .limit(15)
...     .select(player_id="playerID", year_id="yearID_right")
... )
>>> expr

┏━━━━━━━━━━━┳━━━━━━━━━┓
┃ player_id ┃ year_id ┃
┡━━━━━━━━━━━╇━━━━━━━━━┩
│ string    │ int64   │
├───────────┼─────────┤
│ aardsda01 │    2015 │
│ aardsda01 │    2004 │
│ aardsda01 │    2006 │
│ aardsda01 │    2009 │
│ aardsda01 │    2008 │
│ aardsda01 │    2007 │
│ aardsda01 │    2012 │
│ aardsda01 │    2013 │
│ aardsda01 │    2010 │
│ aardsda01 │    2008 │
│ …         │       … │
└───────────┴─────────┘

join

join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')

Perform a join between two tables.

Parameters

Name	Type	Description	Default
left	Table	Left table to join	required
right	Table	Right table to join	required
predicates	str \| Sequence[str \| ir.BooleanColumn \| Literal[True] \| Literal[False] \| tuple[str \| ir.Column \| ir.Deferred, str \| ir.Column \| ir.Deferred]]	Condition(s) to join on. See examples for details.	`()`
how	JoinKind	Join method, e.g. `"inner"` or `"left"`.	`'inner'`
lname	str	A format string to use to rename overlapping columns in the left table (e.g. `"left_{name}"`).	`''`
rname	str	A format string to use to rename overlapping columns in the right table (e.g. `"right_{name}"`).	`'{name}_right'`

Examples

>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> movies = xo.examples.ml_latest_small_movies.fetch()
>>> movies.head()

┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ title                              ┃ genres                                      ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64   │ string                             │ string                                      │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│       1 │ Toy Story (1995)                   │ Adventure|Animation|Children|Comedy|Fantasy │
│       2 │ Jumanji (1995)                     │ Adventure|Children|Fantasy                  │
│       3 │ Grumpier Old Men (1995)            │ Comedy|Romance                              │
│       4 │ Waiting to Exhale (1995)           │ Comedy|Drama|Romance                        │
│       5 │ Father of the Bride Part II (1995) │ Comedy                                      │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘

>>> ratings = xo.examples.ml_latest_small_ratings.fetch().drop("timestamp")
>>> ratings.head()

┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64  │ int64   │ float64 │
├────────┼─────────┼─────────┤
│      1 │       1 │     4.0 │
│      1 │       3 │     4.0 │
│      1 │       6 │     4.0 │
│      1 │      47 │     5.0 │
│      1 │      50 │     5.0 │
└────────┴─────────┴─────────┘

Equality left join on the shared movieId column. Note the _right suffix added to all overlapping columns from the right table (in this case only the “movieId” column).

>>> ratings.join(movies, "movieId", how="left").head(5)

┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating  ┃ movieId_right ┃ title                       ┃ genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64  │ int64   │ float64 │ int64         │ string                      │ string                                      │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      1 │       1 │     4.0 │             1 │ Toy Story (1995)            │ Adventure|Animation|Children|Comedy|Fantasy │
│      1 │       3 │     4.0 │             3 │ Grumpier Old Men (1995)     │ Comedy|Romance                              │
│      1 │       6 │     4.0 │             6 │ Heat (1995)                 │ Action|Crime|Thriller                       │
│      1 │      47 │     5.0 │            47 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller                            │
│      1 │      50 │     5.0 │            50 │ Usual Suspects, The (1995)  │ Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘

Explicit equality join using the default how value of "inner". Note how there is no _right suffix added to the movieId column since this is an inner join and the movieId column is part of the join condition.

>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5)

┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating  ┃ title                       ┃ genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64  │ int64   │ float64 │ string                      │ string                                      │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      1 │       1 │     4.0 │ Toy Story (1995)            │ Adventure|Animation|Children|Comedy|Fantasy │
│      1 │       3 │     4.0 │ Grumpier Old Men (1995)     │ Comedy|Romance                              │
│      1 │       6 │     4.0 │ Heat (1995)                 │ Action|Crime|Thriller                       │
│      1 │      47 │     5.0 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller                            │
│      1 │      50 │     5.0 │ Usual Suspects, The (1995)  │ Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘

>>> tags = xo.examples.ml_latest_small_tags.fetch()
>>> tags.head()

┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag             ┃ timestamp  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64  │ int64   │ string          │ int64      │
├────────┼─────────┼─────────────────┼────────────┤
│      2 │   60756 │ funny           │ 1445714994 │
│      2 │   60756 │ Highly quotable │ 1445714996 │
│      2 │   60756 │ will ferrell    │ 1445714992 │
│      2 │   89774 │ Boxing story    │ 1445715207 │
│      2 │   89774 │ MMA             │ 1445715200 │
└────────┴─────────┴─────────────────┴────────────┘

You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:

>>> tags.join(ratings, ["userId", "movieId"]).head(5).order_by("userId")

┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag            ┃ timestamp  ┃ rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64  │ int64   │ string         │ int64      │ float64 │
├────────┼─────────┼────────────────┼────────────┼─────────┤
│     62 │       2 │ Robin Williams │ 1528843907 │     4.0 │
│     62 │     110 │ sword fight    │ 1528152535 │     4.5 │
│     62 │     410 │ gothic         │ 1525636609 │     4.5 │
│     62 │    2023 │ mafia          │ 1525636733 │     5.0 │
│     62 │    2124 │ quirky         │ 1525636846 │     5.0 │
└────────┴─────────┴────────────────┴────────────┴─────────┘

To self-join a table with itself, you need to call .view() on one of the arguments so the two tables are distinct from each other.

For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key}), where each key can be

a Column
Deferred expression
lambda of the form (Table) -> Column

For example, to find all movies pairings that received the same (ignoring case) tags:

>>> movie_tags = tags["movieId", "tag"]
>>> view = movie_tags.view()
>>> movie_tags.join(
...     view,
...     [
...         movie_tags.movieId != view.movieId,
...         (_["tag"].lower(), lambda t: t["tag"].lower()),
...     ],
... ).head().order_by(("movieId", "movieId_right"))

┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ tag               ┃ movieId_right ┃ tag_right         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64   │ string            │ int64         │ string            │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│   60756 │ funny             │          1732 │ funny             │
│   60756 │ Highly quotable   │          1732 │ Highly quotable   │
│   89774 │ Tom Hardy         │        139385 │ tom hardy         │
│  106782 │ drugs             │          1732 │ drugs             │
│  106782 │ Leonardo DiCaprio │          5989 │ Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘

limit

limit(n, offset=0)

Select n rows from self starting at offset.

The result set is not deterministic without a call to order_by.

Parameters

Name	Type	Description	Default
n	int \| None	Number of rows to include. If `None`, the entire table is selected starting from `offset`.	required
offset	int	Number of rows to skip first	`0`

Returns

Name	Type	Description
	Table	The first `n` rows of `self` starting at `offset`

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ c      │
│     1 │ a      │
│     2 │ a      │
└───────┴────────┘

>>> t.limit(2)

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ c      │
│     1 │ a      │
└───────┴────────┘

You can use None with offset to slice starting from a particular row

>>> t.limit(None, offset=1)

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ a      │
└───────┴────────┘

mutate

mutate(*exprs, **mutations)

Add columns to a table expression.

Parameters

Name	Type	Description	Default
exprs	Sequence[ir.Expr] \| None	List of named expressions to add as columns	`()`
mutations	ir.Value	Named expressions using keyword arguments	`{}`

Returns

Name	Type	Description
	Table	Table expression with additional columns

Examples

>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False).select("species", "year", "bill_length_mm")
>>> t

┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year  ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ int64 │ float64        │
├─────────┼───────┼────────────────┤
│ Adelie  │  2007 │           39.1 │
│ Adelie  │  2007 │           39.5 │
│ Adelie  │  2007 │           40.3 │
│ Adelie  │  2007 │           NULL │
│ Adelie  │  2007 │           36.7 │
│ Adelie  │  2007 │           39.3 │
│ Adelie  │  2007 │           38.9 │
│ Adelie  │  2007 │           39.2 │
│ Adelie  │  2007 │           34.1 │
│ Adelie  │  2007 │           42.0 │
│ …       │     … │              … │
└─────────┴───────┴────────────────┘

Add a new column from a per-element expression

>>> t.mutate(next_year=_.year + 1).head()

┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ year  ┃ bill_length_mm ┃ next_year ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string  │ int64 │ float64        │ int64     │
├─────────┼───────┼────────────────┼───────────┤
│ Adelie  │  2007 │           39.1 │      2008 │
│ Adelie  │  2007 │           39.5 │      2008 │
│ Adelie  │  2007 │           40.3 │      2008 │
│ Adelie  │  2007 │           NULL │      2008 │
│ Adelie  │  2007 │           36.7 │      2008 │
└─────────┴───────┴────────────────┴───────────┘

Add a new column based on an aggregation. Note the automatic broadcasting.

>>> t.select("species", bill_demean=_.bill_length_mm - _.bill_length_mm.mean()).head()

┏━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ species ┃ bill_demean ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━┩
│ string  │ float64     │
├─────────┼─────────────┤
│ Adelie  │    -4.82193 │
│ Adelie  │    -4.42193 │
│ Adelie  │    -3.62193 │
│ Adelie  │        NULL │
│ Adelie  │    -7.22193 │
└─────────┴─────────────┘

Mutate across multiple columns

>>> t.mutate(s.across(s.numeric() & ~s.cols("year"), _ - _.mean())).head()

┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year  ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ int64 │ float64        │
├─────────┼───────┼────────────────┤
│ Adelie  │  2007 │       -4.82193 │
│ Adelie  │  2007 │       -4.42193 │
│ Adelie  │  2007 │       -3.62193 │
│ Adelie  │  2007 │           NULL │
│ Adelie  │  2007 │       -7.22193 │
└─────────┴───────┴────────────────┘

nunique

nunique(where=None)

Compute the number of unique rows in the table.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Optional boolean expression to filter rows when counting.	`None`

Returns

Name	Type	Description
	IntegerScalar	Number of unique rows in the table

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": ["foo", "bar", "bar"]})
>>> t

┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ bar    │
└────────┘

>>> t.nunique()

Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'CountDistinctStar' operation is not defined")
Expression repr follows:
r0 := InMemoryTable
  data:
    PandasDataFrameProxy:
           a
      0  foo
      1  bar
      2  bar

CountDistinctStar(ibis_pandas_memtable_brbsa5uceffohc7ajjt7sbd3im): CountDistinctStar(r0)

>>> t.nunique(t.a != "foo")

Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'CountDistinctStar' operation is not defined")
Expression repr follows:
r0 := InMemoryTable
  data:
    PandasDataFrameProxy:
           a
      0  foo
      1  bar
      2  bar

CountDistinctStar(ibis_pandas_memtable_brbsa5uceffohc7ajjt7sbd3im, NotEquals(a, 'foo')): CountDistinctStar(r0, wher

order_by

order_by(*by)

Sort a table by one or more expressions.

Similar to pandas.DataFrame.sort_values().

Parameters

Name	Type	Description	Default
by	str \| ir.Column \| s.Selector \| Sequence[str] \| Sequence[ir.Column] \| Sequence[s.Selector] \| None	Expressions to sort the table by.	`()`

Returns

Name	Type	Description
	Table	Sorted table

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable(
...     {
...         "a": [3, 2, 1, 3],
...         "b": ["a", "B", "c", "D"],
...         "c": [4, 6, 5, 7],
...     }
... )
>>> t

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     3 │ a      │     4 │
│     2 │ B      │     6 │
│     1 │ c      │     5 │
│     3 │ D      │     7 │
└───────┴────────┴───────┘

Sort by b. Default is ascending. Note how capital letters come before lowercase

>>> t.order_by("b")

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     2 │ B      │     6 │
│     3 │ D      │     7 │
│     3 │ a      │     4 │
│     1 │ c      │     5 │
└───────┴────────┴───────┘

Sort in descending order

>>> t.order_by(xo.desc("b"))

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     1 │ c      │     5 │
│     3 │ a      │     4 │
│     3 │ D      │     7 │
│     2 │ B      │     6 │
└───────┴────────┴───────┘

You can also use the deferred API to get the same result

>>> from xorq.api import _
>>> t.order_by(_.b.desc())

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     1 │ c      │     5 │
│     3 │ a      │     4 │
│     3 │ D      │     7 │
│     2 │ B      │     6 │
└───────┴────────┴───────┘

Sort by multiple columns/expressions

>>> t.order_by(["a", _.c.desc()])

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     1 │ c      │     5 │
│     2 │ B      │     6 │
│     3 │ D      │     7 │
│     3 │ a      │     4 │
└───────┴────────┴───────┘

You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b

>>> t.order_by(_.b.lower())

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     3 │ a      │     4 │
│     2 │ B      │     6 │
│     1 │ c      │     5 │
│     3 │ D      │     7 │
└───────┴────────┴───────┘

This means that shuffling a Table is super simple

>>> t.order_by(xo.random())

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     3 │ D      │     7 │
│     1 │ c      │     5 │
│     2 │ B      │     6 │
│     3 │ a      │     4 │
└───────┴────────┴───────┘

Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria

>>> import xorq.expr.selectors as s
>>> penguins = xo.examples.penguins.fetch(deferred=False)
>>> penguins[["year", "island"]].value_counts().order_by(s.startswith("year"))

┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year  ┃ island    ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string    │ int64             │
├───────┼───────────┼───────────────────┤
│  2007 │ Torgersen │                20 │
│  2007 │ Biscoe    │                44 │
│  2007 │ Dream     │                46 │
│  2008 │ Torgersen │                16 │
│  2008 │ Dream     │                34 │
│  2008 │ Biscoe    │                64 │
│  2009 │ Torgersen │                16 │
│  2009 │ Dream     │                44 │
│  2009 │ Biscoe    │                60 │
└───────┴───────────┴───────────────────┘

Use the across selector to apply a specific order to multiple columns

>>> penguins[["year", "island"]].value_counts().order_by(
...     s.across(s.startswith("year"), _.desc())
... )

┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year  ┃ island    ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string    │ int64             │
├───────┼───────────┼───────────────────┤
│  2009 │ Biscoe    │                60 │
│  2009 │ Dream     │                44 │
│  2009 │ Torgersen │                16 │
│  2008 │ Biscoe    │                64 │
│  2008 │ Dream     │                34 │
│  2008 │ Torgersen │                16 │
│  2007 │ Dream     │                46 │
│  2007 │ Biscoe    │                44 │
│  2007 │ Torgersen │                20 │
└───────┴───────────┴───────────────────┘

pipe

pipe(f, *args, **kwargs)

Compose f with self.

Parameters

Name	Type	Description	Default
f		If the expression needs to be passed as anything other than the first argument to the function, pass a tuple with the argument name. For example, (f, ‘data’) if the function f expects a ‘data’ keyword	required
args	Any	Positional arguments to `f`	`()`
kwargs	Any	Keyword arguments to `f`	`{}`

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = False
>>> t = xo.table([("a", "int64"), ("b", "string")], name="t")
>>> f = lambda a: (a + 1).name("a")
>>> g = lambda a: (a * 2).name("a")
>>> result1 = t.a.pipe(f).pipe(g)
>>> result1

r0 := UnboundTable: t
  a int64
  b string

a: r0.a + 1 * 2

>>> result2 = g(f(t.a))  # equivalent to the above
>>> result1.equals(result2)

True

Returns

Name	Type	Description
	Expr	Result type of passed function

pivot_longer

pivot_longer(
    col,
    *,
    names_to='name',
    names_pattern='(.+)',
    names_transform=None,
    values_to='value',
    values_transform=None,
)

Transform a table from wider to longer.

Parameters

Name	Type	Description	Default
col	str \| s.Selector	String column name or selector.	required
names_to	str \| Iterable[str]	A string or iterable of strings indicating how to name the new pivoted columns.	`'name'`
names_pattern	str \| re.Pattern	Pattern to use to extract column names from the input. By default the entire column name is extracted.	`'(.+)'`
names_transform	Callable[[str], ir.Value] \| Mapping[str, Callable[[str], ir.Value]] \| None	Function or mapping of a name in `names_to` to a function to transform a column name to a value.	`None`
values_to	str	Name of the pivoted value column.	`'value'`
values_transform	Callable[[ir.Value], ir.Value] \| Deferred \| None	Apply a function to the value column. This can be a lambda or deferred expression.	`None`

Returns

Name	Type	Description
	Table	Pivoted table

Examples

Basic usage

>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> relig_income = xo.examples.relig_income_raw.fetch()
>>> relig_income

┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ religion                ┃ <$10k ┃ $10-20k ┃ $20-30k ┃ $30-40k ┃ $40-50k ┃ $50-75k ┃ $75-100k ┃ $100-150k ┃ >150k ┃ Don't know/refused ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ string                  │ int64 │ int64   │ int64   │ int64   │ int64   │ int64   │ int64    │ int64     │ int64 │ int64              │
├─────────────────────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┼───────┼────────────────────┤
│ Agnostic                │    27 │      34 │      60 │      81 │      76 │     137 │      122 │       109 │    84 │                 96 │
│ Atheist                 │    12 │      27 │      37 │      52 │      35 │      70 │       73 │        59 │    74 │                 76 │
│ Buddhist                │    27 │      21 │      30 │      34 │      33 │      58 │       62 │        39 │    53 │                 54 │
│ Catholic                │   418 │     617 │     732 │     670 │     638 │    1116 │      949 │       792 │   633 │               1489 │
│ Don’t know/refused      │    15 │      14 │      15 │      11 │      10 │      35 │       21 │        17 │    18 │                116 │
│ Evangelical Prot        │   575 │     869 │    1064 │     982 │     881 │    1486 │      949 │       723 │   414 │               1529 │
│ Hindu                   │     1 │       9 │       7 │       9 │      11 │      34 │       47 │        48 │    54 │                 37 │
│ Historically Black Prot │   228 │     244 │     236 │     238 │     197 │     223 │      131 │        81 │    78 │                339 │
│ Jehovah's Witness       │    20 │      27 │      24 │      24 │      21 │      30 │       15 │        11 │     6 │                 37 │
│ Jewish                  │    19 │      19 │      25 │      25 │      30 │      95 │       69 │        87 │   151 │                162 │
│ …                       │     … │       … │       … │       … │       … │       … │        … │         … │     … │                  … │
└─────────────────────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴──────────┴───────────┴───────┴────────────────────┘

Here we convert column names not matching the selector for the religion column and convert those names into values

>>> relig_income.pivot_longer(~s.cols("religion"), names_to="income", values_to="count")

┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ religion ┃ income             ┃ count ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ string   │ string             │ int64 │
├──────────┼────────────────────┼───────┤
│ Agnostic │ <$10k              │    27 │
│ Agnostic │ $10-20k            │    34 │
│ Agnostic │ $20-30k            │    60 │
│ Agnostic │ $30-40k            │    81 │
│ Agnostic │ $40-50k            │    76 │
│ Agnostic │ $50-75k            │   137 │
│ Agnostic │ $75-100k           │   122 │
│ Agnostic │ $100-150k          │   109 │
│ Agnostic │ >150k              │    84 │
│ Agnostic │ Don't know/refused │    96 │
│ …        │ …                  │     … │
└──────────┴────────────────────┴───────┘

Similarly for a different example dataset, we convert names to values but using a different selector and the default values_to value.

>>> world_bank_pop = xo.examples.world_bank_pop_raw.fetch()
>>> world_bank_pop.head()

┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ country ┃ indicator   ┃ 2000         ┃ 2001         ┃ 2002         ┃ 2003         ┃ 2004         ┃ 2005         ┃ 2006         ┃ 2007         ┃ 2008         ┃ 2009         ┃ 2010         ┃ 2011         ┃ 2012         ┃ 2013         ┃ 2014         ┃ 2015         ┃ 2016         ┃ 2017         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string  │ string      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │ float64      │
├─────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ ABW     │ SP.URB.TOTL │ 4.162500e+04 │ 4.202500e+04 │ 4.219400e+04 │ 4.227700e+04 │ 4.231700e+04 │ 4.239900e+04 │ 4.255500e+04 │ 4.272900e+04 │ 4.290600e+04 │ 4.307900e+04 │ 4.320600e+04 │ 4.349300e+04 │ 4.386400e+04 │ 4.422800e+04 │ 4.458800e+04 │ 4.494300e+04 │ 4.529700e+04 │ 4.564800e+04 │
│ ABW     │ SP.URB.GROW │ 1.664222e+00 │ 9.563731e-01 │ 4.013352e-01 │ 1.965172e-01 │ 9.456936e-02 │ 1.935880e-01 │ 3.672580e-01 │ 4.080490e-01 │ 4.133830e-01 │ 4.023963e-01 │ 2.943735e-01 │ 6.620631e-01 │ 8.493932e-01 │ 8.264135e-01 │ 8.106692e-01 │ 7.930256e-01 │ 7.845785e-01 │ 7.718989e-01 │
│ ABW     │ SP.POP.TOTL │ 8.910100e+04 │ 9.069100e+04 │ 9.178100e+04 │ 9.270100e+04 │ 9.354000e+04 │ 9.448300e+04 │ 9.560600e+04 │ 9.678700e+04 │ 9.799600e+04 │ 9.921200e+04 │ 1.003410e+05 │ 1.012880e+05 │ 1.021120e+05 │ 1.028800e+05 │ 1.035940e+05 │ 1.042570e+05 │ 1.048740e+05 │ 1.054390e+05 │
│ ABW     │ SP.POP.GROW │ 2.539234e+00 │ 1.768757e+00 │ 1.194718e+00 │ 9.973955e-01 │ 9.009892e-01 │ 1.003077e+00 │ 1.181566e+00 │ 1.227711e+00 │ 1.241397e+00 │ 1.233231e+00 │ 1.131541e+00 │ 9.393559e-01 │ 8.102306e-01 │ 7.493010e-01 │ 6.916153e-01 │ 6.379592e-01 │ 5.900625e-01 │ 5.372957e-01 │
│ AFE     │ SP.URB.TOTL │ 1.155517e+08 │ 1.197755e+08 │ 1.242275e+08 │ 1.288340e+08 │ 1.336475e+08 │ 1.387456e+08 │ 1.440267e+08 │ 1.492313e+08 │ 1.553838e+08 │ 1.617762e+08 │ 1.684561e+08 │ 1.754157e+08 │ 1.825587e+08 │ 1.901087e+08 │ 1.980733e+08 │ 2.065563e+08 │ 2.150833e+08 │ 2.237321e+08 │
└─────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘

>>> world_bank_pop.pivot_longer(s.matches(r"\d{4}"), names_to="year").head()

┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ country ┃ indicator   ┃ year   ┃ value   ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ string  │ string      │ string │ float64 │
├─────────┼─────────────┼────────┼─────────┤
│ ABW     │ SP.URB.TOTL │ 2000   │ 41625.0 │
│ ABW     │ SP.URB.TOTL │ 2001   │ 42025.0 │
│ ABW     │ SP.URB.TOTL │ 2002   │ 42194.0 │
│ ABW     │ SP.URB.TOTL │ 2003   │ 42277.0 │
│ ABW     │ SP.URB.TOTL │ 2004   │ 42317.0 │
└─────────┴─────────────┴────────┴─────────┘

pivot_longer has some preprocessing capabilities like stripping a prefix and applying a function to column names

>>> billboard = xo.examples.billboard.fetch()
>>> billboard

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ artist         ┃ track                   ┃ date_entered ┃ wk1   ┃ wk2   ┃ wk3   ┃ wk4   ┃ wk5   ┃ wk6   ┃ wk7   ┃ wk8   ┃ wk9   ┃ wk10  ┃ wk11  ┃ wk12  ┃ wk13  ┃ wk14  ┃ wk15  ┃ wk16  ┃ wk17  ┃ wk18  ┃ wk19  ┃ wk20  ┃ wk21  ┃ wk22  ┃ wk23  ┃ wk24  ┃ wk25  ┃ wk26  ┃ wk27  ┃ wk28  ┃ wk29  ┃ wk30  ┃ wk31  ┃ wk32  ┃ wk33  ┃ wk34  ┃ wk35  ┃ wk36  ┃ wk37  ┃ wk38  ┃ wk39  ┃ wk40  ┃ wk41  ┃ wk42  ┃ wk43  ┃ wk44  ┃ wk45  ┃ wk46  ┃ wk47  ┃ wk48  ┃ wk49  ┃ wk50  ┃ wk51  ┃ wk52  ┃ wk53  ┃ wk54  ┃ wk55  ┃ wk56  ┃ wk57  ┃ wk58  ┃ wk59  ┃ wk60  ┃ wk61  ┃ wk62  ┃ wk63  ┃ wk64  ┃ wk65  ┃ wk66   ┃ wk67   ┃ wk68   ┃ wk69   ┃ wk70   ┃ wk71   ┃ wk72   ┃ wk73   ┃ wk74   ┃ wk75   ┃ wk76   ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string         │ string                  │ date         │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │
├────────────────┼─────────────────────────┼──────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ 2 Pac          │ Baby Don't Cry (Keep... │ 2000-02-26   │    87 │    82 │    72 │    77 │    87 │    94 │    99 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ 2Ge+her        │ The Hardest Part Of ... │ 2000-09-02   │    91 │    87 │    92 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ 3 Doors Down   │ Kryptonite              │ 2000-04-08   │    81 │    70 │    68 │    67 │    66 │    57 │    54 │    53 │    51 │    51 │    51 │    51 │    47 │    44 │    38 │    28 │    22 │    18 │    18 │    14 │    12 │     7 │     6 │     6 │     6 │     5 │     5 │     4 │     4 │     4 │     4 │     3 │     3 │     3 │     4 │     5 │     5 │     9 │     9 │    15 │    14 │    13 │    14 │    16 │    17 │    21 │    22 │    24 │    28 │    33 │    42 │    42 │    49 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ 3 Doors Down   │ Loser                   │ 2000-10-21   │    76 │    76 │    72 │    69 │    67 │    65 │    55 │    59 │    62 │    61 │    61 │    59 │    61 │    66 │    72 │    76 │    75 │    67 │    73 │    70 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ 504 Boyz       │ Wobble Wobble           │ 2000-04-15   │    57 │    34 │    25 │    17 │    17 │    31 │    36 │    49 │    53 │    57 │    64 │    70 │    75 │    76 │    78 │    85 │    92 │    96 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ 98^0           │ Give Me Just One Nig... │ 2000-08-19   │    51 │    39 │    34 │    26 │    26 │    19 │     2 │     2 │     3 │     6 │     7 │    22 │    29 │    36 │    47 │    67 │    66 │    84 │    93 │    94 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ A*Teens        │ Dancing Queen           │ 2000-07-08   │    97 │    97 │    96 │    95 │   100 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ Aaliyah        │ I Don't Wanna           │ 2000-01-29   │    84 │    62 │    51 │    41 │    38 │    35 │    35 │    38 │    38 │    36 │    37 │    37 │    38 │    49 │    61 │    63 │    62 │    67 │    83 │    86 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ Aaliyah        │ Try Again               │ 2000-03-18   │    59 │    53 │    38 │    28 │    21 │    18 │    16 │    14 │    12 │    10 │     9 │     8 │     6 │     1 │     2 │     2 │     2 │     2 │     3 │     4 │     5 │     5 │     6 │     9 │    13 │    14 │    16 │    23 │    22 │    33 │    36 │    43 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ Adams, Yolanda │ Open My Heart           │ 2000-08-26   │    76 │    76 │    74 │    69 │    68 │    67 │    61 │    58 │    57 │    59 │    66 │    68 │    61 │    67 │    59 │    63 │    67 │    71 │    79 │    89 │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │  NULL │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │ NULL   │
│ …              │ …                       │ …            │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │     … │ …      │ …      │ …      │ …      │ …      │ …      │ …      │ …      │ …      │ …      │ …      │
└────────────────┴─────────────────────────┴──────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘

>>> billboard.pivot_longer(
...     s.startswith("wk"),
...     names_to="week",
...     names_pattern=r"wk(.+)",
...     names_transform=int,
...     values_to="rank",
...     values_transform=_.cast("int"),
... ).drop_null("rank")

┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ artist  ┃ track                   ┃ date_entered ┃ week ┃ rank  ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string  │ string                  │ date         │ int8 │ int64 │
├─────────┼─────────────────────────┼──────────────┼──────┼───────┤
│ 2 Pac   │ Baby Don't Cry (Keep... │ 2000-02-26   │    1 │    87 │
│ 2 Pac   │ Baby Don't Cry (Keep... │ 2000-02-26   │    2 │    82 │
│ 2 Pac   │ Baby Don't Cry (Keep... │ 2000-02-26   │    3 │    72 │
│ 2 Pac   │ Baby Don't Cry (Keep... │ 2000-02-26   │    4 │    77 │
│ 2 Pac   │ Baby Don't Cry (Keep... │ 2000-02-26   │    5 │    87 │
│ 2 Pac   │ Baby Don't Cry (Keep... │ 2000-02-26   │    6 │    94 │
│ 2 Pac   │ Baby Don't Cry (Keep... │ 2000-02-26   │    7 │    99 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02   │    1 │    91 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02   │    2 │    87 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02   │    3 │    92 │
│ …       │ …                       │ …            │    … │     … │
└─────────┴─────────────────────────┴──────────────┴──────┴───────┘

You can use regular expression capture groups to extract multiple variables stored in column names

>>> who = xo.examples.who.fetch()
>>> who

┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ country     ┃ iso2   ┃ iso3   ┃ year  ┃ new_sp_m014 ┃ new_sp_m1524 ┃ new_sp_m2534 ┃ new_sp_m3544 ┃ new_sp_m4554 ┃ new_sp_m5564 ┃ new_sp_m65 ┃ new_sp_f014 ┃ new_sp_f1524 ┃ new_sp_f2534 ┃ new_sp_f3544 ┃ new_sp_f4554 ┃ new_sp_f5564 ┃ new_sp_f65 ┃ new_sn_m014 ┃ new_sn_m1524 ┃ new_sn_m2534 ┃ new_sn_m3544 ┃ new_sn_m4554 ┃ new_sn_m5564 ┃ new_sn_m65 ┃ new_sn_f014 ┃ new_sn_f1524 ┃ new_sn_f2534 ┃ new_sn_f3544 ┃ new_sn_f4554 ┃ new_sn_f5564 ┃ new_sn_f65 ┃ new_ep_m014 ┃ new_ep_m1524 ┃ new_ep_m2534 ┃ new_ep_m3544 ┃ new_ep_m4554 ┃ new_ep_m5564 ┃ new_ep_m65 ┃ new_ep_f014 ┃ new_ep_f1524 ┃ new_ep_f2534 ┃ new_ep_f3544 ┃ new_ep_f4554 ┃ new_ep_f5564 ┃ new_ep_f65 ┃ newrel_m014 ┃ newrel_m1524 ┃ newrel_m2534 ┃ newrel_m3544 ┃ newrel_m4554 ┃ newrel_m5564 ┃ newrel_m65 ┃ newrel_f014 ┃ newrel_f1524 ┃ newrel_f2534 ┃ newrel_f3544 ┃ newrel_f4554 ┃ newrel_f5564 ┃ newrel_f65 ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ string      │ string │ string │ int64 │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │ int64       │ int64        │ int64        │ int64        │ int64        │ int64        │ int64      │
├─────────────┼────────┼────────┼───────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┤
│ Afghanistan │ AF     │ AFG    │  1980 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1981 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1982 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1983 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1984 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1985 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1986 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1987 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1988 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ Afghanistan │ AF     │ AFG    │  1989 │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │        NULL │         NULL │         NULL │         NULL │         NULL │         NULL │       NULL │
│ …           │ …      │ …      │     … │           … │            … │            … │            … │            … │            … │          … │           … │            … │            … │            … │            … │            … │          … │           … │            … │            … │            … │            … │            … │          … │           … │            … │            … │            … │            … │            … │          … │           … │            … │            … │            … │            … │            … │          … │           … │            … │            … │            … │            … │            … │          … │           … │            … │            … │            … │            … │            … │          … │           … │            … │            … │            … │            … │            … │          … │
└─────────────┴────────┴────────┴───────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┘

>>> len(who.columns)

>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     values_to="count",
... )

┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ country     ┃ iso2   ┃ iso3   ┃ year  ┃ diagnosis ┃ gender ┃ age    ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string      │ string │ string │ int64 │ string    │ string │ string │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼────────┼───────┤
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ m      │ 014    │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ m      │ 1524   │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ m      │ 2534   │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ m      │ 3544   │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ m      │ 4554   │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ m      │ 5564   │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ m      │ 65     │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ f      │ 014    │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ f      │ 1524   │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │ f      │ 2534   │  NULL │
│ …           │ …      │ …      │     … │ …         │ …      │ …      │     … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴────────┴───────┘

names_transform is flexible, and can be:

1. A mapping of one or more names in `names_to` to callable
2. A callable that will be applied to every name

Let’s recode gender and age to numeric values using a mapping

>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     names_transform=dict(
...         gender={"m": 1, "f": 2}.get,
...         age=dict(
...             zip(
...                 ["014", "1524", "2534", "3544", "4554", "5564", "65"],
...                 range(7),
...             )
...         ).get,
...     ),
...     values_to="count",
... )

┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ country     ┃ iso2   ┃ iso3   ┃ year  ┃ diagnosis ┃ gender ┃ age  ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string      │ string │ string │ int64 │ string    │ int8   │ int8 │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼──────┼───────┤
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      1 │    0 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      1 │    1 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      1 │    2 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      1 │    3 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      1 │    4 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      1 │    5 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      1 │    6 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      2 │    0 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      2 │    1 │  NULL │
│ Afghanistan │ AF     │ AFG    │  1980 │ sp        │      2 │    2 │  NULL │
│ …           │ …      │ …      │     … │ …         │      … │    … │     … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴──────┴───────┘

The number of match groups in names_pattern must match the length of names_to

>>> who.pivot_longer(  
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_.(.*)",
... )

---------------------------------------------------------------------------
XorqInputError                            Traceback (most recent call last)
Cell In[293], line 1
----> 1 who.pivot_longer(  
      2     s.index["new_sp_m014":"newrel_f65"],
      3     names_to=["diagnosis", "gender", "age"],
      4     names_pattern="new_?(.*)_.(.*)",
      5 )

File ~/work/xorq/xorq/python/xorq/vendor/ibis/expr/types/relations.py:3726, in Table.pivot_longer(self, col, names_to, names_pattern, names_transform, values_to, values_transform)
   3724 names_pattern = re.compile(names_pattern)
   3725 if (ngroups := names_pattern.groups) != (nnames := len(names_to)):
-> 3726     raise com.XorqInputError(
   3727         f"Number of match groups in `names_pattern`"
   3728         f"{names_pattern.pattern!r} ({ngroups:d} groups) doesn't "
   3729         f"match the length of `names_to` {names_to} (length {nnames:d})"
   3730     )
   3732 if names_transform is None:
   3733     names_transform = dict.fromkeys(names_to, toolz.identity)

XorqInputError: Number of match groups in `names_pattern`'new_?(.*)_.(.*)' (2 groups) doesn't match the length of `names_to` ['diagnosis', 'gender', 'age'] (length 3)

names_transform must be a mapping or callable

>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"], names_transform="upper"
... )  # quartodoc: +EXPECTED_FAILURE

---------------------------------------------------------------------------
XorqTypeError                             Traceback (most recent call last)
Cell In[296], line 1
----> 1 who.pivot_longer(
      2     s.index["new_sp_m014":"newrel_f65"], names_transform="upper"
      3 )  # quartodoc: +EXPECTED_FAILURE

File ~/work/xorq/xorq/python/xorq/vendor/ibis/expr/types/relations.py:3738, in Table.pivot_longer(self, col, names_to, names_pattern, names_transform, values_to, values_transform)
   3736         names_transform = dict.fromkeys(names_to, names_transform)
   3737     else:
-> 3738         raise com.XorqTypeError(
   3739             f"`names_transform` must be a mapping or callable. Got {type(names_transform)}"
   3740         )
   3742 for name in names_to:
   3743     names_transform.setdefault(name, toolz.identity)

XorqTypeError: `names_transform` must be a mapping or callable. Got <class 'str'>

pivot_wider

pivot_wider(
    id_cols=None,
    names_from='name',
    names_prefix='',
    names_sep='_',
    names_sort=False,
    names=None,
    values_from='value',
    values_fill=None,
    values_agg='arbitrary',
)

Pivot a table to a wider format.

Parameters

Name	Type	Description	Default
id_cols	s.Selector \| None	A set of columns that uniquely identify each observation.	`None`
names_from	str \| Iterable[str] \| s.Selector	An argument describing which column or columns to use to get the name of the output columns.	`'name'`
names_prefix	str	String added to the start of every column name.	`''`
names_sep	str	If `names_from` or `values_from` contains multiple columns, this argument will be used to join their values together into a single string to use as a column name.	`'_'`
names_sort	bool	If columns are sorted. If column names are ordered by appearance.	`False`
names	Iterable[str] \| None	An explicit sequence of values to look for in columns matching `names_from`. * When this value is `None`, the values will be computed from `names_from`. * When this value is not `None`, each element’s length must match the length of `names_from`. See examples below for more detail.	`None`
values_from	str \| Iterable[str] \| s.Selector	An argument describing which column or columns to get the cell values from.	`'value'`
values_fill	int \| float \| str \| ir.Scalar \| None	A scalar value that specifies what each value should be filled with when missing.	`None`
values_agg	str \| Callable[[ir.Value], ir.Scalar] \| Deferred	A function applied to the value in each cell in the output.	`'arbitrary'`

Returns

Name	Type	Description
	Table	Wider pivoted table

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True

Basic usage

>>> fish_encounters = ibis.examples.fish_encounters.fetch()
>>> fish_encounters

┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish  ┃ station ┃ seen  ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64 │ string  │ int64 │
├───────┼─────────┼───────┤
│  4842 │ Release │     1 │
│  4842 │ I80_1   │     1 │
│  4842 │ Lisbon  │     1 │
│  4842 │ Rstr    │     1 │
│  4842 │ Base_TD │     1 │
│  4842 │ BCE     │     1 │
│  4842 │ BCW     │     1 │
│  4842 │ BCE2    │     1 │
│  4842 │ BCW2    │     1 │
│  4842 │ MAE     │     1 │
│     … │ …       │     … │
└───────┴─────────┴───────┘

>>> fish_encounters.pivot_wider(names_from="station", values_from="seen")

┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish  ┃ Release ┃ Rstr  ┃ BCW   ┃ BCE2  ┃ Lisbon ┃ Base_TD ┃ BCW2  ┃ BCE   ┃ MAE   ┃ I80_1 ┃ MAW   ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64   │ int64 │ int64 │ int64 │ int64  │ int64   │ int64 │ int64 │ int64 │ int64 │ int64 │
├───────┼─────────┼───────┼───────┼───────┼────────┼─────────┼───────┼───────┼───────┼───────┼───────┤
│  4844 │       1 │     1 │     1 │     1 │      1 │       1 │     1 │     1 │     1 │     1 │     1 │
│  4845 │       1 │     1 │  NULL │  NULL │      1 │       1 │  NULL │  NULL │  NULL │     1 │  NULL │
│  4849 │       1 │  NULL │  NULL │  NULL │   NULL │    NULL │  NULL │  NULL │  NULL │     1 │  NULL │
│  4859 │       1 │     1 │  NULL │  NULL │      1 │       1 │  NULL │  NULL │  NULL │     1 │  NULL │
│  4861 │       1 │     1 │     1 │     1 │      1 │       1 │     1 │     1 │     1 │     1 │     1 │
│  4842 │       1 │     1 │     1 │     1 │      1 │       1 │     1 │     1 │     1 │     1 │     1 │
│  4847 │       1 │  NULL │  NULL │  NULL │      1 │    NULL │  NULL │  NULL │  NULL │     1 │  NULL │
│  4850 │       1 │     1 │     1 │  NULL │   NULL │       1 │  NULL │     1 │  NULL │     1 │  NULL │
│  4854 │       1 │  NULL │  NULL │  NULL │   NULL │    NULL │  NULL │  NULL │  NULL │     1 │  NULL │
│  4855 │       1 │     1 │  NULL │  NULL │      1 │       1 │  NULL │  NULL │  NULL │     1 │  NULL │
│     … │       … │     … │     … │     … │      … │       … │     … │     … │     … │     … │     … │
└───────┴─────────┴───────┴───────┴───────┴────────┴─────────┴───────┴───────┴───────┴───────┴───────┘

You can do simple transpose-like operations using pivot_wider

>>> t = ibis.memtable(dict(outcome=["yes", "no"], counted=[3, 4]))
>>> t

┏━━━━━━━━━┳━━━━━━━━━┓
┃ outcome ┃ counted ┃
┡━━━━━━━━━╇━━━━━━━━━┩
│ string  │ int64   │
├─────────┼─────────┤
│ yes     │       3 │
│ no      │       4 │
└─────────┴─────────┘

>>> t.pivot_wider(names_from="outcome", values_from="counted", names_sort=True)

┏━━━━━━━┳━━━━━━━┓
┃ no    ┃ yes   ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│     4 │     3 │
└───────┴───────┘

Fill missing pivoted values using values_fill

>>> fish_encounters.pivot_wider(
...     names_from="station", values_from="seen", values_fill=0
... )

┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish  ┃ BCE   ┃ MAE   ┃ Lisbon ┃ Base_TD ┃ BCW2  ┃ I80_1 ┃ MAW   ┃ Release ┃ Rstr  ┃ BCW   ┃ BCE2  ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64  │ int64   │ int64 │ int64 │ int64 │ int64   │ int64 │ int64 │ int64 │
├───────┼───────┼───────┼────────┼─────────┼───────┼───────┼───────┼─────────┼───────┼───────┼───────┤
│  4844 │     1 │     1 │      1 │       1 │     1 │     1 │     1 │       1 │     1 │     1 │     1 │
│  4845 │     0 │     0 │      1 │       1 │     0 │     1 │     0 │       1 │     1 │     0 │     0 │
│  4849 │     0 │     0 │      0 │       0 │     0 │     1 │     0 │       1 │     0 │     0 │     0 │
│  4859 │     0 │     0 │      1 │       1 │     0 │     1 │     0 │       1 │     1 │     0 │     0 │
│  4861 │     1 │     1 │      1 │       1 │     1 │     1 │     1 │       1 │     1 │     1 │     1 │
│  4843 │     1 │     1 │      1 │       1 │     1 │     1 │     1 │       1 │     1 │     1 │     1 │
│  4848 │     0 │     0 │      1 │       0 │     0 │     1 │     0 │       1 │     1 │     0 │     0 │
│  4865 │     0 │     0 │      1 │       0 │     0 │     1 │     0 │       1 │     0 │     0 │     0 │
│  4842 │     1 │     1 │      1 │       1 │     1 │     1 │     1 │       1 │     1 │     1 │     1 │
│  4847 │     0 │     0 │      1 │       0 │     0 │     1 │     0 │       1 │     0 │     0 │     0 │
│     … │     … │     … │      … │       … │     … │     … │     … │       … │     … │     … │     … │
└───────┴───────┴───────┴────────┴─────────┴───────┴───────┴───────┴─────────┴───────┴───────┴───────┘

Compute multiple values columns

>>> us_rent_income = ibis.examples.us_rent_income.fetch()
>>> us_rent_income

┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ geoid  ┃ name       ┃ variable ┃ estimate ┃ moe   ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ string │ string     │ string   │ int64    │ int64 │
├────────┼────────────┼──────────┼──────────┼───────┤
│ 01     │ Alabama    │ income   │    24476 │   136 │
│ 01     │ Alabama    │ rent     │      747 │     3 │
│ 02     │ Alaska     │ income   │    32940 │   508 │
│ 02     │ Alaska     │ rent     │     1200 │    13 │
│ 04     │ Arizona    │ income   │    27517 │   148 │
│ 04     │ Arizona    │ rent     │      972 │     4 │
│ 05     │ Arkansas   │ income   │    23789 │   165 │
│ 05     │ Arkansas   │ rent     │      709 │     5 │
│ 06     │ California │ income   │    29454 │   109 │
│ 06     │ California │ rent     │     1358 │     3 │
│ …      │ …          │ …        │        … │     … │
└────────┴────────────┴──────────┴──────────┴───────┘

>>> us_rent_income.pivot_wider(
...     names_from="variable", values_from=["estimate", "moe"]
... )

┏━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid  ┃ name        ┃ estimate_income ┃ moe_income ┃ estimate_rent ┃ moe_rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string      │ int64           │ int64      │ int64         │ int64    │
├────────┼─────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 01     │ Alabama     │           24476 │        136 │           747 │        3 │
│ 08     │ Colorado    │           32401 │        109 │          1125 │        5 │
│ 10     │ Delaware    │           31560 │        247 │          1076 │       10 │
│ 20     │ Kansas      │           29126 │        208 │           801 │        5 │
│ 27     │ Minnesota   │           32734 │        189 │           906 │        4 │
│ 28     │ Mississippi │           22766 │        194 │           740 │        5 │
│ 34     │ New Jersey  │           35075 │        148 │          1249 │        4 │
│ 50     │ Vermont     │           29351 │        361 │           945 │       11 │
│ 51     │ Virginia    │           32545 │        202 │          1166 │        5 │
│ 53     │ Washington  │           32318 │        113 │          1120 │        4 │
│ …      │ …           │               … │          … │             … │        … │
└────────┴─────────────┴─────────────────┴────────────┴───────────────┴──────────┘

The column name separator can be changed using the names_sep parameter

>>> us_rent_income.pivot_wider(
...     names_from="variable",
...     names_sep=".",
...     values_from=("estimate", "moe"),
... )

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid  ┃ name                 ┃ estimate.income ┃ moe.income ┃ estimate.rent ┃ moe.rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string               │ int64           │ int64      │ int64         │ int64    │
├────────┼──────────────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 06     │ California           │           29454 │        109 │          1358 │        3 │
│ 11     │ District of Columbia │           43198 │        681 │          1424 │       17 │
│ 17     │ Illinois             │           30684 │         83 │           952 │        3 │
│ 18     │ Indiana              │           27247 │        117 │           782 │        3 │
│ 25     │ Massachusetts        │           34498 │        199 │          1173 │        5 │
│ 33     │ New Hampshire        │           33172 │        387 │          1052 │        9 │
│ 39     │ Ohio                 │           27435 │         94 │           764 │        2 │
│ 55     │ Wisconsin            │           29868 │        135 │           813 │        3 │
│ 05     │ Arkansas             │           23789 │        165 │           709 │        5 │
│ 12     │ Florida              │           25952 │         70 │          1077 │        3 │
│ …      │ …                    │               … │          … │             … │        … │
└────────┴──────────────────────┴─────────────────┴────────────┴───────────────┴──────────┘

Supply an alternative function to summarize values

>>> warpbreaks = ibis.examples.warpbreaks.fetch().select("wool", "tension", "breaks")
>>> warpbreaks

┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ wool   ┃ tension ┃ breaks ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ string │ string  │ int64  │
├────────┼─────────┼────────┤
│ A      │ L       │     26 │
│ A      │ L       │     30 │
│ A      │ L       │     54 │
│ A      │ L       │     25 │
│ A      │ L       │     70 │
│ A      │ L       │     52 │
│ A      │ L       │     51 │
│ A      │ L       │     26 │
│ A      │ L       │     67 │
│ A      │ M       │     18 │
│ …      │ …       │      … │
└────────┴─────────┴────────┘

>>> warpbreaks.pivot_wider(
...     names_from="wool", values_from="breaks", values_agg="mean"
... ).select("tension", "A", "B").order_by("tension")

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ tension ┃ A         ┃ B         ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string  │ float64   │ float64   │
├─────────┼───────────┼───────────┤
│ H       │ 24.555556 │ 18.777778 │
│ L       │ 44.555556 │ 28.222222 │
│ M       │ 24.000000 │ 28.777778 │
└─────────┴───────────┴───────────┘

Passing Deferred objects to values_agg is supported

>>> warpbreaks.pivot_wider(
...     names_from="tension",
...     values_from="breaks",
...     values_agg=_.sum(),
... ).select("wool", "H", "L", "M").order_by(s.all())

┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ wool   ┃ H     ┃ L     ┃ M     ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ int64 │ int64 │ int64 │
├────────┼───────┼───────┼───────┤
│ A      │   221 │   401 │   216 │
│ B      │   169 │   254 │   259 │
└────────┴───────┴───────┴───────┘

Use a custom aggregate function

>>> warpbreaks.pivot_wider(
...     names_from="wool",
...     values_from="breaks",
...     values_agg=lambda col: col.std() / col.mean(),
... ).select("tension", "A", "B").order_by("tension")

┏━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ tension ┃ A        ┃ B        ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ string  │ float64  │ float64  │
├─────────┼──────────┼──────────┤
│ H       │ 0.418344 │ 0.260590 │
│ L       │ 0.406183 │ 0.349325 │
│ M       │ 0.360844 │ 0.327719 │
└─────────┴──────────┴──────────┘

Generate some random data, setting the random seed for reproducibility

>>> import random
>>> random.seed(0)
>>> raw = ibis.memtable(
...     [
...         dict(
...             product=product,
...             country=country,
...             year=year,
...             production=random.random(),
...         )
...         for product in "AB"
...         for country in ["AI", "EI"]
...         for year in range(2000, 2015)
...     ]
... )
>>> production = raw.filter(((_.product == "A") & (_.country == "AI")) | (_.product == "B"))
>>> production.order_by(s.all())

┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ product ┃ country ┃ year  ┃ production ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ string  │ string  │ int64 │ float64    │
├─────────┼─────────┼───────┼────────────┤
│ A       │ AI      │  2000 │   0.844422 │
│ A       │ AI      │  2001 │   0.757954 │
│ A       │ AI      │  2002 │   0.420572 │
│ A       │ AI      │  2003 │   0.258917 │
│ A       │ AI      │  2004 │   0.511275 │
│ A       │ AI      │  2005 │   0.404934 │
│ A       │ AI      │  2006 │   0.783799 │
│ A       │ AI      │  2007 │   0.303313 │
│ A       │ AI      │  2008 │   0.476597 │
│ A       │ AI      │  2009 │   0.583382 │
│ …       │ …       │     … │          … │
└─────────┴─────────┴───────┴────────────┘

Pivoting with multiple name columns

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     values_from="production",
... )

┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year  ┃ A_AI     ┃ B_EI     ┃ B_AI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64  │ float64  │ float64  │
├───────┼──────────┼──────────┼──────────┤
│  2001 │ 0.757954 │ 0.191067 │ 0.865310 │
│  2003 │ 0.258917 │ 0.238616 │ 0.805028 │
│  2009 │ 0.583382 │ 0.507941 │ 0.668153 │
│  2012 │ 0.281838 │ 0.551267 │ 0.867603 │
│  2004 │ 0.511275 │ 0.967540 │ 0.548699 │
│  2006 │ 0.783799 │ 0.447970 │ 0.719705 │
│  2007 │ 0.303313 │ 0.080446 │ 0.398824 │
│  2008 │ 0.476597 │ 0.320055 │ 0.824845 │
│  2011 │ 0.504687 │ 0.109058 │ 0.493578 │
│  2000 │ 0.844422 │ 0.870471 │ 0.477010 │
│     … │        … │        … │        … │
└───────┴──────────┴──────────┴──────────┘

Select a subset of names. This call incurs no computation when constructing the expression.

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     names=[("A", "AI"), ("B", "AI")],
...     values_from="production",
... )

┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year  ┃ A_AI     ┃ B_AI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64  │ float64  │
├───────┼──────────┼──────────┤
│  2001 │ 0.757954 │ 0.865310 │
│  2003 │ 0.258917 │ 0.805028 │
│  2009 │ 0.583382 │ 0.668153 │
│  2012 │ 0.281838 │ 0.867603 │
│  2004 │ 0.511275 │ 0.548699 │
│  2006 │ 0.783799 │ 0.719705 │
│  2007 │ 0.303313 │ 0.398824 │
│  2008 │ 0.476597 │ 0.824845 │
│  2011 │ 0.504687 │ 0.493578 │
│  2000 │ 0.844422 │ 0.477010 │
│     … │        … │        … │
└───────┴──────────┴──────────┘

Sort the new columns’ names

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     values_from="production",
...     names_sort=True,
... )

┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year  ┃ A_AI     ┃ B_AI     ┃ B_EI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64  │ float64  │ float64  │
├───────┼──────────┼──────────┼──────────┤
│  2001 │ 0.757954 │ 0.865310 │ 0.191067 │
│  2003 │ 0.258917 │ 0.805028 │ 0.238616 │
│  2009 │ 0.583382 │ 0.668153 │ 0.507941 │
│  2012 │ 0.281838 │ 0.867603 │ 0.551267 │
│  2004 │ 0.511275 │ 0.548699 │ 0.967540 │
│  2006 │ 0.783799 │ 0.719705 │ 0.447970 │
│  2007 │ 0.303313 │ 0.398824 │ 0.080446 │
│  2008 │ 0.476597 │ 0.824845 │ 0.320055 │
│  2011 │ 0.504687 │ 0.493578 │ 0.109058 │
│  2000 │ 0.844422 │ 0.477010 │ 0.870471 │
│     … │        … │        … │        … │
└───────┴──────────┴──────────┴──────────┘

preview

preview(
    max_rows=None,
    max_columns=None,
    max_length=None,
    max_string=None,
    max_depth=None,
    console_width=None,
)

Return a subset as a Rich Table.

This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive.

Parameters

Name	Type	Description	Default
max_rows	int \| None	Maximum number of rows to display	`None`
max_columns	int \| None	Maximum number of columns to display	`None`
max_length	int \| None	Maximum length for pretty-printed arrays and maps	`None`
max_string	int \| None	Maximum length for pretty-printed strings	`None`
max_depth	int \| None	Maximum depth for nested data types	`None`
console_width	int \| float \| None	Width of the console in characters. If not specified, the width will be inferred from the console.	`None`

Examples

>>> import xorq.api as xo
>>> t = xo.examples.penguins.fetch(deferred=False)

Because the console_width is too small, only 2 columns are shown even though we specified up to 3.

>>> t.preview(
...     max_rows=3,
...     max_columns=3,
...     max_string=8,
...     console_width=30,
... )

┏━━━━━━━━━┳━━━━━━━━━━┳━━━┓
┃ species ┃ island   ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━┩
│ string  │ string   │ … │
├─────────┼──────────┼───┤
│ Adelie  │ Torgers… │ … │
│ Adelie  │ Torgers… │ … │
│ Adelie  │ Torgers… │ … │
│ …       │ …        │ … │
└─────────┴──────────┴───┘

relabel

relabel(substitutions)

Deprecated in favor of Table.rename.

relocate

relocate(*columns, before=None, after=None, **kwargs)

Relocate columns before or after other specified columns.

Parameters

Name	Type	Description	Default
columns	str \| s.Selector	Columns to relocate. Selectors are accepted.	`()`
before	str \| s.Selector \| None	A column name or selector to insert the new columns before.	`None`
after	str \| s.Selector \| None	A column name or selector. Columns in `columns` are relocated after the last column selected in `after`.	`None`
kwargs	str	Additional column names to relocate, renaming argument values to keyword argument names.	`{}`

Returns

Name	Type	Description
	Table	A table with the columns relocated.

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> import xorq.expr.selectors as s
>>> t = xo.memtable(dict(a=[1], b=[1], c=[1], d=["a"], e=["a"], f=["a"]))
>>> t

┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b     ┃ c     ┃ d      ┃ e      ┃ f      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│     1 │     1 │     1 │ a      │ a      │ a      │
└───────┴───────┴───────┴────────┴────────┴────────┘

>>> t.relocate("f")

┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ f      ┃ a     ┃ b     ┃ c     ┃ d      ┃ e      ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a      │     1 │     1 │     1 │ a      │ a      │
└────────┴───────┴───────┴───────┴────────┴────────┘

>>> t.relocate("a", after="c")

┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ b     ┃ c     ┃ a     ┃ d      ┃ e      ┃ f      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│     1 │     1 │     1 │ a      │ a      │ a      │
└───────┴───────┴───────┴────────┴────────┴────────┘

>>> t.relocate("f", before="b")

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a     ┃ f      ┃ b     ┃ c     ┃ d      ┃ e      ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │ string │ string │
├───────┼────────┼───────┼───────┼────────┼────────┤
│     1 │ a      │     1 │     1 │ a      │ a      │
└───────┴────────┴───────┴───────┴────────┴────────┘

>>> t.relocate("a", after=s.last())

┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ b     ┃ c     ┃ d      ┃ e      ┃ f      ┃ a     ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ string │ string │ string │ int64 │
├───────┼───────┼────────┼────────┼────────┼───────┤
│     1 │     1 │ a      │ a      │ a      │     1 │
└───────┴───────┴────────┴────────┴────────┴───────┘

Relocate allows renaming

>>> t.relocate(ff="f")

┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ ff     ┃ a     ┃ b     ┃ c     ┃ d      ┃ e      ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a      │     1 │     1 │     1 │ a      │ a      │
└────────┴───────┴───────┴───────┴────────┴────────┘

You can relocate based on any predicate selector, such as of_type

>>> t.relocate(s.of_type("string"))

┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d      ┃ e      ┃ f      ┃ a     ┃ b     ┃ c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a      │ a      │ a      │     1 │     1 │     1 │
└────────┴────────┴────────┴───────┴───────┴───────┘

>>> t.relocate(s.numeric(), after=s.last())

┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d      ┃ e      ┃ f      ┃ a     ┃ b     ┃ c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a      │ a      │ a      │     1 │     1 │     1 │
└────────┴────────┴────────┴───────┴───────┴───────┘

When multiple columns are selected with before or after, those selected columns are moved before and after the selectors input

>>> t = xo.memtable(dict(a=[1], b=["a"], c=[1], d=["a"]))
>>> t.relocate(s.numeric(), after=s.of_type("string"))

┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b      ┃ d      ┃ a     ┃ c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a      │ a      │     1 │     1 │
└────────┴────────┴───────┴───────┘

>>> t.relocate(s.numeric(), before=s.of_type("string"))

┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a     ┃ c     ┃ b      ┃ d      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ string │ string │
├───────┼───────┼────────┼────────┤
│     1 │     1 │ a      │ a      │
└───────┴───────┴────────┴────────┘

When there are duplicate renames in a call to relocate, the last one is preserved

>>> t.relocate(e="d", f="d")

┏━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ f      ┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ int64 │ string │ int64 │
├────────┼───────┼────────┼───────┤
│ a      │     1 │ a      │     1 │
└────────┴───────┴────────┴───────┘

However, if there are duplicates that are not part of a rename, the order specified in the relocate call is preserved

>>> t.relocate(
...     "b",
...     s.of_type("string"),  # "b" is a string column, so the selector matches
... )

┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b      ┃ d      ┃ a     ┃ c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a      │ a      │     1 │     1 │
└────────┴────────┴───────┴───────┘

rename

rename(method=None, /, **substitutions)

Rename columns in the table.

Parameters

Name	Type	Description	Default
method	str \| Callable[[str], str \| None] \| Literal['snake_case', 'ALL_CAPS'] \| Mapping[str, str] \| None	An optional method for renaming columns. May be one of: - A format string to use to rename all columns, like `"prefix_{name}"`. - A function from old name to new name. If the function returns `None` the old name is used. - The literal strings `"snake_case"` or `"ALL_CAPS"` to rename all columns using a `snake_case` or "ALL_CAPS"`` naming convention respectively. - A mapping from new name to old name. Existing columns not present in the mapping will passthrough with their original name. \|None`\| \| substitutions \| [str](`str`) \| Columns to be explicitly renamed, expressed as`new_name=old_name``keyword arguments. \|``

Returns

Name	Type	Description
	Table	A renamed table expression

rowid

rowid()

A unique integer per row.

This operation is only valid on physical tables

Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, SQLite and DuckDB’s rowid).

For a monotonically increasing row number, see ibis.row_number.

Returns

Name	Type	Description
	IntegerColumn	An integer column

sample

sample(fraction, *, method='row', seed=None)

Sample a fraction of rows from a table.

Results may be non-repeatable

Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.

In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.

Parameters

Name	Type	Description	Default
fraction	float	The percentage of rows to include in the sample, expressed as a float between 0 and 1.	required
method	Literal['row', 'block']	The sampling method to use. The default is “row”, which includes each row with a probability of `fraction`. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause.	`'row'`
seed	int \| None	An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least `[0, 2**31 - 1]`). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases.	`None`

Returns

Name	Type	Description
	Table	The input table, with `fraction` of rows selected.

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
>>> t

┏━━━━━━━┳━━━━━━━━┓
┃ x     ┃ y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ b      │
│     3 │ c      │
│     4 │ d      │
└───────┴────────┘

Sample approximately half the rows, with a seed specified for reproducibility.

>>> t.sample(0.5, seed=1234)

Translation to backend failed
Error message: UnsupportedOperationError('`Table.sample` with a random seed is unsupported')
Expression repr follows:
r0 := InMemoryTable
  data:
    PandasDataFrameProxy:
         x  y
      0  1  a
      1  2  b
      2  3  c
      3  4  d

Sample[r0, fraction=0.5, method='row', seed=1234]

schema

schema()

Return the Schema for this table.

Returns

Name	Type	Description
	Schema	The table’s schema.

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.schema()

ibis.Schema {
  species            string
  island             string
  bill_length_mm     float64
  bill_depth_mm      float64
  flipper_length_mm  float64
  body_mass_g        float64
  sex                string
  year               int64
}

select

select(*exprs, **named_exprs)

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters

Name	Type	Description	Default
exprs	ir.Value \| str \| Iterable[ir.Value \| str]	Column expression, string, or list of column expressions and strings.	`()`
named_exprs	ir.Value \| str	Column expressions	`{}`

Returns

Name	Type	Description
	Table	Table expression

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Simple projection

>>> t.select("island", "bill_length_mm").head()

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island    ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ float64        │
├───────────┼────────────────┤
│ Torgersen │           39.1 │
│ Torgersen │           39.5 │
│ Torgersen │           40.3 │
│ Torgersen │           NULL │
│ Torgersen │           36.7 │
└───────────┴────────────────┘

In that simple case, you could also just use python’s indexing syntax

>>> t[["island", "bill_length_mm"]].head()

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island    ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ float64        │
├───────────┼────────────────┤
│ Torgersen │           39.1 │
│ Torgersen │           39.5 │
│ Torgersen │           40.3 │
│ Torgersen │           NULL │
│ Torgersen │           36.7 │
└───────────┴────────────────┘

Projection by zero-indexed column position

>>> t.select(t[0], t[4]).head()

┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ float64           │
├─────────┼───────────────────┤
│ Adelie  │             181.0 │
│ Adelie  │             186.0 │
│ Adelie  │             195.0 │
│ Adelie  │              NULL │
│ Adelie  │             193.0 │
└─────────┴───────────────────┘

Projection with renaming and compute in one call

>>> t.select(next_year=t.year + 1).head()

┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

You can do the same thing with a named expression, and using the deferred API

>>> from xorq.api import _
>>> t.select((_.year + 1).name("next_year")).head()

┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

Projection with aggregation expressions

>>> t.select("island", bill_mean=t.bill_length_mm.mean()).head()

┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island    ┃ bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ float64   │
├───────────┼───────────┤
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
└───────────┴───────────┘

Projection with a selector

>>> import xorq.expr.selectors as s
>>> t.select(s.numeric() & ~s.cols("year")).head()

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64        │ float64       │ float64           │ float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│           39.1 │          18.7 │             181.0 │      3750.0 │
│           39.5 │          17.4 │             186.0 │      3800.0 │
│           40.3 │          18.0 │             195.0 │      3250.0 │
│           NULL │          NULL │              NULL │        NULL │
│           36.7 │          19.3 │             193.0 │      3450.0 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

Projection + aggregation across multiple columns

>>> from xorq.api import _
>>> t.select(s.across(s.numeric() & ~s.cols("year"), _.mean())).head()

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64        │ float64       │ float64           │ float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

sql

sql(query, dialect=None)

Run a SQL query against a table expression.

Parameters

Name	Type	Description	Default
query	str	Query string	required
dialect	str \| None	Optional string indicating the dialect of `query`. Defaults to the backend’s native dialect.	`None`

Returns

Name	Type	Description
	Table	An opaque table expression

Examples

>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(table_name="penguins", deferred=False)
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island    ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string    │ float64         │
├───────────┼─────────────────┤
│ Biscoe    │       45.257485 │
│ Dream     │       44.167742 │
│ Torgersen │       38.950980 │
└───────────┴─────────────────┘

Mix and match ibis expressions with SQL queries

>>> t = xo.examples.penguins.fetch(table_name="penguins", deferred=False)
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr = expr.mutate(
...     island=_.island.lower(),
...     avg_bill_length=_.avg_bill_length.round(1),
... )
>>> expr

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island    ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string    │ float64         │
├───────────┼─────────────────┤
│ torgersen │            39.0 │
│ biscoe    │            45.3 │
│ dream     │            44.2 │
└───────────┴─────────────────┘

Because ibis expressions aren’t named, they aren’t visible to subsequent .sql calls. Use the alias method to assign a name to an expression.

>>> expr.alias("b").sql("SELECT * FROM b WHERE avg_bill_length > 40")

to_array

to_array()

View a single column table as an array.

Returns

Name	Type	Description
	Value	A single column view of a table

to_csv

to_csv(path, *, params=None, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
path	str \| Path	The data source. A string or Path to the CSV file.	required
params	Mapping[ir.Scalar, Any] \| None	Mapping of scalar parameter expressions to value.	`None`
**kwargs	Any	Additional keyword arguments passed to pyarrow.csv.CSVWriter	`{}`
https			required

to_json

to_json(path, *, params=None, **kwargs)

Write the results of expr to a NDJSON file.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
path	str \| Path	The data source. A string or Path to the Delta Lake table.	required
**kwargs	Any	Additional, backend-specific keyword arguments.	`{}`
https			required

to_pandas

to_pandas(**kwargs)

Convert a table expression to a pandas DataFrame.

Parameters

Name	Type	Description	Default
kwargs		Same as keyword arguments to `execute`	`{}`

to_parquet

to_parquet(path, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html for details.

Parameters

Name	Type	Description	Default
path	str \| Path	A string or Path where the Parquet file will be written.	required
params	Mapping[ir.Scalar, Any] \| None	Mapping of scalar parameter expressions to value.	`None`
**kwargs	Any	Additional keyword arguments passed to pyarrow.parquet.ParquetWriter	`{}`

Examples

Write out an expression to a single parquet file.

>>> import ibis
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins.to_parquet(tempfile.mktemp())

to_pyarrow

to_pyarrow(**kwargs)

Execute expression and return results in as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
kwargs	Any	Keyword arguments	`{}`

Returns

Name	Type	Description
	Table	A pyarrow table holding the results of the executed expression.

to_pyarrow_batches

to_pyarrow_batches(chunk_size=1000000, **kwargs)

Execute expression and return a RecordBatchReader.

This method is eager and will execute the associated expression immediately.

Parameters

Name	Type	Description	Default
chunk_size	int	Maximum number of rows in each returned record batch.	`1000000`
kwargs	Any	Keyword arguments	`{}`

Returns

Name	Type	Description
	results	RecordBatchReader

try_cast

try_cast(schema)

Cast the columns of a table.

If the cast fails for a row, the value is returned as NULL or NaN depending on backend behavior.

Parameters

Name	Type	Description	Default
schema	SchemaLike	Mapping, schema or iterable of pairs to use for casting	required

Returns

Name	Type	Description
	Table	Cast table

Examples

unbind

unbind()

Return an expression built on UnboundTable instead of backend-specific objects.

union

union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name	Type	Description	Default
table	Table	A table expression	required
*rest	Table	Additional table expressions	`()`
distinct	bool	Only return distinct rows	`False`

Returns

Name	Type	Description
	Table	A new table containing the union of all input tables.

Examples

>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘

>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘

>>> t1.union(t2)  # union all by default doctest: +SKIP

>>> t1.union(t2, distinct=True).order_by("a")

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

unnest

unnest(column, offset=None, keep_empty=False)

Unnest an array column from a table.

When unnesting an existing column the newly unnested column replaces the existing column.

Parameters

Name	Type	Description	Default
column		Array column to unnest.	required
offset	str \| None	Name of the resulting index column.	`None`
keep_empty	bool	Keep empty array values as `NULL` in the output table, as well as existing `NULL` values.	`False`

Returns

Name	Type	Description
	Table	Table with the array column `column` unnested.

Examples

unpack

unpack(*columns)

Project the struct fields of each of columns into self.

Existing fields are retained in the projection.

Parameters

Name	Type	Description	Default
columns	str	String column names to project into `self`.	`()`

Returns

Name	Type	Description
	Table	The child table with struct fields of each of `columns` projected.

value_counts

value_counts()

Compute a frequency table of this table’s values.

Returns

Name	Type	Description
	Table	Frequency table of this table’s values.

Examples

>>> import xorq.api as xo
>>> from xorq import examples
>>> xo.options.interactive = True
>>> t = examples.penguins.fetch()
>>> t.head()

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

>>> t.year.value_counts().order_by("year")

┏━━━━━━━┳━━━━━━━━━━━━┓
┃ year  ┃ year_count ┃
┡━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64      │
├───────┼────────────┤
│  2007 │        110 │
│  2008 │        114 │
│  2009 │        120 │
└───────┴────────────┘

>>> t[["year", "island"]].value_counts().order_by("year", "island")

┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year  ┃ island    ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string    │ int64             │
├───────┼───────────┼───────────────────┤
│  2007 │ Biscoe    │                44 │
│  2007 │ Dream     │                46 │
│  2007 │ Torgersen │                20 │
│  2008 │ Biscoe    │                64 │
│  2008 │ Dream     │                34 │
│  2008 │ Torgersen │                16 │
│  2009 │ Biscoe    │                60 │
│  2009 │ Dream     │                44 │
│  2009 │ Torgersen │                16 │
└───────┴───────────┴───────────────────┘

view

view()

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns

Name	Type	Description
	Table	Table expression

visualize

visualize(
    format='svg',
    *,
    label_edges=False,
    verbose=False,
    node_attr=None,
    node_attr_getter=None,
    edge_attr=None,
    edge_attr_getter=None,
)

Visualize an expression as a GraphViz graph in the browser.

Parameters

Name	Type	Description	Default
format	str	Image output format. These are specified by the `graphviz` Python library.	`'svg'`
label_edges	bool	Show operation input names as edge labels	`False`
verbose	bool	Print the graphviz DOT code to stderr if	`False`
node_attr	Mapping[str, str] \| None	Mapping of `(attribute, value)` pairs set for all nodes. Options are specified by the `graphviz` Python library.	`None`
node_attr_getter	NodeAttributeGetter \| None	Callback taking a node and returning a mapping of `(attribute, value)` pairs for that node. Options are specified by the `graphviz` Python library.	`None`
edge_attr	Mapping[str, str] \| None	Mapping of `(attribute, value)` pairs set for all edges. Options are specified by the `graphviz` Python library.	`None`
edge_attr_getter	EdgeAttributeGetter \| None	Callback taking two adjacent nodes and returning a mapping of `(attribute, value)` pairs for the edge between those nodes. Options are specified by the `graphviz` Python library.	`None`

Examples

Open the visualization of an expression in default browser:

>>> import xorq.api as xo
>>> import xorq.vendor.ibis.expr.operations as ops
>>> left = ibis.table(dict(a="int64", b="string"), name="left")
>>> right = ibis.table(dict(b="string", c="int64", d="string"), name="right")
>>> expr = left.inner_join(right, "b").select(left.a, b=right.c, c=right.d)
>>> expr.visualize(
...     format="svg",
...     label_edges=True,
...     node_attr={"fontname": "Roboto Mono", "fontsize": "10"},
...     node_attr_getter=lambda node: isinstance(node, ops.Field) and {"shape": "oval"},
...     edge_attr={"fontsize": "8"},
...     edge_attr_getter=lambda u, v: isinstance(u, ops.Field) and {"color": "red"},
... )  # quartodoc: +SKIP

Raises

Name	Type	Description
	ImportError	If `graphviz` is not installed.

Attributes

Methods

aggregate

Parameters

Returns

Examples

alias

Parameters

Returns

Examples

as_scalar

Returns

Examples

as_table

Returns

Examples

asof_join

Parameters

Returns

bind

Parameters

Returns

cache

Parameters

Returns

Notes

Examples

See Also

Notes

cast

Parameters

Returns

Examples

compile

Parameters

count

Parameters

Returns

Examples

cross_join

Parameters

Returns

Examples

describe

Parameters

Returns

Notes

Examples

difference

Parameters

See Also

Returns

Examples

distinct

Parameters

Examples

drop

Parameters

Returns

Examples

drop_null

Parameters

Returns

Examples

dropna

equals

Parameters

Examples

execute

Parameters

Examples

fill_null

Parameters

Returns

Examples

fillna

filter

Parameters

Returns

Examples