Column

Column(arg)

Methods

Name	Description
approx_median	Return an approximate of the median of `self`.
approx_nunique	Return the approximate number of distinct elements in `self`.
arbitrary	Select an arbitrary value in a column.
argmax	Return the value of `self` that maximizes `key`.
argmin	Return the value of `self` that minimizes `key`.
as_scalar	Inform ibis that the expression should be treated as a scalar.
as_table	Promote the expression to a Table.
count	Compute the number of rows in an expression.
cume_dist	Return the cumulative distribution over a window.
cummax	Return the cumulative max over a window.
cummin	Return the cumulative min over a window.
dense_rank	Position of first element within each group of equal values.
first	Return the first value of a column.
lag	Return the row located at `offset` rows before the current row.
last	Return the last value of a column.
lead	Return the row located at `offset` rows after the current row.
max	Return the maximum of a column.
median	Return the median of the column.
min	Return the minimum of a column.
mode	Return the mode of a column.
nth	Return the `n`th value (0-indexed) over a window.
ntile	Return the integer number of a partitioning of the column values.
nunique	Compute the number of distinct rows in an expression.
percent_rank	Return the relative rank of the values in the column.
preview	Print a subset as a single-column Rich Table.
quantile	Return value at the given quantile.
rank	Compute position of first element within each equal-value group in sorted order.
topk	Return a “top k” expression.
value_counts	Compute a frequency table.

approx_median

approx_median(where=None)

Return an approximate of the median of self.

The result may or may not be exact

Whether the result is an approximation depends on the backend.

Do not depend on the results being exact

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Filter in values when `where` is `True`	`None`

Returns

Name	Type	Description
	Scalar	An approximation of the median of `self`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_median()

┌────────┐
│ 4030.0 │
└────────┘

>>> t.body_mass_g.approx_median(where=t.species == "Chinstrap")

┌────────┐
│ 3700.0 │
└────────┘

approx_nunique

approx_nunique(where=None)

Return the approximate number of distinct elements in self.

The result may or may not be exact

Whether the result is an approximation depends on the backend.

Do not depend on the results being exact

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Filter in values when `where` is `True`	`None`

Returns

Name	Type	Description
	Scalar	An approximate count of the distinct elements of `self`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_nunique()

┌────┐
│ 92 │
└────┘

>>> t.body_mass_g.approx_nunique(where=t.species == "Adelie")

┌────┐
│ 61 │
└────┘

arbitrary

arbitrary(where=None, how=None)

Select an arbitrary value in a column.

Returns an arbitrary (nondeterministic, backend-specific) value from the column. The value will be non-NULL, except if the column is empty or all values are NULL.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	A filter expression	`None`
how	Any	DEPRECATED	`None`

Returns

Name	Type	Description
	Scalar	An expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 2, 2], "b": list("aaa"), "c": [4.0, 4.1, 4.2]})
>>> t

┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ a     ┃ b      ┃ c       ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ int64 │ string │ float64 │
├───────┼────────┼─────────┤
│     1 │ a      │     4.0 │
│     2 │ a      │     4.1 │
│     2 │ a      │     4.2 │
└───────┴────────┴─────────┘

>>> t.group_by("a").agg(arb=t.b.arbitrary(), c=t.c.sum()).order_by("a")

┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ a     ┃ arb    ┃ c       ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ int64 │ string │ float64 │
├───────┼────────┼─────────┤
│     1 │ a      │     4.0 │
│     2 │ a      │     8.3 │
└───────┴────────┴─────────┘

argmax

argmax(key, where=None)

Return the value of self that maximizes key.

Parameters

Name	Type	Description	Default
key	ir.Value	Key to use for `max` computation.	required
where	ir.BooleanValue \| None	Keep values when `where` is `True`	`None`

Returns

Name	Type	Description
	Scalar	The value of `self` that maximizes `key`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmax(t.body_mass_g)

┌────────┐
│ Gentoo │
└────────┘

>>> t.species.argmax(t.body_mass_g, where=t.island == "Dream")

┌───────────┐
│ Chinstrap │
└───────────┘

argmin

argmin(key, where=None)

Return the value of self that minimizes key.

Parameters

Name	Type	Description	Default
key	ir.Value	Key to use for `min` computation.	required
where	ir.BooleanValue \| None	Keep values when `where` is `True`	`None`

Returns

Name	Type	Description
	Scalar	The value of `self` that minimizes `key`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmin(t.body_mass_g)

┌───────────┐
│ Chinstrap │
└───────────┘

>>> t.species.argmin(t.body_mass_g, where=t.island == "Biscoe")

┌────────┐
│ Adelie │
└────────┘

as_scalar

as_scalar()

Inform ibis that the expression should be treated as a scalar.

Creates a scalar subquery from the column expression. Since ibis cannot be sure that the column expression contains only one value, the column expression is wrapped in a scalar subquery and treated as a scalar.

Note that the execution of the scalar subquery will fail if the column expression contains more than one value.

Returns

Name	Type	Description
	Scalar	A scalar subquery

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.island.as_scalar())
>>> from_that_island.species.value_counts().order_by("species")

┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string  │ int64         │
├─────────┼───────────────┤
│ Adelie  │            44 │
│ Gentoo  │           124 │
└─────────┴───────────────┘

as_table

as_table()

Promote the expression to a Table.

Returns

Name	Type	Description
	Table	A table expression

Examples

>>> import ibis
>>> t = ibis.table(dict(a="str"), name="t")
>>> expr = t.a.length().name("len").as_table()
>>> expected = t.select(len=t.a.length())
>>> expr.equals(expected)

True

count

count(where=None)

Compute the number of rows in an expression.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Filter expression	`None`

Returns

Name	Type	Description
	IntegerScalar	Number of elements in an expression

cume_dist

cume_dist()

Return the cumulative distribution over a window.

cummax

cummax(where=None, group_by=None, order_by=None)

Return the cumulative max over a window.

cummin

cummin(where=None, group_by=None, order_by=None)

Return the cumulative min over a window.

dense_rank

dense_rank()

Position of first element within each group of equal values.

Values are returned in sorted order and duplicate values are ignored.

Equivalent to SQL’s DENSE_RANK().

Returns

Name	Type	Description
	IntegerColumn	The rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.dense_rank())

┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64  │ int64 │
├────────┼───────┤
│      1 │     0 │
│      1 │     0 │
│      2 │     1 │
│      2 │     1 │
│      2 │     1 │
│      3 │     2 │
└────────┴───────┘

first

first(where=None, order_by=None, include_null=False)

Return the first value of a column.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	An optional filter expression. If provided, only rows where `where` is `True` will be included in the aggregate.	`None`
order_by	Any	An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of `first` is undefined and will be backend specific.	`None`
include_null	bool	Whether to include null values when performing this aggregation. Set to `True` to include nulls in the result.	`False`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "b", "c", "d"]})
>>> t

┏━━━━━━━━┓
┃ chars  ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a      │
│ b      │
│ c      │
│ d      │
└────────┘

>>> t.chars.first()

┌───┐
│ a │
└───┘

>>> t.chars.first(where=t.chars != "a")

┌───┐
│ b │
└───┘

lag

lag(offset=None, default=None)

Return the row located at offset rows before the current row.

Parameters

Name	Type	Description	Default
offset	int \| ir.IntegerValue \| None	Index of row to select	`None`
default	Value \| None	Value used if no row exists at `offset`	`None`

last

last(where=None, order_by=None, include_null=False)

Return the last value of a column.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	An optional filter expression. If provided, only rows where `where` is `True` will be included in the aggregate.	`None`
order_by	Any	An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of `last` is undefined and will be backend specific.	`None`
include_null	bool	Whether to include null values when performing this aggregation. Set to `True` to include nulls in the result.	`False`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "b", "c", "d"]})
>>> t

┏━━━━━━━━┓
┃ chars  ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a      │
│ b      │
│ c      │
│ d      │
└────────┘

>>> t.chars.last()

┌───┐
│ d │
└───┘

>>> t.chars.last(where=t.chars != "d")

┌───┐
│ c │
└───┘

lead

lead(offset=None, default=None)

Return the row located at offset rows after the current row.

Parameters

Name	Type	Description	Default
offset	int \| ir.IntegerValue \| None	Index of row to select	`None`
default	Value \| None	Value used if no row exists at `offset`	`None`

max

max(where=None)

Return the maximum of a column.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Filter in values when `where` is `True`	`None`

Returns

Name	Type	Description
	Scalar	The maximum value in `self`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.max()

┌──────┐
│ 6300 │
└──────┘

>>> t.body_mass_g.max(where=t.species == "Chinstrap")

┌──────┐
│ 4800 │
└──────┘

median

median(where=None)

Return the median of the column.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Optional boolean expression. If given, only the values where `where` evaluates to true will be considered for the median.	`None`

Returns

Name	Type	Description
	Scalar	Median of the column

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()

Compute the median of bill_depth_mm

>>> t.bill_depth_mm.median()

┌──────┐
│ 17.3 │
└──────┘

>>> t.group_by(t.species).agg(median_bill_depth=t.bill_depth_mm.median()).order_by(
...     ibis.desc("median_bill_depth")
... )

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ median_bill_depth ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string    │ float64           │
├───────────┼───────────────────┤
│ Chinstrap │             18.45 │
│ Adelie    │             18.40 │
│ Gentoo    │             15.00 │
└───────────┴───────────────────┘

In addition to numeric types, any orderable non-numeric types such as strings and dates work with median.

>>> t.group_by(t.island).agg(median_species=t.species.median()).order_by(
...     ibis.desc("median_species")
... )

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island    ┃ median_species ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ string         │
├───────────┼────────────────┤
│ Biscoe    │ Gentoo         │
│ Dream     │ Chinstrap      │
│ Torgersen │ Adelie         │
└───────────┴────────────────┘

min

min(where=None)

Return the minimum of a column.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Filter in values when `where` is `True`	`None`

Returns

Name	Type	Description
	Scalar	The minimum value in `self`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.min()

┌──────┐
│ 2700 │
└──────┘

>>> t.body_mass_g.min(where=t.species == "Adelie")

┌──────┐
│ 2850 │
└──────┘

mode

mode(where=None)

Return the mode of a column.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Filter in values when `where` is `True`	`None`

Returns

Name	Type	Description
	Scalar	The mode of `self`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.mode()

┌──────┐
│ 3800 │
└──────┘

>>> t.body_mass_g.mode(where=(t.species == "Gentoo") & (t.sex == "male"))

┌──────┐
│ 5550 │
└──────┘

nth

nth(n)

Return the nth value (0-indexed) over a window.

.nth(0) is equivalent to .first(). Negative will result in NULL. If the value of n is greater than the number of rows in the window, NULL will be returned.

Parameters

Name	Type	Description	Default
n	int \| ir.IntegerValue	Desired rank value	required

Returns

Name	Type	Description
	Column	The nth value over a window

ntile

ntile(buckets)

Return the integer number of a partitioning of the column values.

Parameters

Name	Type	Description	Default
buckets	int \| ir.IntegerValue	Number of buckets to partition into	required

nunique

nunique(where=None)

Compute the number of distinct rows in an expression.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Filter expression	`None`

Returns

Name	Type	Description
	IntegerScalar	Number of distinct elements in an expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.nunique()

┌────┐
│ 94 │
└────┘

>>> t.body_mass_g.nunique(where=t.species == "Adelie")

┌────┐
│ 55 │
└────┘

percent_rank

percent_rank()

Return the relative rank of the values in the column.

preview

preview(
    max_rows=None,
    max_length=None,
    max_string=None,
    max_depth=None,
    console_width=None,
)

Print a subset as a single-column Rich Table.

This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive.

Parameters

Name	Type	Description	Default
max_rows	int \| None	Maximum number of rows to display	`None`
max_length	int \| None	Maximum length for pretty-printed arrays and maps.	`None`
max_string	int \| None	Maximum length for pretty-printed strings.	`None`
max_depth	int \| None	Maximum depth for nested data types.	`None`
console_width	int \| float \| None	Width of the console in characters. If not specified, the width will be inferred from the console.	`None`

Examples

>>> import ibis
>>> t = ibis.examples.penguins.fetch()
>>> t.island.preview(max_rows=3, max_string=5)

┏━━━━━━━━┓
┃ island ┃
┡━━━━━━━━┩
│ stri…  │
├────────┤
│ Torg…  │
│ Torg…  │
│ Torg…  │
│ …      │
└────────┘

quantile

quantile(quantile, where=None)

Return value at the given quantile.

The output of this method is a continuous quantile if the input is numeric, otherwise the output is a discrete quantile.

Parameters

Name	Type	Description	Default
quantile	float \| ir.NumericValue \| Sequence[ir.NumericValue \| float]	`0 <= quantile <= 1`, or an array of such values indicating the quantile or quantiles to compute	required
where	ir.BooleanValue \| None	Boolean filter for input values	`None`

Returns

Name	Type	Description
	Scalar	Quantile of the input

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()

Compute the 99th percentile of bill_depth

>>> t.bill_depth_mm.quantile(0.99)

┌──────┐
│ 21.1 │
└──────┘

>>> t.group_by(t.species).agg(p99_bill_depth=t.bill_depth_mm.quantile(0.99)).order_by(
...     ibis.desc("p99_bill_depth")
... )

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species   ┃ p99_bill_depth ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ float64        │
├───────────┼────────────────┤
│ Adelie    │         21.200 │
│ Chinstrap │         20.733 │
│ Gentoo    │         17.256 │
└───────────┴────────────────┘

In addition to numeric types, any orderable non-numeric types such as strings and dates work with quantile.

Let’s compute the 99th percentile of the species column

>>> t.group_by(t.island).agg(p99_species=t.species.quantile(0.99)).order_by(
...     ibis.desc("p99_species")
... )

┏━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ island    ┃ p99_species ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ string    │ string      │
├───────────┼─────────────┤
│ Biscoe    │ Gentoo      │
│ Dream     │ Chinstrap   │
│ Torgersen │ Adelie      │
└───────────┴─────────────┘

rank

rank()

Compute position of first element within each equal-value group in sorted order.

Equivalent to SQL’s RANK() window function.

Returns

Name	Type	Description
	Int64Column	The min rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.rank())

┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64  │ int64 │
├────────┼───────┤
│      1 │     0 │
│      1 │     0 │
│      2 │     2 │
│      2 │     2 │
│      2 │     2 │
│      3 │     5 │
└────────┴───────┘

topk

topk(k, by=None, *, name=None)

Return a “top k” expression.

Computes a Table containing the top k values by a certain metric (defaults to count).

Parameters

Name	Type	Description	Default
k	int	The number of rows to return.	required
by	ir.Value \| None	The metric to compute “top” by. Defaults to `count`.	`None`
name	str \| None	The name to use for the metric column. A suitable name will be automatically generated if not provided.	`None`

Returns

Name	Type	Description
	Table	The top `k` values.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.diamonds.fetch()

Compute the top 3 diamond colors by frequency:

>>> t.color.topk(3)

┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ color  ┃ color_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ int64       │
├────────┼─────────────┤
│ G      │       11292 │
│ E      │        9797 │
│ F      │        9542 │
└────────┴─────────────┘

Compute the top 3 diamond colors by mean price:

>>> t.color.topk(3, by=t.price.mean())

┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ color  ┃ Mean(price) ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ float64     │
├────────┼─────────────┤
│ J      │ 5323.818020 │
│ I      │ 5091.874954 │
│ H      │ 4486.669196 │
└────────┴─────────────┘

Compute the top 2 diamond colors by max carat:

>>> t.color.topk(2, by=t.carat.max(), name="max_carat")

┏━━━━━━━━┳━━━━━━━━━━━┓
┃ color  ┃ max_carat ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64   │
├────────┼───────────┤
│ J      │      5.01 │
│ H      │      4.13 │
└────────┴───────────┘

value_counts

value_counts(name=None)

Compute a frequency table.

Parameters

Name	Type	Description	Default
name	str \| None	The name to use for the frequency column. A suitable name will be automatically generated if not provided.	`None`

Returns

Name	Type	Description
	Table	The frequency table.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "a", "b", "c", "c", "c", "d", "d", "d", "d"]})

Compute the count of each unique value in “chars”, ordered by “chars”:

>>> t.chars.value_counts().order_by("chars")

┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ chars  ┃ chars_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ int64       │
├────────┼─────────────┤
│ a      │           2 │
│ b      │           1 │
│ c      │           3 │
│ d      │           4 │
└────────┴─────────────┘

Compute the count of each unique value in “chars” as a column named “freq”, ordered by “freq”:

>>> t.chars.value_counts(name="freq").order_by("freq")

┏━━━━━━━━┳━━━━━━━┓
┃ chars  ┃ freq  ┃
┡━━━━━━━━╇━━━━━━━┩
│ string │ int64 │
├────────┼───────┤
│ b      │     1 │
│ a      │     2 │
│ c      │     3 │
│ d      │     4 │
└────────┴───────┘