Column

Column(arg)

Methods

Name Description
approx_median Return an approximate of the median of self.
approx_nunique Return the approximate number of distinct elements in self.
arbitrary Select an arbitrary value in a column.
argmax Return the value of self that maximizes key.
argmin Return the value of self that minimizes key.
as_scalar Inform ibis that the expression should be treated as a scalar.
as_table Promote the expression to a Table.
count Compute the number of rows in an expression.
cume_dist Return the cumulative distribution over a window.
cummax Return the cumulative max over a window.
cummin Return the cumulative min over a window.
dense_rank Position of first element within each group of equal values.
first Return the first value of a column.
lag Return the row located at offset rows before the current row.
last Return the last value of a column.
lead Return the row located at offset rows after the current row.
max Return the maximum of a column.
median Return the median of the column.
min Return the minimum of a column.
mode Return the mode of a column.
nth Return the nth value (0-indexed) over a window.
ntile Return the integer number of a partitioning of the column values.
nunique Compute the number of distinct rows in an expression.
percent_rank Return the relative rank of the values in the column.
preview Print a subset as a single-column Rich Table.
quantile Return value at the given quantile.
rank Compute position of first element within each equal-value group in sorted order.
topk Return a “top k” expression.
value_counts Compute a frequency table.

approx_median

approx_median(where=None)

Return an approximate of the median of self.

The result may or may not be exact

Whether the result is an approximation depends on the backend.

Do not depend on the results being exact

Parameters

Name Type Description Default
where ir.BooleanValue | None Filter in values when where is True None

Returns

Name Type Description
Scalar An approximation of the median of self

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_median()

┌────────┐
│ 4030.0 │
└────────┘
>>> t.body_mass_g.approx_median(where=t.species == "Chinstrap")

┌────────┐
│ 3700.0 │
└────────┘

approx_nunique

approx_nunique(where=None)

Return the approximate number of distinct elements in self.

The result may or may not be exact

Whether the result is an approximation depends on the backend.

Do not depend on the results being exact

Parameters

Name Type Description Default
where ir.BooleanValue | None Filter in values when where is True None

Returns

Name Type Description
Scalar An approximate count of the distinct elements of self

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_nunique()

┌────┐
│ 92 │
└────┘
>>> t.body_mass_g.approx_nunique(where=t.species == "Adelie")

┌────┐
│ 61 │
└────┘

arbitrary

arbitrary(where=None, how=None)

Select an arbitrary value in a column.

Returns an arbitrary (nondeterministic, backend-specific) value from the column. The value will be non-NULL, except if the column is empty or all values are NULL.

Parameters

Name Type Description Default
where ir.BooleanValue | None A filter expression None
how Any DEPRECATED None

Returns

Name Type Description
Scalar An expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 2, 2], "b": list("aaa"), "c": [4.0, 4.1, 4.2]})
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ a      b       c       ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ int64stringfloat64 │
├───────┼────────┼─────────┤
│     1a     4.0 │
│     2a     4.1 │
│     2a     4.2 │
└───────┴────────┴─────────┘
>>> t.group_by("a").agg(arb=t.b.arbitrary(), c=t.c.sum()).order_by("a")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ a      arb     c       ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ int64stringfloat64 │
├───────┼────────┼─────────┤
│     1a     4.0 │
│     2a     8.3 │
└───────┴────────┴─────────┘

argmax

argmax(key, where=None)

Return the value of self that maximizes key.

Parameters

Name Type Description Default
key ir.Value Key to use for max computation. required
where ir.BooleanValue | None Keep values when where is True None

Returns

Name Type Description
Scalar The value of self that maximizes key

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmax(t.body_mass_g)

┌────────┐
│ Gentoo │
└────────┘
>>> t.species.argmax(t.body_mass_g, where=t.island == "Dream")

┌───────────┐
│ Chinstrap │
└───────────┘

argmin

argmin(key, where=None)

Return the value of self that minimizes key.

Parameters

Name Type Description Default
key ir.Value Key to use for min computation. required
where ir.BooleanValue | None Keep values when where is True None

Returns

Name Type Description
Scalar The value of self that minimizes key

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmin(t.body_mass_g)

┌───────────┐
│ Chinstrap │
└───────────┘
>>> t.species.argmin(t.body_mass_g, where=t.island == "Biscoe")

┌────────┐
│ Adelie │
└────────┘

as_scalar

as_scalar()

Inform ibis that the expression should be treated as a scalar.

Creates a scalar subquery from the column expression. Since ibis cannot be sure that the column expression contains only one value, the column expression is wrapped in a scalar subquery and treated as a scalar.

Note that the execution of the scalar subquery will fail if the column expression contains more than one value.

Returns

Name Type Description
Scalar A scalar subquery

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.island.as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species  species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ stringint64         │
├─────────┼───────────────┤
│ Adelie 44 │
│ Gentoo 124 │
└─────────┴───────────────┘

as_table

as_table()

Promote the expression to a Table.

Returns

Name Type Description
Table A table expression

Examples

>>> import ibis
>>> t = ibis.table(dict(a="str"), name="t")
>>> expr = t.a.length().name("len").as_table()
>>> expected = t.select(len=t.a.length())
>>> expr.equals(expected)
True

count

count(where=None)

Compute the number of rows in an expression.

Parameters

Name Type Description Default
where ir.BooleanValue | None Filter expression None

Returns

Name Type Description
IntegerScalar Number of elements in an expression

cume_dist

cume_dist()

Return the cumulative distribution over a window.

cummax

cummax(where=None, group_by=None, order_by=None)

Return the cumulative max over a window.

cummin

cummin(where=None, group_by=None, order_by=None)

Return the cumulative min over a window.

dense_rank

dense_rank()

Position of first element within each group of equal values.

Values are returned in sorted order and duplicate values are ignored.

Equivalent to SQL’s DENSE_RANK().

Returns

Name Type Description
IntegerColumn The rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.dense_rank())
┏━━━━━━━━┳━━━━━━━┓
┃ values  rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      21 │
│      21 │
│      21 │
│      32 │
└────────┴───────┘

first

first(where=None, order_by=None, include_null=False)

Return the first value of a column.

Parameters

Name Type Description Default
where ir.BooleanValue | None An optional filter expression. If provided, only rows where where is True will be included in the aggregate. None
order_by Any An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of first is undefined and will be backend specific. None
include_null bool Whether to include null values when performing this aggregation. Set to True to include nulls in the result. False

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "b", "c", "d"]})
>>> t
┏━━━━━━━━┓
┃ chars  ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a      │
│ b      │
│ c      │
│ d      │
└────────┘
>>> t.chars.first()

┌───┐
│ a │
└───┘
>>> t.chars.first(where=t.chars != "a")

┌───┐
│ b │
└───┘

lag

lag(offset=None, default=None)

Return the row located at offset rows before the current row.

Parameters

Name Type Description Default
offset int | ir.IntegerValue | None Index of row to select None
default Value | None Value used if no row exists at offset None

last

last(where=None, order_by=None, include_null=False)

Return the last value of a column.

Parameters

Name Type Description Default
where ir.BooleanValue | None An optional filter expression. If provided, only rows where where is True will be included in the aggregate. None
order_by Any An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of last is undefined and will be backend specific. None
include_null bool Whether to include null values when performing this aggregation. Set to True to include nulls in the result. False

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "b", "c", "d"]})
>>> t
┏━━━━━━━━┓
┃ chars  ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a      │
│ b      │
│ c      │
│ d      │
└────────┘
>>> t.chars.last()

┌───┐
│ d │
└───┘
>>> t.chars.last(where=t.chars != "d")

┌───┐
│ c │
└───┘

lead

lead(offset=None, default=None)

Return the row located at offset rows after the current row.

Parameters

Name Type Description Default
offset int | ir.IntegerValue | None Index of row to select None
default Value | None Value used if no row exists at offset None

max

max(where=None)

Return the maximum of a column.

Parameters

Name Type Description Default
where ir.BooleanValue | None Filter in values when where is True None

Returns

Name Type Description
Scalar The maximum value in self

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.max()

┌──────┐
│ 6300 │
└──────┘
>>> t.body_mass_g.max(where=t.species == "Chinstrap")

┌──────┐
│ 4800 │
└──────┘

median

median(where=None)

Return the median of the column.

Parameters

Name Type Description Default
where ir.BooleanValue | None Optional boolean expression. If given, only the values where where evaluates to true will be considered for the median. None

Returns

Name Type Description
Scalar Median of the column

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()

Compute the median of bill_depth_mm

>>> t.bill_depth_mm.median()

┌──────┐
│ 17.3 │
└──────┘
>>> t.group_by(t.species).agg(median_bill_depth=t.bill_depth_mm.median()).order_by(
...     ibis.desc("median_bill_depth")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species    median_bill_depth ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringfloat64           │
├───────────┼───────────────────┤
│ Chinstrap18.45 │
│ Adelie   18.40 │
│ Gentoo   15.00 │
└───────────┴───────────────────┘

In addition to numeric types, any orderable non-numeric types such as strings and dates work with median.

>>> t.group_by(t.island).agg(median_species=t.species.median()).order_by(
...     ibis.desc("median_species")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island     median_species ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringstring         │
├───────────┼────────────────┤
│ Biscoe   Gentoo         │
│ Dream    Chinstrap      │
│ TorgersenAdelie         │
└───────────┴────────────────┘

min

min(where=None)

Return the minimum of a column.

Parameters

Name Type Description Default
where ir.BooleanValue | None Filter in values when where is True None

Returns

Name Type Description
Scalar The minimum value in self

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.min()

┌──────┐
│ 2700 │
└──────┘
>>> t.body_mass_g.min(where=t.species == "Adelie")

┌──────┐
│ 2850 │
└──────┘

mode

mode(where=None)

Return the mode of a column.

Parameters

Name Type Description Default
where ir.BooleanValue | None Filter in values when where is True None

Returns

Name Type Description
Scalar The mode of self

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.mode()

┌──────┐
│ 3800 │
└──────┘
>>> t.body_mass_g.mode(where=(t.species == "Gentoo") & (t.sex == "male"))

┌──────┐
│ 5550 │
└──────┘

nth

nth(n)

Return the nth value (0-indexed) over a window.

.nth(0) is equivalent to .first(). Negative will result in NULL. If the value of n is greater than the number of rows in the window, NULL will be returned.

Parameters

Name Type Description Default
n int | ir.IntegerValue Desired rank value required

Returns

Name Type Description
Column The nth value over a window

ntile

ntile(buckets)

Return the integer number of a partitioning of the column values.

Parameters

Name Type Description Default
buckets int | ir.IntegerValue Number of buckets to partition into required

nunique

nunique(where=None)

Compute the number of distinct rows in an expression.

Parameters

Name Type Description Default
where ir.BooleanValue | None Filter expression None

Returns

Name Type Description
IntegerScalar Number of distinct elements in an expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.nunique()

┌────┐
│ 94 │
└────┘
>>> t.body_mass_g.nunique(where=t.species == "Adelie")

┌────┐
│ 55 │
└────┘

percent_rank

percent_rank()

Return the relative rank of the values in the column.

preview

preview(
    max_rows=None,
    max_length=None,
    max_string=None,
    max_depth=None,
    console_width=None,
)

Print a subset as a single-column Rich Table.

This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive.

Parameters

Name Type Description Default
max_rows int | None Maximum number of rows to display None
max_length int | None Maximum length for pretty-printed arrays and maps. None
max_string int | None Maximum length for pretty-printed strings. None
max_depth int | None Maximum depth for nested data types. None
console_width int | float | None Width of the console in characters. If not specified, the width will be inferred from the console. None

Examples

>>> import ibis
>>> t = ibis.examples.penguins.fetch()
>>> t.island.preview(max_rows=3, max_string=5)
┏━━━━━━━━┓
┃ island ┃
┡━━━━━━━━┩
│ stri…  │
├────────┤
│ Torg…  │
│ Torg…  │
│ Torg…  │
│       │
└────────┘

quantile

quantile(quantile, where=None)

Return value at the given quantile.

The output of this method is a continuous quantile if the input is numeric, otherwise the output is a discrete quantile.

Parameters

Name Type Description Default
quantile float | ir.NumericValue | Sequence[ir.NumericValue | float] 0 <= quantile <= 1, or an array of such values indicating the quantile or quantiles to compute required
where ir.BooleanValue | None Boolean filter for input values None

Returns

Name Type Description
Scalar Quantile of the input

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()

Compute the 99th percentile of bill_depth

>>> t.bill_depth_mm.quantile(0.99)

┌──────┐
│ 21.1 │
└──────┘
>>> t.group_by(t.species).agg(p99_bill_depth=t.bill_depth_mm.quantile(0.99)).order_by(
...     ibis.desc("p99_bill_depth")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species    p99_bill_depth ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringfloat64        │
├───────────┼────────────────┤
│ Adelie   21.200 │
│ Chinstrap20.733 │
│ Gentoo   17.256 │
└───────────┴────────────────┘

In addition to numeric types, any orderable non-numeric types such as strings and dates work with quantile.

Let’s compute the 99th percentile of the species column

>>> t.group_by(t.island).agg(p99_species=t.species.quantile(0.99)).order_by(
...     ibis.desc("p99_species")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ island     p99_species ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ stringstring      │
├───────────┼─────────────┤
│ Biscoe   Gentoo      │
│ Dream    Chinstrap   │
│ TorgersenAdelie      │
└───────────┴─────────────┘

rank

rank()

Compute position of first element within each equal-value group in sorted order.

Equivalent to SQL’s RANK() window function.

Returns

Name Type Description
Int64Column The min rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.rank())
┏━━━━━━━━┳━━━━━━━┓
┃ values  rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      22 │
│      22 │
│      22 │
│      35 │
└────────┴───────┘

topk

topk(k, by=None, *, name=None)

Return a “top k” expression.

Computes a Table containing the top k values by a certain metric (defaults to count).

Parameters

Name Type Description Default
k int The number of rows to return. required
by ir.Value | None The metric to compute “top” by. Defaults to count. None
name str | None The name to use for the metric column. A suitable name will be automatically generated if not provided. None

Returns

Name Type Description
Table The top k values.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.diamonds.fetch()

Compute the top 3 diamond colors by frequency:

>>> t.color.topk(3)
┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ color   color_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ stringint64       │
├────────┼─────────────┤
│ G     11292 │
│ E     9797 │
│ F     9542 │
└────────┴─────────────┘

Compute the top 3 diamond colors by mean price:

>>> t.color.topk(3, by=t.price.mean())
┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ color   Mean(price) ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ stringfloat64     │
├────────┼─────────────┤
│ J     5323.818020 │
│ I     5091.874954 │
│ H     4486.669196 │
└────────┴─────────────┘

Compute the top 2 diamond colors by max carat:

>>> t.color.topk(2, by=t.carat.max(), name="max_carat")
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ color   max_carat ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ stringfloat64   │
├────────┼───────────┤
│ J     5.01 │
│ H     4.13 │
└────────┴───────────┘

value_counts

value_counts(name=None)

Compute a frequency table.

Parameters

Name Type Description Default
name str | None The name to use for the frequency column. A suitable name will be automatically generated if not provided. None

Returns

Name Type Description
Table The frequency table.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "a", "b", "c", "c", "c", "d", "d", "d", "d"]})

Compute the count of each unique value in “chars”, ordered by “chars”:

>>> t.chars.value_counts().order_by("chars")
┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ chars   chars_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ stringint64       │
├────────┼─────────────┤
│ a     2 │
│ b     1 │
│ c     3 │
│ d     4 │
└────────┴─────────────┘

Compute the count of each unique value in “chars” as a column named “freq”, ordered by “freq”:

>>> t.chars.value_counts(name="freq").order_by("freq")
┏━━━━━━━━┳━━━━━━━┓
┃ chars   freq  ┃
┡━━━━━━━━╇━━━━━━━┩
│ stringint64 │
├────────┼───────┤
│ b     1 │
│ a     2 │
│ c     3 │
│ d     4 │
└────────┴───────┘