>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_median()
┌────────┐
│ 4030.0 │
└────────┘
Name | Description |
---|---|
approx_median | Return an approximate of the median of self . |
approx_nunique | Return the approximate number of distinct elements in self . |
arbitrary | Select an arbitrary value in a column. |
argmax | Return the value of self that maximizes key . |
argmin | Return the value of self that minimizes key . |
as_scalar | Inform ibis that the expression should be treated as a scalar. |
as_table | Promote the expression to a Table. |
count | Compute the number of rows in an expression. |
cume_dist | Return the cumulative distribution over a window. |
cummax | Return the cumulative max over a window. |
cummin | Return the cumulative min over a window. |
dense_rank | Position of first element within each group of equal values. |
first | Return the first value of a column. |
lag | Return the row located at offset rows before the current row. |
last | Return the last value of a column. |
lead | Return the row located at offset rows after the current row. |
max | Return the maximum of a column. |
median | Return the median of the column. |
min | Return the minimum of a column. |
mode | Return the mode of a column. |
nth | Return the n th value (0-indexed) over a window. |
ntile | Return the integer number of a partitioning of the column values. |
nunique | Compute the number of distinct rows in an expression. |
percent_rank | Return the relative rank of the values in the column. |
preview | Print a subset as a single-column Rich Table. |
quantile | Return value at the given quantile. |
rank | Compute position of first element within each equal-value group in sorted order. |
topk | Return a “top k” expression. |
value_counts | Compute a frequency table. |
Return an approximate of the median of self
.
Whether the result is an approximation depends on the backend.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | An approximation of the median of self |
Return the approximate number of distinct elements in self
.
Whether the result is an approximation depends on the backend.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | An approximate count of the distinct elements of self |
Select an arbitrary value in a column.
Returns an arbitrary (nondeterministic, backend-specific) value from the column. The value will be non-NULL, except if the column is empty or all values are NULL.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | A filter expression | None |
how | Any | DEPRECATED | None |
Name | Type | Description |
---|---|---|
Scalar | An expression |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 2, 2], "b": list("aaa"), "c": [4.0, 4.1, 4.2]})
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓ ┃ a ┃ b ┃ c ┃ ┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩ │ int64 │ string │ float64 │ ├───────┼────────┼─────────┤ │ 1 │ a │ 4.0 │ │ 2 │ a │ 4.1 │ │ 2 │ a │ 4.2 │ └───────┴────────┴─────────┘
Return the value of self
that maximizes key
.
Name | Type | Description | Default |
---|---|---|---|
key | ir.Value | Key to use for max computation. |
required |
where | ir.BooleanValue | None | Keep values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The value of self that maximizes key |
Return the value of self
that minimizes key
.
Name | Type | Description | Default |
---|---|---|---|
key | ir.Value | Key to use for min computation. |
required |
where | ir.BooleanValue | None | Keep values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The value of self that minimizes key |
Inform ibis that the expression should be treated as a scalar.
Creates a scalar subquery from the column expression. Since ibis cannot be sure that the column expression contains only one value, the column expression is wrapped in a scalar subquery and treated as a scalar.
Note that the execution of the scalar subquery will fail if the column expression contains more than one value.
Name | Type | Description |
---|---|---|
Scalar | A scalar subquery |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.island.as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ species ┃ species_count ┃ ┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ string │ int64 │ ├─────────┼───────────────┤ │ Adelie │ 44 │ │ Gentoo │ 124 │ └─────────┴───────────────┘
Promote the expression to a Table.
Name | Type | Description |
---|---|---|
Table | A table expression |
Compute the number of rows in an expression.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter expression | None |
Name | Type | Description |
---|---|---|
IntegerScalar | Number of elements in an expression |
Return the cumulative distribution over a window.
Return the cumulative max over a window.
Return the cumulative min over a window.
Position of first element within each group of equal values.
Values are returned in sorted order and duplicate values are ignored.
Equivalent to SQL’s DENSE_RANK()
.
Name | Type | Description |
---|---|---|
IntegerColumn | The rank |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.dense_rank())
┏━━━━━━━━┳━━━━━━━┓ ┃ values ┃ rank ┃ ┡━━━━━━━━╇━━━━━━━┩ │ int64 │ int64 │ ├────────┼───────┤ │ 1 │ 0 │ │ 1 │ 0 │ │ 2 │ 1 │ │ 2 │ 1 │ │ 2 │ 1 │ │ 3 │ 2 │ └────────┴───────┘
Return the first value of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of first is undefined and will be backend specific. |
None |
include_null | bool | Whether to include null values when performing this aggregation. Set to True to include nulls in the result. |
False |
Return the row located at offset
rows before the current row.
Name | Type | Description | Default |
---|---|---|---|
offset | int | ir.IntegerValue | None | Index of row to select | None |
default | Value | None | Value used if no row exists at offset |
None |
Return the last value of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of last is undefined and will be backend specific. |
None |
include_null | bool | Whether to include null values when performing this aggregation. Set to True to include nulls in the result. |
False |
Return the row located at offset
rows after the current row.
Name | Type | Description | Default |
---|---|---|---|
offset | int | ir.IntegerValue | None | Index of row to select | None |
default | Value | None | Value used if no row exists at offset |
None |
Return the maximum of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The maximum value in self |
Return the median of the column.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Optional boolean expression. If given, only the values where where evaluates to true will be considered for the median. |
None |
Name | Type | Description |
---|---|---|
Scalar | Median of the column |
Compute the median of bill_depth_mm
>>> t.group_by(t.species).agg(median_bill_depth=t.bill_depth_mm.median()).order_by(
... ibis.desc("median_bill_depth")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ species ┃ median_bill_depth ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ string │ float64 │ ├───────────┼───────────────────┤ │ Chinstrap │ 18.45 │ │ Adelie │ 18.40 │ │ Gentoo │ 15.00 │ └───────────┴───────────────────┘
In addition to numeric types, any orderable non-numeric types such as strings and dates work with median
.
>>> t.group_by(t.island).agg(median_species=t.species.median()).order_by(
... ibis.desc("median_species")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ ┃ island ┃ median_species ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ │ string │ string │ ├───────────┼────────────────┤ │ Biscoe │ Gentoo │ │ Dream │ Chinstrap │ │ Torgersen │ Adelie │ └───────────┴────────────────┘
Return the minimum of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The minimum value in self |
Return the mode of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The mode of self |
Return the n
th value (0-indexed) over a window.
.nth(0)
is equivalent to .first()
. Negative will result in NULL
. If the value of n
is greater than the number of rows in the window, NULL
will be returned.
Name | Type | Description | Default |
---|---|---|---|
n | int | ir.IntegerValue | Desired rank value | required |
Name | Type | Description |
---|---|---|
Column | The nth value over a window |
Return the integer number of a partitioning of the column values.
Name | Type | Description | Default |
---|---|---|---|
buckets | int | ir.IntegerValue | Number of buckets to partition into | required |
Compute the number of distinct rows in an expression.
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter expression | None |
Name | Type | Description |
---|---|---|
IntegerScalar | Number of distinct elements in an expression |
Return the relative rank of the values in the column.
Print a subset as a single-column Rich Table.
This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive
.
Name | Type | Description | Default |
---|---|---|---|
max_rows | int | None | Maximum number of rows to display | None |
max_length | int | None | Maximum length for pretty-printed arrays and maps. | None |
max_string | int | None | Maximum length for pretty-printed strings. | None |
max_depth | int | None | Maximum depth for nested data types. | None |
console_width | int | float | None | Width of the console in characters. If not specified, the width will be inferred from the console. | None |
Return value at the given quantile.
The output of this method is a continuous quantile if the input is numeric, otherwise the output is a discrete quantile.
Name | Type | Description | Default |
---|---|---|---|
quantile | float | ir.NumericValue | Sequence[ir.NumericValue | float] | 0 <= quantile <= 1 , or an array of such values indicating the quantile or quantiles to compute |
required |
where | ir.BooleanValue | None | Boolean filter for input values | None |
Name | Type | Description |
---|---|---|
Scalar | Quantile of the input |
Compute the 99th percentile of bill_depth
>>> t.group_by(t.species).agg(p99_bill_depth=t.bill_depth_mm.quantile(0.99)).order_by(
... ibis.desc("p99_bill_depth")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ ┃ species ┃ p99_bill_depth ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ │ string │ float64 │ ├───────────┼────────────────┤ │ Adelie │ 21.200 │ │ Chinstrap │ 20.733 │ │ Gentoo │ 17.256 │ └───────────┴────────────────┘
In addition to numeric types, any orderable non-numeric types such as strings and dates work with quantile
.
Let’s compute the 99th percentile of the species
column
>>> t.group_by(t.island).agg(p99_species=t.species.quantile(0.99)).order_by(
... ibis.desc("p99_species")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ island ┃ p99_species ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ string │ ├───────────┼─────────────┤ │ Biscoe │ Gentoo │ │ Dream │ Chinstrap │ │ Torgersen │ Adelie │ └───────────┴─────────────┘
Compute position of first element within each equal-value group in sorted order.
Equivalent to SQL’s RANK()
window function.
Name | Type | Description |
---|---|---|
Int64Column | The min rank |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.rank())
┏━━━━━━━━┳━━━━━━━┓ ┃ values ┃ rank ┃ ┡━━━━━━━━╇━━━━━━━┩ │ int64 │ int64 │ ├────────┼───────┤ │ 1 │ 0 │ │ 1 │ 0 │ │ 2 │ 2 │ │ 2 │ 2 │ │ 2 │ 2 │ │ 3 │ 5 │ └────────┴───────┘
Return a “top k” expression.
Computes a Table containing the top k
values by a certain metric (defaults to count).
Name | Type | Description | Default |
---|---|---|---|
k | int | The number of rows to return. | required |
by | ir.Value | None | The metric to compute “top” by. Defaults to count . |
None |
name | str | None | The name to use for the metric column. A suitable name will be automatically generated if not provided. | None |
Name | Type | Description |
---|---|---|
Table | The top k values. |
Compute the top 3 diamond colors by frequency:
┏━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ color ┃ color_count ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ int64 │ ├────────┼─────────────┤ │ G │ 11292 │ │ E │ 9797 │ │ F │ 9542 │ └────────┴─────────────┘
Compute the top 3 diamond colors by mean price:
┏━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ color ┃ Mean(price) ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ float64 │ ├────────┼─────────────┤ │ J │ 5323.818020 │ │ I │ 5091.874954 │ │ H │ 4486.669196 │ └────────┴─────────────┘
Compute the top 2 diamond colors by max carat:
Compute a frequency table.
Name | Type | Description | Default |
---|---|---|---|
name | str | None | The name to use for the frequency column. A suitable name will be automatically generated if not provided. | None |
Name | Type | Description |
---|---|---|
Table | The frequency table. |
Compute the count of each unique value in “chars”, ordered by “chars”:
┏━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ chars ┃ chars_count ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ int64 │ ├────────┼─────────────┤ │ a │ 2 │ │ b │ 1 │ │ c │ 3 │ │ d │ 4 │ └────────┴─────────────┘
Compute the count of each unique value in “chars” as a column named “freq”, ordered by “freq”: