Column
Column(arg)
Methods
Name | Description |
---|---|
approx_median | Return an approximate of the median of self . |
approx_nunique | Return the approximate number of distinct elements in self . |
arbitrary | Select an arbitrary value in a column. |
argmax | Return the value of self that maximizes key . |
argmin | Return the value of self that minimizes key . |
as_scalar | Inform ibis that the expression should be treated as a scalar. |
as_table | Promote the expression to a Table. |
count | Compute the number of rows in an expression. |
cume_dist | Return the cumulative distribution over a window. |
cummax | Return the cumulative max over a window. |
cummin | Return the cumulative min over a window. |
dense_rank | Position of first element within each group of equal values. |
first | Return the first value of a column. |
lag | Return the row located at offset rows before the current row. |
last | Return the last value of a column. |
lead | Return the row located at offset rows after the current row. |
max | Return the maximum of a column. |
median | Return the median of the column. |
min | Return the minimum of a column. |
mode | Return the mode of a column. |
nth | Return the n th value (0-indexed) over a window. |
ntile | Return the integer number of a partitioning of the column values. |
nunique | Compute the number of distinct rows in an expression. |
percent_rank | Return the relative rank of the values in the column. |
preview | Print a subset as a single-column Rich Table. |
quantile | Return value at the given quantile. |
rank | Compute position of first element within each equal-value group in sorted order. |
topk | Return a “top k” expression. |
value_counts | Compute a frequency table. |
approx_median
=None) approx_median(where
Return an approximate of the median of self
.
Whether the result is an approximation depends on the backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | An approximation of the median of self |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_median()
>>> t.body_mass_g.approx_median(where=t.species == "Chinstrap")
approx_nunique
=None) approx_nunique(where
Return the approximate number of distinct elements in self
.
Whether the result is an approximation depends on the backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | An approximate count of the distinct elements of self |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_nunique()
>>> t.body_mass_g.approx_nunique(where=t.species == "Adelie")
arbitrary
=None, how=None) arbitrary(where
Select an arbitrary value in a column.
Returns an arbitrary (nondeterministic, backend-specific) value from the column. The value will be non-NULL, except if the column is empty or all values are NULL.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | A filter expression | None |
how | Any | DEPRECATED | None |
Returns
Name | Type | Description |
---|---|---|
Scalar | An expression |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 2, 2], "b": list("aaa"), "c": [4.0, 4.1, 4.2]})
>>> t
>>> t.group_by("a").agg(arb=t.b.arbitrary(), c=t.c.sum()).order_by("a")
argmax
=None) argmax(key, where
Return the value of self
that maximizes key
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
key | ir.Value | Key to use for max computation. |
required |
where | ir.BooleanValue | None | Keep values when where is True |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | The value of self that maximizes key |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmax(t.body_mass_g)
>>> t.species.argmax(t.body_mass_g, where=t.island == "Dream")
argmin
=None) argmin(key, where
Return the value of self
that minimizes key
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
key | ir.Value | Key to use for min computation. |
required |
where | ir.BooleanValue | None | Keep values when where is True |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | The value of self that minimizes key |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmin(t.body_mass_g)
>>> t.species.argmin(t.body_mass_g, where=t.island == "Biscoe")
as_scalar
as_scalar()
Inform ibis that the expression should be treated as a scalar.
Creates a scalar subquery from the column expression. Since ibis cannot be sure that the column expression contains only one value, the column expression is wrapped in a scalar subquery and treated as a scalar.
Note that the execution of the scalar subquery will fail if the column expression contains more than one value.
Returns
Name | Type | Description |
---|---|---|
Scalar | A scalar subquery |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.island.as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
as_table
as_table()
Promote the expression to a Table.
Returns
Name | Type | Description |
---|---|---|
Table | A table expression |
Examples
>>> import ibis
>>> t = ibis.table(dict(a="str"), name="t")
>>> expr = t.a.length().name("len").as_table()
>>> expected = t.select(len=t.a.length())
>>> expr.equals(expected)
count
=None) count(where
Compute the number of rows in an expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter expression | None |
Returns
Name | Type | Description |
---|---|---|
IntegerScalar | Number of elements in an expression |
cume_dist
cume_dist()
Return the cumulative distribution over a window.
cummax
=None, group_by=None, order_by=None) cummax(where
Return the cumulative max over a window.
cummin
=None, group_by=None, order_by=None) cummin(where
Return the cumulative min over a window.
dense_rank
dense_rank()
Position of first element within each group of equal values.
Values are returned in sorted order and duplicate values are ignored.
Equivalent to SQL’s DENSE_RANK()
.
Returns
Name | Type | Description |
---|---|---|
IntegerColumn | The rank |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.dense_rank())
first
=None, order_by=None, include_null=False) first(where
Return the first value of a column.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of first is undefined and will be backend specific. |
None |
include_null | bool | Whether to include null values when performing this aggregation. Set to True to include nulls in the result. |
False |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "b", "c", "d"]})
>>> t
>>> t.chars.first()
>>> t.chars.first(where=t.chars != "a")
lag
=None, default=None) lag(offset
Return the row located at offset
rows before the current row.
Parameters
Name | Type | Description | Default |
---|---|---|---|
offset | int | ir.IntegerValue | None | Index of row to select | None |
default | Value | None | Value used if no row exists at offset |
None |
last
=None, order_by=None, include_null=False) last(where
Return the last value of a column.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of last is undefined and will be backend specific. |
None |
include_null | bool | Whether to include null values when performing this aggregation. Set to True to include nulls in the result. |
False |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "b", "c", "d"]})
>>> t
>>> t.chars.last()
>>> t.chars.last(where=t.chars != "d")
lead
=None, default=None) lead(offset
Return the row located at offset
rows after the current row.
Parameters
Name | Type | Description | Default |
---|---|---|---|
offset | int | ir.IntegerValue | None | Index of row to select | None |
default | Value | None | Value used if no row exists at offset |
None |
max
max(where=None)
Return the maximum of a column.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | The maximum value in self |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.max()
>>> t.body_mass_g.max(where=t.species == "Chinstrap")
median
=None) median(where
Return the median of the column.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Optional boolean expression. If given, only the values where where evaluates to true will be considered for the median. |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | Median of the column |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
Compute the median of bill_depth_mm
>>> t.bill_depth_mm.median()
>>> t.group_by(t.species).agg(median_bill_depth=t.bill_depth_mm.median()).order_by(
"median_bill_depth")
... ibis.desc( ... )
In addition to numeric types, any orderable non-numeric types such as strings and dates work with median
.
>>> t.group_by(t.island).agg(median_species=t.species.median()).order_by(
"median_species")
... ibis.desc( ... )
min
min(where=None)
Return the minimum of a column.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | The minimum value in self |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.min()
>>> t.body_mass_g.min(where=t.species == "Adelie")
mode
=None) mode(where
Return the mode of a column.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter in values when where is True |
None |
Returns
Name | Type | Description |
---|---|---|
Scalar | The mode of self |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.mode()
>>> t.body_mass_g.mode(where=(t.species == "Gentoo") & (t.sex == "male"))
nth
nth(n)
Return the n
th value (0-indexed) over a window.
.nth(0)
is equivalent to .first()
. Negative will result in NULL
. If the value of n
is greater than the number of rows in the window, NULL
will be returned.
Parameters
Name | Type | Description | Default |
---|---|---|---|
n | int | ir.IntegerValue | Desired rank value | required |
Returns
Name | Type | Description |
---|---|---|
Column | The nth value over a window |
ntile
ntile(buckets)
Return the integer number of a partitioning of the column values.
Parameters
Name | Type | Description | Default |
---|---|---|---|
buckets | int | ir.IntegerValue | Number of buckets to partition into | required |
nunique
=None) nunique(where
Compute the number of distinct rows in an expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
where | ir.BooleanValue | None | Filter expression | None |
Returns
Name | Type | Description |
---|---|---|
IntegerScalar | Number of distinct elements in an expression |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.nunique()
>>> t.body_mass_g.nunique(where=t.species == "Adelie")
percent_rank
percent_rank()
Return the relative rank of the values in the column.
preview
preview(=None,
max_rows=None,
max_length=None,
max_string=None,
max_depth=None,
console_width )
Print a subset as a single-column Rich Table.
This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
max_rows | int | None | Maximum number of rows to display | None |
max_length | int | None | Maximum length for pretty-printed arrays and maps. | None |
max_string | int | None | Maximum length for pretty-printed strings. | None |
max_depth | int | None | Maximum depth for nested data types. | None |
console_width | int | float | None | Width of the console in characters. If not specified, the width will be inferred from the console. | None |
Examples
>>> import ibis
>>> t = ibis.examples.penguins.fetch()
>>> t.island.preview(max_rows=3, max_string=5)
quantile
=None) quantile(quantile, where
Return value at the given quantile.
The output of this method is a continuous quantile if the input is numeric, otherwise the output is a discrete quantile.
Parameters
Name | Type | Description | Default |
---|---|---|---|
quantile | float | ir.NumericValue | Sequence[ir.NumericValue | float] | 0 <= quantile <= 1 , or an array of such values indicating the quantile or quantiles to compute |
required |
where | ir.BooleanValue | None | Boolean filter for input values | None |
Returns
Name | Type | Description |
---|---|---|
Scalar | Quantile of the input |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
Compute the 99th percentile of bill_depth
>>> t.bill_depth_mm.quantile(0.99)
>>> t.group_by(t.species).agg(p99_bill_depth=t.bill_depth_mm.quantile(0.99)).order_by(
"p99_bill_depth")
... ibis.desc( ... )
In addition to numeric types, any orderable non-numeric types such as strings and dates work with quantile
.
Let’s compute the 99th percentile of the species
column
>>> t.group_by(t.island).agg(p99_species=t.species.quantile(0.99)).order_by(
"p99_species")
... ibis.desc( ... )
rank
rank()
Compute position of first element within each equal-value group in sorted order.
Equivalent to SQL’s RANK()
window function.
Returns
Name | Type | Description |
---|---|---|
Int64Column | The min rank |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.rank())
topk
=None, *, name=None) topk(k, by
Return a “top k” expression.
Computes a Table containing the top k
values by a certain metric (defaults to count).
Parameters
Name | Type | Description | Default |
---|---|---|---|
k | int | The number of rows to return. | required |
by | ir.Value | None | The metric to compute “top” by. Defaults to count . |
None |
name | str | None | The name to use for the metric column. A suitable name will be automatically generated if not provided. | None |
Returns
Name | Type | Description |
---|---|---|
Table | The top k values. |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.diamonds.fetch()
Compute the top 3 diamond colors by frequency:
>>> t.color.topk(3)
Compute the top 3 diamond colors by mean price:
>>> t.color.topk(3, by=t.price.mean())
Compute the top 2 diamond colors by max carat:
>>> t.color.topk(2, by=t.carat.max(), name="max_carat")
value_counts
=None) value_counts(name
Compute a frequency table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | None | The name to use for the frequency column. A suitable name will be automatically generated if not provided. | None |
Returns
Name | Type | Description |
---|---|---|
Table | The frequency table. |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": ["a", "a", "b", "c", "c", "c", "d", "d", "d", "d"]})
Compute the count of each unique value in “chars”, ordered by “chars”:
>>> t.chars.value_counts().order_by("chars")
Compute the count of each unique value in “chars” as a column named “freq”, ordered by “freq”:
>>> t.chars.value_counts(name="freq").order_by("freq")