>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.memtable(
... {
... "fruit": ["apple", "apple", "banana", "orange"],
... "price": [0.5, 0.5, 0.25, 0.33],
... }
... )
>>> tTable
Table(arg)An immutable and lazy dataframe.
Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns, each with a single known type. Unless explicitly ordered with an .order_by(), the order of rows is undefined.
Table immutability means that the data underlying an Ibis Table cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.
You will not create Table objects directly. Instead, you will create one
- from a pandas DataFrame, pyarrow table, Polars table, or raw python dicts/lists with
xorq.memtable(df) - from an existing table in a data platform with
connection.table("name") - from a file or URL, into a specific backend with
connection.read_csv/parquet/json("path/to/file")(only some backends, typically local ones, support this) - from a file or URL, into the default backend with
ibis.read_csv/read_json/read_parquet("path/to/file")
Attributes
| Name | Description |
|---|---|
| columns | The list of column names in this table. |
Methods
| Name | Description |
|---|---|
| aggregate | Aggregate a table with a given set of reductions grouping by by. |
| alias | Create a table expression with a specific name alias. |
| as_scalar | Inform ibis that the table expression should be treated as a scalar. |
| as_table | Promote the expression to a table. |
| asof_join | Perform an “as-of” join between left and right. |
| bind | Bind column values to a table expression. |
| cache | Cache the results of a computation to improve performance on subsequent executions. |
| cast | Cast the columns of a table. |
| compile | Compile to an execution target. |
| count | Compute the number of rows in the table. |
| cross_join | Compute the cross join of a sequence of tables. |
| describe | Return summary information about a table. |
| difference | Compute the set difference of multiple table expressions. |
| distinct | Return a Table with duplicate rows removed. |
| drop | Remove fields from a table. |
| drop_null | Remove rows with null values from the table. |
| dropna | Deprecated - use drop_null instead. |
| equals | Return whether this expression is structurally equivalent to other. |
| execute | Execute an expression against its backend if one exists. |
| fill_null | Fill null values in a table expression. |
| fillna | Deprecated - use fill_null instead. |
| filter | Select rows from table based on predicates. |
| get_name | Return the fully qualified name of the table. |
| group_by | Create a grouped table expression. |
| has_name | Check whether this expression has an explicit name. |
| head | Select the first n rows of a table. |
| info | Return summary information about a table. |
| intersect | Compute the set intersection of multiple table expressions. |
| into_backend | Converts the Expr to a table in the given backend con with an optional table name name. |
| join | Perform a join between two tables. |
| limit | Select n rows from self starting at offset. |
| mutate | Add columns to a table expression. |
| nunique | Compute the number of unique rows in the table. |
| order_by | Sort a table by one or more expressions. |
| pipe | Compose f with self. |
| pivot_longer | Transform a table from wider to longer. |
| pivot_wider | Pivot a table to a wider format. |
| preview | Return a subset as a Rich Table. |
| relabel | Deprecated in favor of Table.rename. |
| relocate | Relocate columns before or after other specified columns. |
| rename | Rename columns in the table. |
| rowid | A unique integer per row. |
| sample | Sample a fraction of rows from a table. |
| schema | Return the Schema for this table. |
| select | Compute a new table expression using exprs and named_exprs. |
| sql | Run a SQL query against a table expression. |
| to_array | View a single column table as an array. |
| to_csv | Write the results of executing the given expression to a CSV file. |
| to_json | Write the results of expr to a NDJSON file. |
| to_pandas | Convert a table expression to a pandas DataFrame. |
| to_parquet | Write the results of executing the given expression to a parquet file. |
| to_pyarrow | Execute expression and return results in as a pyarrow table. |
| to_pyarrow_batches | Execute expression and return a RecordBatchReader. |
| try_cast | Cast the columns of a table. |
| unbind | Return an expression built on UnboundTable instead of backend-specific objects. |
| union | Compute the set union of multiple table expressions. |
| unnest | Unnest an array column from a table. |
| unpack | Project the struct fields of each of columns into self. |
| value_counts | Compute a frequency table of this table’s values. |
| view | Create a new table expression distinct from the current one. |
| visualize | Visualize an expression as a GraphViz graph in the browser. |
aggregate
aggregate(metrics=(), by=(), having=(), **kwargs)Aggregate a table with a given set of reductions grouping by by.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| metrics | Sequence[ir.Scalar] | None | Aggregate expressions. These can be any scalar-producing expression, including aggregation functions like sum or literal values like ibis.literal(1). |
() |
| by | Sequence[ir.Value] | None | Grouping expressions. | () |
| having | Sequence[ir.BooleanValue] | None | Post-aggregation filters. The shape requirements are the same metrics, but the output type for having is boolean. ::: {.callout-warning} ## Expressions like x is None return bool and will not generate a SQL comparison to NULL ::: |
() |
| kwargs | ir.Value | Named aggregate expressions | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | An aggregate table expression |
Examples
>>> t.aggregate(
... by=["fruit"],
... total_cost=_.price.sum(),
... avg_cost=_.price.mean(),
... having=_.price.sum() < 0.5,
... )alias
alias(alias)Create a table expression with a specific name alias.
This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.
.alias will create a temporary view
.alias creates a temporary view in the database.
This side effect will be removed in a future version of ibis and is not part of the public API.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| alias | str | Name of the child expression | required |
Returns
| Name | Type | Description |
|---|---|---|
| Table | An table expression |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr # quartodoc: +SKIPas_scalar
as_scalar()Inform ibis that the table expression should be treated as a scalar.
Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.
Returns
| Name | Type | Description |
|---|---|---|
| Scalar | A scalar subquery |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.species.value_counts().order_by("species")as_table
as_table()Promote the expression to a table.
This method is a no-op for table expressions.
Returns
| Name | Type | Description |
|---|---|---|
| Table | A table expression |
Examples
>>> import xorq.api as xo
>>> t = xo.table(dict(a="int"), name="t")
>>> s = t.as_table()
>>> t is sasof_join
asof_join(
left,
right,
on,
predicates=(),
tolerance=None,
*,
lname='',
rname='{name}_right',
)Perform an “as-of” join between left and right.
Similar to a left join except that the match is done on nearest key rather than equal keys.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| left | Table | Table expression | required |
| right | Table | Table expression | required |
| on | str | ir.BooleanColumn | Closest match inequality condition | required |
| predicates | str | ir.Column | Sequence[str | ir.Column] | Additional join predicates | () |
| tolerance | str | ir.IntervalScalar | None | Amount of time to look behind when joining | None |
| lname | str | A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). |
'' |
| rname | str | A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). |
'{name}_right' |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression |
bind
bind(*args, **kwargs)Bind column values to a table expression.
This method handles the binding of every kind of column-like value that Ibis handles, including strings, integers, deferred expressions and selectors, to a table expression.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| args | Any | Column-like values to bind. | () |
| kwargs | Any | Column-like values to bind, with names. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[Value, …] | A tuple of bound values |
cache
cache(cache=None)Cache the results of a computation to improve performance on subsequent executions. This method allows you to cache the results of a computation either in memory, on disk using Parquet files, or in a database table. The caching strategy and cache location are determined by the cache parameter.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| cache | CacheStorage | The cache strategy to use for caching. Can be one of: - ParquetCache: Caches results as Parquet files on disk - SourceCache: Caches results in the source database - ParquetSnapshotCache: Creates a snapshot of data in Parquet format - SourceSnapshotCache: Creates a snapshot in the source database If None, uses the default cache configuration. | None |
Returns
| Name | Type | Description |
|---|---|---|
| Expr | A new expression that represents the cached computation. |
Notes
The cache method supports two main strategies: 1. ModificationTimeStrategy: Tracks changes based on modification time 2. SnapshotStrategy: Creates point-in-time snapshots of the data
Each strategy can be combined with either Parquet or database cache.
Examples
Using ParquetCache:
>>> import xorq.api as xo
>>> from xorq.caching import ParquetCache
>>> from pathlib import Path
>>> pg = xo.postgres.connect_examples()
>>> con = xo.connect()
>>> cache = ParquetCache.from_kwargs(source=con, relative_path=Path.cwd())
>>> alltypes = pg.table("functional_alltypes")
>>> cached = (alltypes
... .select(alltypes.smallint_col, alltypes.int_col, alltypes.float_col)
... .cache(cache=cache))Using SourceCache with PostgreSQL:
>>> from xorq.caching import SourceCache
>>> from xorq.api import _
>>> ddb = xo.duckdb.connect()
>>> path = xo.config.options.pins.get_path("batting")
>>> right = (ddb.read_parquet(path, table_name="batting")
... .filter(_.yearID == 2014)
... .pipe(con.register, table_name="ddb-batting"))
>>> left = (pg.table("batting")
... .filter(_.yearID == 2015)
... .pipe(con.register, table_name="pg-batting"))
>>> # Cache the joined result
>>> expr = left.join(right, "playerID").cache(SourceCache.from_kwargs(source=pg))Using cache with filtering:
>>> cached = alltypes.cache(cache=cache)
>>> expr = cached.filter([
... cached.float_col > 0,
... cached.smallint_col > 4,
... cached.int_col < cached.float_col * 2
... ])See Also
ParquetCache : Storage implementation for Parquet files SourceCache : Storage implementation for database tables ModificationTimeStrategy : Strategy for tracking changes by modification time SnapshotStrategy : Strategy for creating data snapshots
Notes
- The cache is identified by a unique key based on the computation and strategy
- Cache invalidation is handled automatically based on the chosen strategy
- Cross-source caching (e.g., from PostgreSQL to DuckDB) is supported
- Cache locations can be configured globally through xorq.config.options
cast
cast(schema)Cast the columns of a table.
Similar to pandas.DataFrame.astype.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| schema | SchemaLike | Mapping, schema or iterable of pairs to use for casting | required |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Cast table |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.schema()>>> cols = ["body_mass_g", "bill_length_mm"]
>>> t[cols].head()Columns not present in the input schema will be passed through unchanged
>>> t.columns>>> expr = t.cast({"body_mass_g": "float64", "bill_length_mm": "int"})
>>> expr.select(*cols).head()Columns that are in the input schema but not in the table raise an error
>>> t.cast({"foo": "string"}) compile
compile(limit=None, params=None, pretty=False)Compile to an execution target.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. |
None |
| params | Mapping[ir.Value, Any] | None | Mapping of scalar parameter expressions to value | None |
| pretty | bool | In case of SQL backends, return a pretty formatted SQL query. | False |
count
count(where=None)Compute the number of rows in the table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| where | ir.BooleanValue | None | Optional boolean expression to filter rows when counting. | None |
Returns
| Name | Type | Description |
|---|---|---|
| IntegerScalar | Number of rows in the table |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": ["foo", "bar", "baz"]})
>>> t>>> t.count()>>> t.count(t.a != "foo")>>> type(t.count())cross_join
cross_join(left, right, *rest, lname='', rname='{name}_right')Compute the cross join of a sequence of tables.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| left | Table | Left table | required |
| right | Table | Right table | required |
| rest | Table | Additional tables to cross join | () |
| lname | str | A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). |
'' |
| rname | str | A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). |
'{name}_right' |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Cross join of left, right and rest |
Examples
>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.count()>>> agg = t.drop("year").agg(s.across(s.numeric(), _.mean()))
>>> expr = t.cross_join(agg)
>>> expr>>> expr.columns>>> expr.count()describe
describe(quantile=(0.25, 0.5, 0.75))Return summary information about a table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| quantile | Sequence[ir.NumericValue | float] | The quantiles to compute for numerical columns. Defaults to (0.25, 0.5, 0.75). | (0.25, 0.5, 0.75) |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A table containing summary information about the columns of self. |
Notes
This function computes summary statistics for each column in the table. For numerical columns, it computes statistics such as minimum, maximum, mean, standard deviation, and quantiles. For string columns, it computes the mode and the number of unique values.
Examples
>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> xo.options.interactive = True
>>> p = xo.examples.penguins.fetch(deferred=False)
>>> p.describe()>>> p.select(s.of_type("numeric")).describe()>>> p.select(s.of_type("string")).describe()difference
difference(table, *rest, distinct=True)Compute the set difference of multiple table expressions.
The input tables must have identical schemas.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table | Table | A table expression | required |
| *rest | Table | Additional table expressions | () |
| distinct | bool | Only diff distinct rows not occurring in the calling table | True |
See Also
Returns
| Name | Type | Description |
|---|---|---|
| Table | The rows present in self that are not present in tables. |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2>>> t1.difference(t2)distinct
distinct(on=None, keep='first')Return a Table with duplicate rows removed.
Similar to pandas.DataFrame.drop_duplicates().
keep='last'
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| on | str | Iterable[str] | s.Selector | None | Only consider certain columns for identifying duplicates. By default, deduplicate all of the columns. | None |
| keep | Literal['first', 'last'] | None | Determines which duplicates to keep. - "first": Drop duplicates except for the first occurrence. - "last": Drop duplicates except for the last occurrence. - None: Drop all duplicates |
'first' |
Examples
>>> import xorq.api as xo
>>> import xorq.examples as ex
>>> import xorq.expr.selectors as s
>>> xo.options.interactive = True
>>> t = ex.penguins.fetch()
>>> tCompute the distinct rows of a subset of columns
>>> t[["species", "island"]].distinct().order_by(s.all())Drop all duplicate rows except the first
>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())Drop all duplicate rows except the last
>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())Drop all duplicated rows
>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()>>> t.count()You can pass selectors to on
>>> t.distinct(on=~s.numeric())The only valid values of keep are "first", "last" and .
>>> t.distinct(on="species", keep="second") drop
drop(*fields)Remove fields from a table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fields | str | Selector | Fields to drop. Strings and selectors are accepted. | () |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A table with all columns matching fields removed. |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> tDrop one or more columns
>>> t.drop("species").head()>>> t.drop("species", "bill_length_mm").head()Drop with selectors, mix and match
>>> import xorq.expr.selectors as s
>>> t.drop("species", s.startswith("bill_")).head()drop_null
drop_null(subset=None, how='any')Remove rows with null values from the table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| subset | Sequence[str] | str | None | Columns names to consider when dropping nulls. By default all columns are considered. | None |
| how | Literal['any', 'all'] | Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). |
'any' |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t>>> t.count()>>> t.drop_null(["bill_length_mm", "body_mass_g"]).count()>>> t.drop_null(how="all").count() # no rows where all columns are nulldropna
dropna(subset=None, how='any')Deprecated - use drop_null instead.
equals
equals(other)Return whether this expression is structurally equivalent to other.
If you want to produce an equality expression, use == syntax.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| other | Another expression | required |
Examples
>>> import xorq.api as xo
>>> t1 = xo.table(dict(a="int"), name="t")
>>> t2 = xo.table(dict(a="int"), name="t")
>>> t1.equals(t2)>>> v = xo.table(dict(a="string"), name="v")
>>> t1.equals(v)execute
execute(**kwargs)Execute an expression against its backend if one exists.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| kwargs | Any | Keyword arguments | {} |
Examples
>>> import xorq.api as xo
>>> t = xo.examples.penguins.fetch()
>>> t.execute()Scalar parameters can be supplied dynamically during execution.
>>> species = xo.param("string")
>>> expr = t.filter(t.species == species).order_by(t.bill_length_mm)
>>> expr.execute(limit=3, params={species: "Gentoo"})fill_null
fill_null(replacements)Fill null values in a table expression.
fill_null API
For example, different library versions may impact whether a given backend promotes integer replacement values to floats.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| replacements | ir.Scalar | Mapping[str, ir.Scalar] | Value with which to fill nulls. If replacements is a mapping, the keys are column names that map to their replacement value. If passed as a scalar all columns are filled with that value. |
required |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.sex>>> t.fill_null({"sex": "unrecorded"}).sexfillna
fillna(replacements)Deprecated - use fill_null instead.
filter
filter(*predicates)Select rows from table based on predicates.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| predicates | ir.BooleanValue | Sequence[ir.BooleanValue] | IfAnyAll | Boolean value expressions used to select rows in table. |
() |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Filtered table expression |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().drop_null(
... "sex"
... ).order_by("sex")get_name
get_name()Return the fully qualified name of the table.
group_by
group_by(*by, **key_exprs)Create a grouped table expression.
Similar to SQL’s GROUP BY statement, or pandas .groupby() method.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| by | str | ir.Value | Iterable[str] | Iterable[ir.Value] | None | Grouping expressions | () |
| key_exprs | str | ir.Value | Iterable[str] | Iterable[ir.Value] | Named grouping expressions | {} |
Returns
| Name | Type | Description |
|---|---|---|
| GroupedTable | A grouped table expression |
Examples
>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.memtable(
... {
... "fruit": ["apple", "apple", "banana", "orange"],
... "price": [0.5, 0.5, 0.25, 0.33],
... }
... )
>>> t>>> t.group_by("fruit").agg(total_cost=_.price.sum(), avg_cost=_.price.mean()).order_by(
... "fruit"
... )has_name
has_name()Check whether this expression has an explicit name.
head
head(n=5)Select the first n rows of a table.
order_by.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of rows to include | 5 |
Returns
| Name | Type | Description |
|---|---|---|
| Table | self limited to n rows |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t>>> t.head(2)See Also
info
info()Return summary information about a table.
Returns
| Name | Type | Description |
|---|---|---|
| Table | Summary of self |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.info()intersect
intersect(table, *rest, distinct=True)Compute the set intersection of multiple table expressions.
The input tables must have identical schemas.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table | Table | A table expression | required |
| *rest | Table | Additional table expressions | () |
| distinct | bool | Only return distinct rows | True |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A new table containing the intersection of all input tables. |
See Also
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2>>> t1.intersect(t2)into_backend
into_backend(con, name=None)Converts the Expr to a table in the given backend con with an optional table name name.
The table is backed by a PyArrow RecordBatchReader, the RecordBatchReader is teed so it can safely be reaused without spilling to disk.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| con | The backend where the table should be created | required | |
| name | The name of the table | None |
Examples
>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> ls_con = xo.connect()
>>> pg_con = xo.postgres.connect_examples()
>>> t = pg_con.table("batting").into_backend(ls_con, "ls_batting")
>>> expr = (
... t.join(t, "playerID")
... .order_by("playerID", "yearID")
... .limit(15)
... .select(player_id="playerID", year_id="yearID_right")
... )
>>> exprjoin
join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')Perform a join between two tables.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| left | Table | Left table to join | required |
| right | Table | Right table to join | required |
| predicates | str | Sequence[str | ir.BooleanColumn | Literal[True] | Literal[False] | tuple[str | ir.Column | ir.Deferred, str | ir.Column | ir.Deferred]] | Condition(s) to join on. See examples for details. | () |
| how | JoinKind | Join method, e.g. "inner" or "left". |
'inner' |
| lname | str | A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). |
'' |
| rname | str | A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). |
'{name}_right' |
Examples
>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> movies = xo.examples.ml_latest_small_movies.fetch()
>>> movies.head()>>> ratings = xo.examples.ml_latest_small_ratings.fetch().drop("timestamp")
>>> ratings.head()Equality left join on the shared movieId column. Note the _right suffix added to all overlapping columns from the right table (in this case only the “movieId” column).
>>> ratings.join(movies, "movieId", how="left").head(5)Explicit equality join using the default how value of "inner". Note how there is no _right suffix added to the movieId column since this is an inner join and the movieId column is part of the join condition.
>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5)>>> tags = xo.examples.ml_latest_small_tags.fetch()
>>> tags.head()You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:
>>> tags.join(ratings, ["userId", "movieId"]).head(5).order_by("userId")To self-join a table with itself, you need to call .view() on one of the arguments so the two tables are distinct from each other.
For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key}), where each key can be
- a Column
- Deferred expression
- lambda of the form (Table) -> Column
For example, to find all movies pairings that received the same (ignoring case) tags:
>>> movie_tags = tags["movieId", "tag"]
>>> view = movie_tags.view()
>>> movie_tags.join(
... view,
... [
... movie_tags.movieId != view.movieId,
... (_["tag"].lower(), lambda t: t["tag"].lower()),
... ],
... ).head().order_by(("movieId", "movieId_right"))limit
limit(n, offset=0)Select n rows from self starting at offset.
order_by.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| n | int | None | Number of rows to include. If None, the entire table is selected starting from offset. |
required |
| offset | int | Number of rows to skip first | 0 |
Returns
| Name | Type | Description |
|---|---|---|
| Table | The first n rows of self starting at offset |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t>>> t.limit(2)You can use None with offset to slice starting from a particular row
>>> t.limit(None, offset=1)See Also
mutate
mutate(*exprs, **mutations)Add columns to a table expression.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| exprs | Sequence[ir.Expr] | None | List of named expressions to add as columns | () |
| mutations | ir.Value | Named expressions using keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression with additional columns |
Examples
>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False).select("species", "year", "bill_length_mm")
>>> tAdd a new column from a per-element expression
>>> t.mutate(next_year=_.year + 1).head()Add a new column based on an aggregation. Note the automatic broadcasting.
>>> t.select("species", bill_demean=_.bill_length_mm - _.bill_length_mm.mean()).head()Mutate across multiple columns
>>> t.mutate(s.across(s.numeric() & ~s.cols("year"), _ - _.mean())).head()nunique
nunique(where=None)Compute the number of unique rows in the table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| where | ir.BooleanValue | None | Optional boolean expression to filter rows when counting. | None |
Returns
| Name | Type | Description |
|---|---|---|
| IntegerScalar | Number of unique rows in the table |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": ["foo", "bar", "bar"]})
>>> t>>> t.nunique()>>> t.nunique(t.a != "foo")order_by
order_by(*by)Sort a table by one or more expressions.
Similar to pandas.DataFrame.sort_values().
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| by | str | ir.Column | s.Selector | Sequence[str] | Sequence[ir.Column] | Sequence[s.Selector] | None | Expressions to sort the table by. | () |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Sorted table |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable(
... {
... "a": [3, 2, 1, 3],
... "b": ["a", "B", "c", "D"],
... "c": [4, 6, 5, 7],
... }
... )
>>> tSort by b. Default is ascending. Note how capital letters come before lowercase
>>> t.order_by("b")Sort in descending order
>>> t.order_by(xo.desc("b"))You can also use the deferred API to get the same result
>>> from xorq.api import _
>>> t.order_by(_.b.desc())Sort by multiple columns/expressions
>>> t.order_by(["a", _.c.desc()])You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b
>>> t.order_by(_.b.lower())This means that shuffling a Table is super simple
>>> t.order_by(xo.random())Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria
>>> import xorq.expr.selectors as s
>>> penguins = xo.examples.penguins.fetch(deferred=False)
>>> penguins[["year", "island"]].value_counts().order_by(s.startswith("year"))Use the across selector to apply a specific order to multiple columns
>>> penguins[["year", "island"]].value_counts().order_by(
... s.across(s.startswith("year"), _.desc())
... )pipe
pipe(f, *args, **kwargs)Compose f with self.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| f | If the expression needs to be passed as anything other than the first argument to the function, pass a tuple with the argument name. For example, (f, ‘data’) if the function f expects a ‘data’ keyword | required | |
| args | Any | Positional arguments to f |
() |
| kwargs | Any | Keyword arguments to f |
{} |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = False
>>> t = xo.table([("a", "int64"), ("b", "string")], name="t")
>>> f = lambda a: (a + 1).name("a")
>>> g = lambda a: (a * 2).name("a")
>>> result1 = t.a.pipe(f).pipe(g)
>>> result1>>> result2 = g(f(t.a)) # equivalent to the above
>>> result1.equals(result2)Returns
| Name | Type | Description |
|---|---|---|
| Expr | Result type of passed function |
pivot_longer
pivot_longer(
col,
*,
names_to='name',
names_pattern='(.+)',
names_transform=None,
values_to='value',
values_transform=None,
)Transform a table from wider to longer.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| col | str | s.Selector | String column name or selector. | required |
| names_to | str | Iterable[str] | A string or iterable of strings indicating how to name the new pivoted columns. | 'name' |
| names_pattern | str | re.Pattern | Pattern to use to extract column names from the input. By default the entire column name is extracted. | '(.+)' |
| names_transform | Callable[[str], ir.Value] | Mapping[str, Callable[[str], ir.Value]] | None | Function or mapping of a name in names_to to a function to transform a column name to a value. |
None |
| values_to | str | Name of the pivoted value column. | 'value' |
| values_transform | Callable[[ir.Value], ir.Value] | Deferred | None | Apply a function to the value column. This can be a lambda or deferred expression. | None |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Pivoted table |
Examples
Basic usage
>>> import xorq.api as xo
>>> import xorq.expr.selectors as s
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> relig_income = xo.examples.relig_income_raw.fetch()
>>> relig_incomeHere we convert column names not matching the selector for the religion column and convert those names into values
>>> relig_income.pivot_longer(~s.cols("religion"), names_to="income", values_to="count")Similarly for a different example dataset, we convert names to values but using a different selector and the default values_to value.
>>> world_bank_pop = xo.examples.world_bank_pop_raw.fetch()
>>> world_bank_pop.head()>>> world_bank_pop.pivot_longer(s.matches(r"\d{4}"), names_to="year").head()pivot_longer has some preprocessing capabilities like stripping a prefix and applying a function to column names
>>> billboard = xo.examples.billboard.fetch()
>>> billboard>>> billboard.pivot_longer(
... s.startswith("wk"),
... names_to="week",
... names_pattern=r"wk(.+)",
... names_transform=int,
... values_to="rank",
... values_transform=_.cast("int"),
... ).drop_null("rank")You can use regular expression capture groups to extract multiple variables stored in column names
>>> who = xo.examples.who.fetch()
>>> who>>> len(who.columns)>>> who.pivot_longer(
... s.index["new_sp_m014":"newrel_f65"],
... names_to=["diagnosis", "gender", "age"],
... names_pattern="new_?(.*)_(.)(.*)",
... values_to="count",
... )names_transform is flexible, and can be:
1. A mapping of one or more names in `names_to` to callable
2. A callable that will be applied to every name
Let’s recode gender and age to numeric values using a mapping
>>> who.pivot_longer(
... s.index["new_sp_m014":"newrel_f65"],
... names_to=["diagnosis", "gender", "age"],
... names_pattern="new_?(.*)_(.)(.*)",
... names_transform=dict(
... gender={"m": 1, "f": 2}.get,
... age=dict(
... zip(
... ["014", "1524", "2534", "3544", "4554", "5564", "65"],
... range(7),
... )
... ).get,
... ),
... values_to="count",
... )The number of match groups in names_pattern must match the length of names_to
>>> who.pivot_longer(
... s.index["new_sp_m014":"newrel_f65"],
... names_to=["diagnosis", "gender", "age"],
... names_pattern="new_?(.*)_.(.*)",
... )names_transform must be a mapping or callable
>>> who.pivot_longer(
... s.index["new_sp_m014":"newrel_f65"], names_transform="upper"
... ) # quartodoc: +EXPECTED_FAILUREpivot_wider
pivot_wider(
id_cols=None,
names_from='name',
names_prefix='',
names_sep='_',
names_sort=False,
names=None,
values_from='value',
values_fill=None,
values_agg='arbitrary',
)Pivot a table to a wider format.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| id_cols | s.Selector | None | A set of columns that uniquely identify each observation. | None |
| names_from | str | Iterable[str] | s.Selector | An argument describing which column or columns to use to get the name of the output columns. | 'name' |
| names_prefix | str | String added to the start of every column name. | '' |
| names_sep | str | If names_from or values_from contains multiple columns, this argument will be used to join their values together into a single string to use as a column name. |
'_' |
| names_sort | bool | If columns are sorted. If column names are ordered by appearance. | False |
| names | Iterable[str] | None | An explicit sequence of values to look for in columns matching names_from. * When this value is None, the values will be computed from names_from. * When this value is not None, each element’s length must match the length of names_from. See examples below for more detail. |
None |
| values_from | str | Iterable[str] | s.Selector | An argument describing which column or columns to get the cell values from. | 'value' |
| values_fill | int | float | str | ir.Scalar | None | A scalar value that specifies what each value should be filled with when missing. | None |
| values_agg | str | Callable[[ir.Value], ir.Scalar] | Deferred | A function applied to the value in each cell in the output. | 'arbitrary' |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Wider pivoted table |
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = TrueBasic usage
>>> fish_encounters = ibis.examples.fish_encounters.fetch()
>>> fish_encounters>>> fish_encounters.pivot_wider(names_from="station", values_from="seen")You can do simple transpose-like operations using pivot_wider
>>> t = ibis.memtable(dict(outcome=["yes", "no"], counted=[3, 4]))
>>> t>>> t.pivot_wider(names_from="outcome", values_from="counted", names_sort=True)Fill missing pivoted values using values_fill
>>> fish_encounters.pivot_wider(
... names_from="station", values_from="seen", values_fill=0
... )Compute multiple values columns
>>> us_rent_income = ibis.examples.us_rent_income.fetch()
>>> us_rent_income>>> us_rent_income.pivot_wider(
... names_from="variable", values_from=["estimate", "moe"]
... )The column name separator can be changed using the names_sep parameter
>>> us_rent_income.pivot_wider(
... names_from="variable",
... names_sep=".",
... values_from=("estimate", "moe"),
... )Supply an alternative function to summarize values
>>> warpbreaks = ibis.examples.warpbreaks.fetch().select("wool", "tension", "breaks")
>>> warpbreaks>>> warpbreaks.pivot_wider(
... names_from="wool", values_from="breaks", values_agg="mean"
... ).select("tension", "A", "B").order_by("tension")Passing Deferred objects to values_agg is supported
>>> warpbreaks.pivot_wider(
... names_from="tension",
... values_from="breaks",
... values_agg=_.sum(),
... ).select("wool", "H", "L", "M").order_by(s.all())Use a custom aggregate function
>>> warpbreaks.pivot_wider(
... names_from="wool",
... values_from="breaks",
... values_agg=lambda col: col.std() / col.mean(),
... ).select("tension", "A", "B").order_by("tension")Generate some random data, setting the random seed for reproducibility
>>> import random
>>> random.seed(0)
>>> raw = ibis.memtable(
... [
... dict(
... product=product,
... country=country,
... year=year,
... production=random.random(),
... )
... for product in "AB"
... for country in ["AI", "EI"]
... for year in range(2000, 2015)
... ]
... )
>>> production = raw.filter(((_.product == "A") & (_.country == "AI")) | (_.product == "B"))
>>> production.order_by(s.all())Pivoting with multiple name columns
>>> production.pivot_wider(
... names_from=["product", "country"],
... values_from="production",
... )Select a subset of names. This call incurs no computation when constructing the expression.
>>> production.pivot_wider(
... names_from=["product", "country"],
... names=[("A", "AI"), ("B", "AI")],
... values_from="production",
... )Sort the new columns’ names
>>> production.pivot_wider(
... names_from=["product", "country"],
... values_from="production",
... names_sort=True,
... )preview
preview(
max_rows=None,
max_columns=None,
max_length=None,
max_string=None,
max_depth=None,
console_width=None,
)Return a subset as a Rich Table.
This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| max_rows | int | None | Maximum number of rows to display | None |
| max_columns | int | None | Maximum number of columns to display | None |
| max_length | int | None | Maximum length for pretty-printed arrays and maps | None |
| max_string | int | None | Maximum length for pretty-printed strings | None |
| max_depth | int | None | Maximum depth for nested data types | None |
| console_width | int | float | None | Width of the console in characters. If not specified, the width will be inferred from the console. | None |
Examples
>>> import xorq.api as xo
>>> t = xo.examples.penguins.fetch(deferred=False)Because the console_width is too small, only 2 columns are shown even though we specified up to 3.
>>> t.preview(
... max_rows=3,
... max_columns=3,
... max_string=8,
... console_width=30,
... )relabel
relabel(substitutions)Deprecated in favor of Table.rename.
relocate
relocate(*columns, before=None, after=None, **kwargs)Relocate columns before or after other specified columns.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| columns | str | s.Selector | Columns to relocate. Selectors are accepted. | () |
| before | str | s.Selector | None | A column name or selector to insert the new columns before. | None |
| after | str | s.Selector | None | A column name or selector. Columns in columns are relocated after the last column selected in after. |
None |
| kwargs | str | Additional column names to relocate, renaming argument values to keyword argument names. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A table with the columns relocated. |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> import xorq.expr.selectors as s
>>> t = xo.memtable(dict(a=[1], b=[1], c=[1], d=["a"], e=["a"], f=["a"]))
>>> t>>> t.relocate("f")>>> t.relocate("a", after="c")>>> t.relocate("f", before="b")>>> t.relocate("a", after=s.last())Relocate allows renaming
>>> t.relocate(ff="f")You can relocate based on any predicate selector, such as of_type
>>> t.relocate(s.of_type("string"))>>> t.relocate(s.numeric(), after=s.last())When multiple columns are selected with before or after, those selected columns are moved before and after the selectors input
>>> t = xo.memtable(dict(a=[1], b=["a"], c=[1], d=["a"]))
>>> t.relocate(s.numeric(), after=s.of_type("string"))>>> t.relocate(s.numeric(), before=s.of_type("string"))When there are duplicate renames in a call to relocate, the last one is preserved
>>> t.relocate(e="d", f="d")However, if there are duplicates that are not part of a rename, the order specified in the relocate call is preserved
>>> t.relocate(
... "b",
... s.of_type("string"), # "b" is a string column, so the selector matches
... )rename
rename(method=None, /, **substitutions)Rename columns in the table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | Callable[[str], str | None] | Literal['snake_case', 'ALL_CAPS'] | Mapping[str, str] | None | An optional method for renaming columns. May be one of: - A format string to use to rename all columns, like "prefix_{name}". - A function from old name to new name. If the function returns None the old name is used. - The literal strings "snake_case" or "ALL_CAPS" to rename all columns using a snake_case or "ALL_CAPS"`` naming convention respectively. - A mapping from new name to old name. Existing columns not present in the mapping will passthrough with their original name. |None| | substitutions | [str](str) | Columns to be explicitly renamed, expressed asnew_name=old_name`keyword arguments. |` |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A renamed table expression |
rowid
rowid()A unique integer per row.
Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, SQLite and DuckDB’s rowid).
For a monotonically increasing row number, see ibis.row_number.
Returns
| Name | Type | Description |
|---|---|---|
| IntegerColumn | An integer column |
sample
sample(fraction, *, method='row', seed=None)Sample a fraction of rows from a table.
Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.
In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fraction | float | The percentage of rows to include in the sample, expressed as a float between 0 and 1. | required |
| method | Literal['row', 'block'] | The sampling method to use. The default is “row”, which includes each row with a probability of fraction. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause. |
'row' |
| seed | int | None | An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases. |
None |
Returns
| Name | Type | Description |
|---|---|---|
| Table | The input table, with fraction of rows selected. |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
>>> tSample approximately half the rows, with a seed specified for reproducibility.
>>> t.sample(0.5, seed=1234)schema
schema()Return the Schema for this table.
Returns
| Name | Type | Description |
|---|---|---|
| Schema | The table’s schema. |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.schema()select
select(*exprs, **named_exprs)Compute a new table expression using exprs and named_exprs.
Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.
For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| exprs | ir.Value | str | Iterable[ir.Value | str] | Column expression, string, or list of column expressions and strings. | () |
| named_exprs | ir.Value | str | Column expressions | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression |
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> tSimple projection
>>> t.select("island", "bill_length_mm").head()In that simple case, you could also just use python’s indexing syntax
>>> t[["island", "bill_length_mm"]].head()Projection by zero-indexed column position
>>> t.select(t[0], t[4]).head()Projection with renaming and compute in one call
>>> t.select(next_year=t.year + 1).head()You can do the same thing with a named expression, and using the deferred API
>>> from xorq.api import _
>>> t.select((_.year + 1).name("next_year")).head()Projection with aggregation expressions
>>> t.select("island", bill_mean=t.bill_length_mm.mean()).head()Projection with a selector
>>> import xorq.expr.selectors as s
>>> t.select(s.numeric() & ~s.cols("year")).head()Projection + aggregation across multiple columns
>>> from xorq.api import _
>>> t.select(s.across(s.numeric() & ~s.cols("year"), _.mean())).head()sql
sql(query, dialect=None)Run a SQL query against a table expression.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query string | required |
| dialect | str | None | Optional string indicating the dialect of query. Defaults to the backend’s native dialect. |
None |
Returns
| Name | Type | Description |
|---|---|---|
| Table | An opaque table expression |
Examples
>>> import xorq.api as xo
>>> from xorq.api import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(table_name="penguins", deferred=False)
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> exprMix and match ibis expressions with SQL queries
>>> t = xo.examples.penguins.fetch(table_name="penguins", deferred=False)
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr = expr.mutate(
... island=_.island.lower(),
... avg_bill_length=_.avg_bill_length.round(1),
... )
>>> exprBecause ibis expressions aren’t named, they aren’t visible to subsequent .sql calls. Use the alias method to assign a name to an expression.
>>> expr.alias("b").sql("SELECT * FROM b WHERE avg_bill_length > 40") See Also
to_array
to_array()View a single column table as an array.
Returns
| Name | Type | Description |
|---|---|---|
| Value | A single column view of a table |
to_csv
to_csv(path, *, params=None, **kwargs)Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The data source. A string or Path to the CSV file. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {} |
| https | required |
to_json
to_json(path, *, params=None, **kwargs)Write the results of expr to a NDJSON file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
| **kwargs | Any | Additional, backend-specific keyword arguments. | {} |
| https | required |
to_pandas
to_pandas(**kwargs)Convert a table expression to a pandas DataFrame.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| kwargs | Same as keyword arguments to execute |
{} |
to_parquet
to_parquet(path, params=None, **kwargs)Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html for details.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | A string or Path where the Parquet file will be written. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {} |
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins.to_parquet(tempfile.mktemp())to_pyarrow
to_pyarrow(**kwargs)Execute expression and return results in as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(chunk_size=1000000, **kwargs)Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| chunk_size | int | Maximum number of rows in each returned record batch. | 1000000 |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| results | RecordBatchReader |
try_cast
try_cast(schema)Cast the columns of a table.
If the cast fails for a row, the value is returned as NULL or NaN depending on backend behavior.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| schema | SchemaLike | Mapping, schema or iterable of pairs to use for casting | required |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Cast table |
Examples
unbind
unbind()Return an expression built on UnboundTable instead of backend-specific objects.
union
union(table, *rest, distinct=False)Compute the set union of multiple table expressions.
The input tables must have identical schemas.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table | Table | A table expression | required |
| *rest | Table | Additional table expressions | () |
| distinct | bool | Only return distinct rows | False |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A new table containing the union of all input tables. |
See Also
Examples
>>> import xorq.api as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2>>> t1.union(t2) # union all by default doctest: +SKIP>>> t1.union(t2, distinct=True).order_by("a")unnest
unnest(column, offset=None, keep_empty=False)Unnest an array column from a table.
When unnesting an existing column the newly unnested column replaces the existing column.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| column | Array column to unnest. | required | |
| offset | str | None | Name of the resulting index column. | None |
| keep_empty | bool | Keep empty array values as NULL in the output table, as well as existing NULL values. |
False |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table with the array column column unnested. |
See Also
Examples
unpack
unpack(*columns)Project the struct fields of each of columns into self.
Existing fields are retained in the projection.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| columns | str | String column names to project into self. |
() |
Returns
| Name | Type | Description |
|---|---|---|
| Table | The child table with struct fields of each of columns projected. |
See Also
value_counts
value_counts()Compute a frequency table of this table’s values.
Returns
| Name | Type | Description |
|---|---|---|
| Table | Frequency table of this table’s values. |
Examples
>>> import xorq.api as xo
>>> from xorq import examples
>>> xo.options.interactive = True
>>> t = examples.penguins.fetch()
>>> t.head()>>> t.year.value_counts().order_by("year")>>> t[["year", "island"]].value_counts().order_by("year", "island")view
view()Create a new table expression distinct from the current one.
Use this API for any self-referencing operations like a self-join.
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression |
visualize
visualize(
format='svg',
*,
label_edges=False,
verbose=False,
node_attr=None,
node_attr_getter=None,
edge_attr=None,
edge_attr_getter=None,
)Visualize an expression as a GraphViz graph in the browser.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| format | str | Image output format. These are specified by the graphviz Python library. |
'svg' |
| label_edges | bool | Show operation input names as edge labels | False |
| verbose | bool | Print the graphviz DOT code to stderr if | False |
| node_attr | Mapping[str, str] | None | Mapping of (attribute, value) pairs set for all nodes. Options are specified by the graphviz Python library. |
None |
| node_attr_getter | NodeAttributeGetter | None | Callback taking a node and returning a mapping of (attribute, value) pairs for that node. Options are specified by the graphviz Python library. |
None |
| edge_attr | Mapping[str, str] | None | Mapping of (attribute, value) pairs set for all edges. Options are specified by the graphviz Python library. |
None |
| edge_attr_getter | EdgeAttributeGetter | None | Callback taking two adjacent nodes and returning a mapping of (attribute, value) pairs for the edge between those nodes. Options are specified by the graphviz Python library. |
None |
Examples
Open the visualization of an expression in default browser:
>>> import xorq.api as xo
>>> import xorq.vendor.ibis.expr.operations as ops
>>> left = ibis.table(dict(a="int64", b="string"), name="left")
>>> right = ibis.table(dict(b="string", c="int64", d="string"), name="right")
>>> expr = left.inner_join(right, "b").select(left.a, b=right.c, c=right.d)
>>> expr.visualize(
... format="svg",
... label_edges=True,
... node_attr={"fontname": "Roboto Mono", "fontsize": "10"},
... node_attr_getter=lambda node: isinstance(node, ops.Field) and {"shape": "oval"},
... edge_attr={"fontsize": "8"},
... edge_attr_getter=lambda u, v: isinstance(u, ops.Field) and {"color": "red"},
... ) # quartodoc: +SKIPRaises
| Name | Type | Description |
|---|---|---|
| ImportError | If graphviz is not installed. |