An immutable and lazy dataframe.
You will not create Table objects directly. Instead, you will create one
Methods
aggregate
Aggregate a table with a given set of reductions grouping by `by`.
alias
Create a table expression with a specific name `alias`.
as_scalar
Inform ibis that the table expression should be treated as a scalar.
as_table
Promote the expression to a table.
asof_join
Perform an “as-of” join between `left` and `right`.
bind
Bind column values to a table expression.
cache
Cache the results of a computation to improve performance on subsequent executions.
cast
Cast the columns of a table.
compile
Compile to an execution target.
count
Compute the number of rows in the table.
cross_join
Compute the cross join of a sequence of tables.
describe
Return summary information about a table.
difference
Compute the set difference of multiple table expressions.
distinct
Return a Table with duplicate rows removed.
drop
Remove fields from a table.
drop_null
Remove rows with null values from the table.
dropna
Deprecated - use `drop_null` instead.
equals
Return whether this expression is structurally equivalent to `other`.
execute
Execute an expression against its backend if one exists.
fill_null
Fill null values in a table expression.
fillna
Deprecated - use `fill_null` instead.
filter
Select rows from `table` based on `predicates`.
get_name
Return the fully qualified name of the table.
group_by
Create a grouped table expression.
has_name
Check whether this expression has an explicit name.
head
Select the first `n` rows of a table.
info
Return summary information about a table.
intersect
Compute the set intersection of multiple table expressions.
into_backend
Convert the Expr to a table in the given backend `con`, with an optional table name `name`.
join
Perform a join between two tables.
limit
Select `n` rows from `self` starting at `offset`.
mutate
Add columns to a table expression.
nunique
Compute the number of unique rows in the table.
order_by
Sort a table by one or more expressions.
pipe
Compose `f` with `self`.
pivot_longer
Transform a table from wider to longer.
pivot_wider
Pivot a table to a wider format.
preview
Return a subset as a Rich Table.
relabel
Deprecated in favor of `Table.rename`.
relocate
Relocate `columns` before or after other specified columns.
rename
Rename columns in the table.
rowid
A unique integer per row.
sample
Sample a fraction of rows from a table.
schema
Return the Schema for this table.
select
Compute a new table expression using `exprs` and `named_exprs`.
sql
Run a SQL query against a table expression.
to_array
View a single column table as an array.
to_csv
Write the results of executing the given expression to a CSV file.
to_json
Write the results of `expr` to an NDJSON file.
to_pandas
Convert a table expression to a pandas DataFrame.
to_parquet
Write the results of executing the given expression to a parquet file.
to_pyarrow
Execute expression and return results as a pyarrow table.
to_pyarrow_batches
Execute expression and return a RecordBatchReader.
try_cast
Cast the columns of a table.
unbind
Return an expression built on `UnboundTable` instead of backend-specific objects.
union
Compute the set union of multiple table expressions.
unnest
Unnest an array `column` from a table.
unpack
Project the struct fields of each of `columns` into `self`.
value_counts
Compute a frequency table of this table’s values.
view
Create a new table expression distinct from the current one.
visualize
Visualize an expression as a GraphViz graph in the browser.
aggregate
aggregate(metrics=(), by=(), having=(), **kwargs)
Aggregate a table with a given set of reductions grouping by `by`.
Parameters
metrics
Sequence[ir.Scalar] | None
Aggregate expressions. These can be any scalar-producing expression, including aggregation functions like `sum` or literal values like `ibis.literal(1)`.
()
by
Sequence[ir.Value] | None
Grouping expressions.
()
having
Sequence[ir.BooleanValue] | None
Post-aggregation filters. The shape requirements are the same as for `metrics`, but the output type for `having` is `boolean`. Warning: expressions like `x is None` return `bool` and will not generate a SQL comparison to `NULL`.
()
kwargs
ir.Value
Named aggregate expressions
{}
Returns
Table
An aggregate table expression
Examples
>>> import xorq as xo
>>> from xorq import _
>>> xo.options.interactive = True
>>> t = xo.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit ┃ price ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple │ 0.50 │
│ apple │ 0.50 │
│ banana │ 0.25 │
│ orange │ 0.33 │
└────────┴─────────┘
>>> t.aggregate(
...     by=["fruit"],
...     total_cost=_.price.sum(),
...     avg_cost=_.price.mean(),
...     having=_.price.sum() < 0.5,
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼────────────┼──────────┤
│ banana │ 0.25 │ 0.25 │
│ orange │ 0.33 │ 0.33 │
└────────┴────────────┴──────────┘
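To make the order of operations concrete, here is a minimal plain-Python sketch of what the example above computes: rows are grouped by `by`, the `having` predicate filters whole groups, and the named metrics are evaluated per group. It does not use xorq; `aggregate_rows`, `total` and `mean` are hypothetical helper names introduced only for illustration.

```python
from collections import defaultdict

rows = [
    {"fruit": "apple", "price": 0.5},
    {"fruit": "apple", "price": 0.5},
    {"fruit": "banana", "price": 0.25},
    {"fruit": "orange", "price": 0.33},
]


def total(members):
    """Sum of the price column within one group."""
    return sum(row["price"] for row in members)


def mean(members):
    """Mean of the price column within one group."""
    return total(members) / len(members)


def aggregate_rows(rows, by, metrics, having=None):
    """Group rows by `by`, drop groups failing `having`, compute metrics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)

    result = []
    for key, members in groups.items():
        # `having` sees the whole group, like a post-aggregation filter.
        if having is not None and not having(members):
            continue
        out = {by: key}
        for name, fn in metrics.items():
            out[name] = fn(members)
        result.append(out)
    return result


summary = aggregate_rows(
    rows,
    by="fruit",
    metrics={"total_cost": total, "avg_cost": mean},
    having=lambda members: total(members) < 0.5,
)
print(summary)
```

The apple group sums to 1.0 and is filtered out, which leaves the banana and orange groups, as in the table above.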
alias
Create a table expression with a specific name `alias`.
This method is useful for exposing an ibis expression to the underlying backend for use in the `Table.sql` method.
`.alias` creates a temporary view in the database. This side effect will be removed in a future version of ibis and is not part of the public API.
Parameters
alias
str
Name of the child expression
required
Returns
Table
A table expression
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
as_scalar
Inform ibis that the table expression should be treated as a scalar.
Note that the table must have exactly one column and one row for this to work. If the table has more than one column, an error is raised at expression construction time. If the table has more than one row, an error is raised by the backend when the expression is executed.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string │ int64 │
├─────────┼───────────────┤
│ Adelie │ 44 │
│ Gentoo │ 124 │
└─────────┴───────────────┘
as_table
Promote the expression to a table.
This method is a no-op for table expressions.
Examples
>>> import xorq as xo
>>> t = xo.table(dict(a="int"), name="t")
>>> s = t.as_table()
>>> t is s
True
asof_join
asof_join(
    left,
    right,
    on,
    predicates=(),
    tolerance=None,
    *,
    lname='',
    rname='{name}_right',
)
Perform an “as-of” join between `left` and `right`.
Similar to a left join except that the match is done on nearest key rather than equal keys.
Parameters
left
Table
Table expression
required
right
Table
Table expression
required
on
str | ir.BooleanColumn
Closest match inequality condition
required
predicates
str | ir.Column | Sequence[str | ir.Column]
Additional join predicates
()
tolerance
str | ir.IntervalScalar | None
Amount of time to look behind when joining
None
lname
str
A format string to use to rename overlapping columns in the left table (e.g. `"left_{name}"`).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. `"right_{name}"`).
'{name}_right'
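The "nearest key" matching can be illustrated without any backend. The sketch below is plain Python, not the xorq API: `asof_join_rows` is a hypothetical helper that assumes a backward-looking condition (the right key must be less than or equal to the left key), an integer `tolerance`, and a hard-coded `_right` suffix for overlapping column names.

```python
def asof_join_rows(left, right, key, tolerance=None):
    """For each left row, attach the right row with the largest key <= left key."""
    right_sorted = sorted(right, key=lambda row: row[key])
    right_cols = list(right[0]) if right else []

    joined = []
    for lrow in left:
        match = None
        for rrow in right_sorted:
            if rrow[key] > lrow[key]:
                break
            match = rrow  # latest right row that is not after the left row

        # Discard matches that lie further back than the tolerance allows.
        if match is not None and tolerance is not None:
            if lrow[key] - match[key] > tolerance:
                match = None

        out = dict(lrow)
        for col in right_cols:
            # Overlapping names from the right side get a "_right" suffix.
            name = f"{col}_right" if col in lrow else col
            out[name] = match[col] if match is not None else None
        joined.append(out)
    return joined


trades = [{"time": 3, "qty": 10}, {"time": 7, "qty": 5}, {"time": 20, "qty": 1}]
quotes = [{"time": 1, "price": 100.0}, {"time": 6, "price": 101.5}]

for row in asof_join_rows(trades, quotes, key="time", tolerance=5):
    print(row)
```

As in a left join, every left row is kept. The trade at time 20 has no quote within the tolerance, so its right-hand columns are filled with `None`.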
bind
Bind column values to a table expression.
This method handles the binding of every kind of column-like value that Ibis handles, including strings, integers, deferred expressions and selectors, to a table expression.
Parameters
args
Any
Column-like values to bind.
()
kwargs
Any
Column-like values to bind, with names.
{}
cache
Cache the results of a computation to improve performance on subsequent executions. This method allows you to cache the results of a computation either in memory, on disk using Parquet files, or in a database table. The caching strategy and storage location are determined by the storage parameter.
Parameters
storage
CacheStorage
The storage strategy to use for caching. Can be one of:
- ParquetStorage: Caches results as Parquet files on disk
- SourceStorage: Caches results in the source database
- ParquetSnapshotStorage: Creates a snapshot of data in Parquet format
- SourceSnapshotStorage: Creates a snapshot in the source database
If None, uses the default storage configuration.
None
Returns
Expr
A new expression that represents the cached computation.
Notes
The cache method supports two main strategies:
1. ModificationTimeStrategy: Tracks changes based on modification time
2. SnapshotStrategy: Creates point-in-time snapshots of the data
Each strategy can be combined with either Parquet or database storage.
Examples
Using ParquetStorage:
>>> import xorq as xo
>>> from xorq.caching import ParquetStorage
>>> from pathlib import Path
>>> pg = xo.postgres.connect_examples()
>>> con = xo.connect()
>>> storage = ParquetStorage(source=con, relative_path=Path.cwd())
>>> alltypes = pg.table("functional_alltypes")
>>> cached = (alltypes
...     .select(alltypes.smallint_col, alltypes.int_col, alltypes.float_col)
...     .cache(storage=storage))
Using SourceStorage with PostgreSQL:
>>> from xorq.caching import SourceStorage
>>> from xorq import _
>>> ddb = xo.duckdb.connect()
>>> path = xo.config.options.pins.get_path("batting")
>>> right = (ddb.read_parquet(path, table_name="batting")
...     .filter(_.yearID == 2014)
...     .pipe(con.register, table_name="ddb-batting"))
>>> left = (pg.table("batting")
...     .filter(_.yearID == 2015)
...     .pipe(con.register, table_name="pg-batting"))
>>> # Cache the joined result
>>> expr = left.join(right, "playerID").cache(SourceStorage(source=pg))
Using cache with filtering:
>>> cached = alltypes.cache(storage=storage)
>>> expr = cached.filter([
...     cached.float_col > 0,
...     cached.smallint_col > 4,
...     cached.int_col < cached.float_col * 2,
... ])
See Also
ParquetStorage : Storage implementation for Parquet files
SourceStorage : Storage implementation for database tables
ModificationTimeStrategy : Strategy for tracking changes by modification time
SnapshotStrategy : Strategy for creating data snapshots
Notes
The cache is identified by a unique key based on the computation and strategy
Cache invalidation is handled automatically based on the chosen strategy
Cross-source caching (e.g., from PostgreSQL to DuckDB) is supported
Cache locations can be configured globally through xorq.config.options
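The relationship between cache keys and invalidation strategies can be pictured with a toy model. The sketch below is not xorq's implementation: `ToyCache`, `key_for`, `get_or_compute` and the `source_mtime` argument are hypothetical names, and it simply assumes that a modification-time strategy folds the source's modification time into the key while a snapshot strategy ignores it.

```python
import hashlib


class ToyCache:
    """Hypothetical in-memory cache used only to illustrate key-based invalidation."""

    def __init__(self, track_modification_time):
        self.track_modification_time = track_modification_time
        self.store = {}
        self.misses = 0

    def key_for(self, computation, source_mtime):
        parts = [computation]
        if self.track_modification_time:
            # A newer source modification time produces a different key.
            parts.append(str(source_mtime))
        return hashlib.sha256("|".join(parts).encode()).hexdigest()

    def get_or_compute(self, computation, source_mtime, compute):
        key = self.key_for(computation, source_mtime)
        if key not in self.store:
            self.misses += 1
            self.store[key] = compute()
        return self.store[key]


query = "SELECT smallint_col FROM functional_alltypes"
mtime_cache = ToyCache(track_modification_time=True)
snapshot_cache = ToyCache(track_modification_time=False)

for cache in (mtime_cache, snapshot_cache):
    cache.get_or_compute(query, source_mtime=100, compute=lambda: [1, 2, 3])
    cache.get_or_compute(query, source_mtime=100, compute=lambda: [1, 2, 3])
    # The source changed: only the modification-time cache recomputes.
    cache.get_or_compute(query, source_mtime=200, compute=lambda: [1, 2, 3, 4])

print(mtime_cache.misses, snapshot_cache.misses)
```

In this toy model the modification-time cache computes twice (once per distinct modification time), while the snapshot cache computes once and keeps serving the original result.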
cast
Cast the columns of a table.
Similar to `pandas.DataFrame.astype`.
Parameters
schema
SchemaLike
Mapping, schema or iterable of pairs to use for casting
required
Examples
>>> import xorq as xo
>>> import xorq.selectors as s
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.schema()
ibis.Schema {
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm float64
body_mass_g float64
sex string
year int64
}
>>> cols = ["body_mass_g", "bill_length_mm"]
>>> t[cols].head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │
├─────────────┼────────────────┤
│ 3750.0 │ 39.1 │
│ 3800.0 │ 39.5 │
│ 3250.0 │ 40.3 │
│ NULL │ NULL │
│ 3450.0 │ 36.7 │
└─────────────┴────────────────┘
Columns not present in the input schema will be passed through unchanged
>>> t.columns
['species',
'island',
'bill_length_mm',
'bill_depth_mm',
'flipper_length_mm',
'body_mass_g',
'sex',
'year']
>>> expr = t.cast({"body_mass_g": "float64", "bill_length_mm": "int"})
>>> expr.select(*cols).head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64 │ int64 │
├─────────────┼────────────────┤
│ 3750.0 │ 39 │
│ 3800.0 │ 39 │
│ 3250.0 │ 40 │
│ NULL │ NULL │
│ 3450.0 │ 36 │
└─────────────┴────────────────┘
Columns that are in the input schema but not in the table raise an error
>>> t.cast({"foo": "string"})
XorqError: Cast schema has fields that are not in the table: ['foo']
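The two rules shown above (unlisted columns pass through, unknown schema fields raise) can be sketched in plain Python. This is an illustration rather than the library's code: `cast_rows` and its small `casters` table are hypothetical, a plain `ValueError` stands in for `XorqError`, and Python's `int` truncates toward zero, which happens to match the output above but is not guaranteed for every backend.

```python
def cast_rows(rows, schema):
    """Cast the columns named in `schema`; leave all other columns untouched."""
    casters = {"int": int, "float64": float, "string": str}
    columns = list(rows[0]) if rows else []

    # Schema fields that do not exist in the table are an error.
    missing = [name for name in schema if name not in columns]
    if missing:
        raise ValueError(
            f"Cast schema has fields that are not in the table: {missing}"
        )

    out = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if name in schema and value is not None:
                new_row[name] = casters[schema[name]](value)
            else:
                # Nulls and columns absent from the schema pass through unchanged.
                new_row[name] = value
        out.append(new_row)
    return out


rows = [
    {"species": "Adelie", "body_mass_g": 3750.0, "bill_length_mm": 39.1},
    {"species": "Adelie", "body_mass_g": 3800.0, "bill_length_mm": 39.5},
    {"species": "Adelie", "body_mass_g": None, "bill_length_mm": None},
]

casted = cast_rows(rows, {"body_mass_g": "float64", "bill_length_mm": "int"})
print(casted)

try:
    cast_rows(rows, {"foo": "string"})
except ValueError as exc:
    print(exc)
```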
compile
compile(limit=None, params=None, pretty=False)
Compile to an execution target.
Parameters
limit
int | None
An integer to effect a specific row limit. A value of `None` means “no limit”. The default is in `ibis/config.py`.
None
params
Mapping[ir.Value, Any] | None
Mapping of scalar parameter expressions to value
None
pretty
bool
In case of SQL backends, return a pretty formatted SQL query.
False
count
Compute the number of rows in the table.
Parameters
where
ir.BooleanValue | None
Optional boolean expression to filter rows when counting.
None
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": ["foo", "bar", "baz"]})
>>> t
┏━━━━━━━━┓
┃ a ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo │
│ bar │
│ baz │
└────────┘
>>> t.count(t.a != "foo")
xorq.vendor.ibis.expr.types.numeric.IntegerScalar
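As a rough plain-Python picture of the `where` argument, the sketch below counts only the values for which a predicate holds. `count_rows` is a hypothetical helper, not part of xorq, and a Python callable stands in for the boolean expression.

```python
def count_rows(values, where=None):
    """Count all rows, or only those for which `where` is true."""
    if where is None:
        return len(values)
    return sum(1 for value in values if where(value))


a = ["foo", "bar", "baz"]

total_rows = count_rows(a)
not_foo = count_rows(a, where=lambda value: value != "foo")
print(total_rows, not_foo)
```

For the three-row table above, two rows satisfy `a != "foo"`.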
cross_join
cross_join(left, right, *rest, lname='', rname='{name}_right')
Compute the cross join of a sequence of tables.
Parameters
left
Table
Left table
required
right
Table
Right table
required
rest
Table
Additional tables to cross join
()
lname
str
A format string to use to rename overlapping columns in the left table (e.g. `"left_{name}"`).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. `"right_{name}"`).
'{name}_right'
Returns
Table
Cross join of `left`, `right` and `rest`
Examples
>>> import xorq as xo
>>> import xorq.selectors as s
>>> from xorq import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.count()
>>> agg = t.drop("year").agg(s.across(s.numeric(), _.mean()))
>>> expr = t.cross_join(agg)
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃ bill_length_mm_right ┃ bill_depth_mm_right ┃ flipper_length_mm_right ┃ body_mass_g_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │ float64 │ float64 │ float64 │ float64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼──────────────────────┼─────────────────────┼─────────────────────────┼───────────────────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴──────────────────────┴─────────────────────┴─────────────────────────┴───────────────────┘
>>> expr.columns
['species',
'island',
'bill_length_mm',
'bill_depth_mm',
'flipper_length_mm',
'body_mass_g',
'sex',
'year',
'bill_length_mm_right',
'bill_depth_mm_right',
'flipper_length_mm_right',
'body_mass_g_right']
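The pairing and renaming behaviour can be mimicked in a few lines of plain Python. The sketch below is an illustration, not the library's implementation: `cross_join_rows` is a hypothetical helper that handles two inputs, applies only the `rname` format string to overlapping right-hand columns, and uses made-up numbers.

```python
from itertools import product


def cross_join_rows(left, right, rname="{name}_right"):
    """Pair every left row with every right row, renaming overlapping columns."""
    joined = []
    for lrow, rrow in product(left, right):
        out = dict(lrow)
        for name, value in rrow.items():
            # Only right-hand columns that collide with a left column are renamed.
            out_name = rname.format(name=name) if name in lrow else name
            out[out_name] = value
        joined.append(out)
    return joined


penguins = [
    {"species": "Adelie", "body_mass_g": 3750.0},
    {"species": "Gentoo", "body_mass_g": 5000.0},
]
means = [{"body_mass_g": 4375.0}]

for row in cross_join_rows(penguins, means):
    print(row)
```

Joining against a one-row aggregate, as in the example above, repeats the aggregate values next to every original row.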
describe
describe(quantile=(0.25, 0.5, 0.75))
Return summary information about a table.
Parameters
quantile
Sequence[ir.NumericValue | float]
The quantiles to compute for numerical columns. Defaults to (0.25, 0.5, 0.75).
(0.25, 0.5, 0.75)
Returns
Table
A table containing summary information about the columns of self.
Notes
This function computes summary statistics for each column in the table. For numerical columns, it computes statistics such as minimum, maximum, mean, standard deviation, and quantiles. For string columns, it computes the mode and the number of unique values.
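As a rough illustration of the per-column statistics listed above, the following plain-Python sketch summarizes one column at a time with the standard library. `describe_column` and `linear_quantile` are hypothetical helpers; the linear-interpolation quantile and the sample standard deviation are assumptions, and a backend may define these statistics differently.

```python
import statistics


def linear_quantile(sorted_values, q):
    """Quantile by linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def describe_column(name, values, quantile=(0.25, 0.5, 0.75)):
    """Summarize one column: numeric statistics, or mode for strings."""
    present = [v for v in values if v is not None]
    summary = {
        "name": name,
        "count": len(present),
        "nulls": len(values) - len(present),
        "unique": len(set(present)),
    }
    if all(isinstance(v, str) for v in present):
        summary["mode"] = statistics.mode(present)
        return summary

    ordered = sorted(present)
    summary["mean"] = statistics.mean(present)
    summary["std"] = statistics.stdev(present)  # sample standard deviation
    summary["min"] = ordered[0]
    for q in quantile:
        summary[f"p{round(q * 100)}"] = linear_quantile(ordered, q)
    summary["max"] = ordered[-1]
    return summary


numeric = describe_column("x", [1.0, 2.0, 3.0, 4.0, 5.0, None])
text = describe_column("sex", ["male", "female", "male", None])
print(numeric)
print(text)
```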
Examples
>>> import xorq as xo
>>> import xorq.selectors as s
>>> xo.options.interactive = True
>>> p = xo.examples.penguins.fetch(deferred=False)
>>> p.describe()
Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'Mode' operation is not defined")
Expression repr follows:
r0 := DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm float64
body_mass_g float64
sex string
year int64
r1 := Aggregate[r0]
metrics:
name: 'species'
pos: 0
type: 'string'
count: Count(IsNull(r0.species))
nulls: Sum(IsNull(r0.species))
unique: CountDistinct(r0.species)
mode: Mode(r0.species)
mean: Cast(None, to=float64)
std: Cast(None, to=float64)
min: Cast(None, to=float64)
p25: Cast(None, to=float64)
p50: Cast(None, to=float64)
p75: Cast(None, to=float64)
max: Cast(None, to=float64)
r2 := Aggregate[r0]
metrics:
name: 'island'
pos: 1
type: 'string'
count: Count(IsNull(r0.island))
nulls: Sum(IsNull(r0.island))
unique: CountDistinct(r0.island)
mode: Mode(r0.island)
mean: Cast(None, to=float64)
std: Cast(None, to=float64)
min: Cast(None, to=float64)
p25: Cast(None, to=float64)
p50: Cast(None, to=float64)
p75: Cast(None, to=float64)
max: Cast(None, to=float64)
r3 := Aggregate[r0]
metrics:
name: 'bill_length_mm'
pos: 2
type: 'float64'
count: Count(IsNull(r0.bill_length_mm))
nulls: Sum(IsNull(r0.bill_length_mm))
unique: CountDistinct(r0.bill_length_mm)
mode: Cast(None, to=string)
mean: Mean(r0.bill_length_mm)
std: StandardDev(r0.bill_length_mm, how='sample')
min: Min(r0.bill_length_mm)
p25: Quantile(r0.bill_length_mm, quantile=0.25)
p50: Quantile(r0.bill_length_mm, quantile=0.5)
p75: Quantile(r0.bill_length_mm, quantile=0.75)
max: Max(r0.bill_length_mm)
r4 := Aggregate[r0]
metrics:
name: 'bill_depth_mm'
pos: 3
type: 'float64'
count: Count(IsNull(r0.bill_depth_mm))
nulls: Sum(IsNull(r0.bill_depth_mm))
unique: CountDistinct(r0.bill_depth_mm)
mode: Cast(None, to=string)
mean: Mean(r0.bill_depth_mm)
std: StandardDev(r0.bill_depth_mm, how='sample')
min: Min(r0.bill_depth_mm)
p25: Quantile(r0.bill_depth_mm, quantile=0.25)
p50: Quantile(r0.bill_depth_mm, quantile=0.5)
p75: Quantile(r0.bill_depth_mm, quantile=0.75)
max: Max(r0.bill_depth_mm)
r5 := Aggregate[r0]
metrics:
name: 'flipper_length_mm'
pos: 4
type: 'float64'
count: Count(IsNull(r0.flipper_length_mm))
nulls: Sum(IsNull(r0.flipper_length_mm))
unique: CountDistinct(r0.flipper_length_mm)
mode: Cast(None, to=string)
mean: Mean(r0.flipper_length_mm)
std: StandardDev(r0.flipper_length_mm, how='sample')
min: Min(r0.flipper_length_mm)
p25: Quantile(r0.flipper_length_mm, quantile=0.25)
p50: Quantile(r0.flipper_length_mm, quantile=0.5)
p75: Quantile(r0.flipper_length_mm, quantile=0.75)
max: Max(r0.flipper_length_mm)
r6 := Aggregate[r0]
metrics:
name: 'body_mass_g'
pos: 5
type: 'float64'
count: Count(IsNull(r0.body_mass_g))
nulls: Sum(IsNull(r0.body_mass_g))
unique: CountDistinct(r0.body_mass_g)
mode: Cast(None, to=string)
mean: Mean(r0.body_mass_g)
std: StandardDev(r0.body_mass_g, how='sample')
min: Min(r0.body_mass_g)
p25: Quantile(r0.body_mass_g, quantile=0.25)
p50: Quantile(r0.body_mass_g, quantile=0.5)
p75: Quantile(r0.body_mass_g, quantile=0.75)
max: Max(r0.body_mass_g)
r7 := Aggregate[r0]
metrics:
name: 'sex'
pos: 6
type: 'string'
count: Count(IsNull(r0.sex))
nulls: Sum(IsNull(r0.sex))
unique: CountDistinct(r0.sex)
mode: Mode(r0.sex)
mean: Cast(None, to=float64)
std: Cast(None, to=float64)
min: Cast(None, to=float64)
p25: Cast(None, to=float64)
p50: Cast(None, to=float64)
p75: Cast(None, to=float64)
max: Cast(None, to=float64)
r8 := Aggregate[r0]
metrics:
name: 'year'
pos: 7
type: 'int64'
count: Count(IsNull(r0.year))
nulls: Sum(IsNull(r0.year))
unique: CountDistinct(r0.year)
mode: Cast(None, to=string)
mean: Mean(r0.year)
std: StandardDev(r0.year, how='sample')
min: Cast(Min(r0.year), to=float64)
p25: Quantile(r0.year, quantile=0.25)
p50: Quantile(r0.year, quantile=0.5)
p75: Quantile(r0.year, quantile=0.75)
max: Cast(Max(r0.year), to=float64)
r9 := Union[r1, r2, distinct=False]
r10 := Union[r3, r4, distinct=False]
r11 := Union[r5, r6, distinct=False]
r12 := Union[r7, r8, distinct=False]
r13 := Union[r9, r10, distinct=False]
r14 := Union[r11, r12, distinct=False]
Union[r13, r14, distinct=False]
>>> p.select(s.of_type("numeric")).describe()
Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'Quantile' operation is not defined")
Expression repr follows:
r0 := DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm float64
body_mass_g float64
sex string
year int64
r1 := Project[r0]
bill_length_mm: r0.bill_length_mm
bill_depth_mm: r0.bill_depth_mm
flipper_length_mm: r0.flipper_length_mm
body_mass_g: r0.body_mass_g
year: r0.year
r2 := Aggregate[r1]
metrics:
name: 'flipper_length_mm'
pos: 2
type: 'float64'
count: Count(IsNull(r1.flipper_length_mm))
nulls: Sum(IsNull(r1.flipper_length_mm))
unique: CountDistinct(r1.flipper_length_mm)
mode: Cast(None, to=string)
mean: Mean(r1.flipper_length_mm)
std: StandardDev(r1.flipper_length_mm, how='sample')
min: Min(r1.flipper_length_mm)
p25: Quantile(r1.flipper_length_mm, quantile=0.25)
p50: Quantile(r1.flipper_length_mm, quantile=0.5)
p75: Quantile(r1.flipper_length_mm, quantile=0.75)
max: Max(r1.flipper_length_mm)
r3 := Aggregate[r1]
metrics:
name: 'body_mass_g'
pos: 3
type: 'float64'
count: Count(IsNull(r1.body_mass_g))
nulls: Sum(IsNull(r1.body_mass_g))
unique: CountDistinct(r1.body_mass_g)
mode: Cast(None, to=string)
mean: Mean(r1.body_mass_g)
std: StandardDev(r1.body_mass_g, how='sample')
min: Min(r1.body_mass_g)
p25: Quantile(r1.body_mass_g, quantile=0.25)
p50: Quantile(r1.body_mass_g, quantile=0.5)
p75: Quantile(r1.body_mass_g, quantile=0.75)
max: Max(r1.body_mass_g)
r4 := Aggregate[r1]
metrics:
name: 'year'
pos: 4
type: 'int64'
count: Count(IsNull(r1.year))
nulls: Sum(IsNull(r1.year))
unique: CountDistinct(r1.year)
mode: Cast(None, to=string)
mean: Mean(r1.year)
std: StandardDev(r1.year, how='sample')
min: Cast(Min(r1.year), to=float64)
p25: Quantile(r1.year, quantile=0.25)
p50: Quantile(r1.year, quantile=0.5)
p75: Quantile(r1.year, quantile=0.75)
max: Cast(Max(r1.year), to=float64)
r5 := Aggregate[r1]
metrics:
name: 'bill_length_mm'
pos: 0
type: 'float64'
count: Count(IsNull(r1.bill_length_mm))
nulls: Sum(IsNull(r1.bill_length_mm))
unique: CountDistinct(r1.bill_length_mm)
mode: Cast(None, to=string)
mean: Mean(r1.bill_length_mm)
std: StandardDev(r1.bill_length_mm, how='sample')
min: Min(r1.bill_length_mm)
p25: Quantile(r1.bill_length_mm, quantile=0.25)
p50: Quantile(r1.bill_length_mm, quantile=0.5)
p75: Quantile(r1.bill_length_mm, quantile=0.75)
max: Max(r1.bill_length_mm)
r6 := Aggregate[r1]
metrics:
name: 'bill_depth_mm'
pos: 1
type: 'float64'
count: Count(IsNull(r1.bill_depth_mm))
nulls: Sum(IsNull(r1.bill_depth_mm))
unique: CountDistinct(r1.bill_depth_mm)
mode: Cast(None, to=string)
mean: Mean(r1.bill_depth_mm)
std: StandardDev(r1.bill_depth_mm, how='sample')
min: Min(r1.bill_depth_mm)
p25: Quantile(r1.bill_depth_mm, quantile=0.25)
p50: Quantile(r1.bill_depth_mm, quantile=0.5)
p75: Quantile(r1.bill_depth_mm, quantile=0.75)
max: Max(r1.bill_depth_mm)
r7 := Union[r2, r3, distinct=False]
r8 := Union[r5, r6, distinct=False]
r9 := Union[r4, r8, distinct=False]
r10 := Union[r7, r9, distinct=False]
DropColumns[r10]
columns_to_drop:
frozenset({'mode'})
schema:
name string
pos int16
type string
count int64
nulls int64
unique int64
mean float64
std float64
min float64
p25 float64
p50 float64
p75 float64
max float64
>>> p.select(s.of_type("string")).describe()
Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'Mode' operation is not defined")
Expression repr follows:
r0 := DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm float64
body_mass_g float64
sex string
year int64
r1 := Project[r0]
species: r0.species
island: r0.island
sex: r0.sex
r2 := Aggregate[r1]
metrics:
name: 'sex'
pos: 2
type: 'string'
count: Count(IsNull(r1.sex))
nulls: Sum(IsNull(r1.sex))
unique: CountDistinct(r1.sex)
mode: Mode(r1.sex)
mean: Cast(None, to=float64)
std: Cast(None, to=float64)
min: Cast(None, to=float64)
p25: Cast(None, to=float64)
p50: Cast(None, to=float64)
p75: Cast(None, to=float64)
max: Cast(None, to=float64)
r3 := Aggregate[r1]
metrics:
name: 'species'
pos: 0
type: 'string'
count: Count(IsNull(r1.species))
nulls: Sum(IsNull(r1.species))
unique: CountDistinct(r1.species)
mode: Mode(r1.species)
mean: Cast(None, to=float64)
std: Cast(None, to=float64)
min: Cast(None, to=float64)
p25: Cast(None, to=float64)
p50: Cast(None, to=float64)
p75: Cast(None, to=float64)
max: Cast(None, to=float64)
r4 := Aggregate[r1]
metrics:
name: 'island'
pos: 1
type: 'string'
count: Count(IsNull(r1.island))
nulls: Sum(IsNull(r1.island))
unique: CountDistinct(r1.island)
mode: Mode(r1.island)
mean: Cast(None, to=float64)
std: Cast(None, to=float64)
min: Cast(None, to=float64)
p25: Cast(None, to=float64)
p50: Cast(None, to=float64)
p75: Cast(None, to=float64)
max: Cast(None, to=float64)
r5 := Union[r3, r4, distinct=False]
r6 := Union[r2, r5, distinct=False]
Project[r6]
name: r6.name
pos: r6.pos
type: r6.type
count: r6.count
nulls: r6.nulls
unique: r6.unique
mode: r6.mode
difference
difference(table, *rest, distinct=True)
Compute the set difference of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only diff distinct rows not occurring in the calling table
True
Returns
Table
The rows present in `self` that are not present in `tables`.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> t1.difference(t2)
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
└───────┘
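The effect of the `distinct` flag can be sketched in plain Python. This is an illustration under the assumption that `distinct=True` behaves like SQL `EXCEPT` (set semantics) and `distinct=False` like `EXCEPT ALL` (multiset semantics); `difference_rows` is a hypothetical helper and rows are modelled as tuples.

```python
from collections import Counter


def difference_rows(table, *rest, distinct=True):
    """Rows of `table` that do not occur in any of the `rest` tables."""
    others = Counter()
    for other in rest:
        others.update(other)

    if distinct:
        # Set semantics: each surviving row appears once.
        seen = set()
        result = []
        for row in table:
            if row not in others and row not in seen:
                seen.add(row)
                result.append(row)
        return result

    # Multiset semantics: each occurrence in `rest` cancels one occurrence.
    remaining = Counter(others)
    result = []
    for row in table:
        if remaining[row] > 0:
            remaining[row] -= 1
        else:
            result.append(row)
    return result


t1 = [(1,), (2,)]
t2 = [(2,), (3,)]
print(difference_rows(t1, t2))

duplicated = [(1,), (1,), (2,), (2,)]
print(difference_rows(duplicated, [(2,)], distinct=True))
print(difference_rows(duplicated, [(2,)], distinct=False))
```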
distinct
distinct(on=None, keep='first')
Return a Table with duplicate rows removed.
Similar to `pandas.DataFrame.drop_duplicates()`.
Parameters
on
str | Iterable[str] | s.Selector | None
Only consider certain columns for identifying duplicates. By default, deduplicate all of the columns.
None
keep
Literal['first', 'last'] | None
Determines which duplicates to keep.
- `"first"`: Drop duplicates except for the first occurrence.
- `"last"`: Drop duplicates except for the last occurrence.
- `None`: Drop all duplicates.
'first'
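The three `keep` options can be compared side by side with a small plain-Python sketch. `distinct_rows` is a hypothetical helper, not the xorq API, and it treats list order as row order, which a real backend may not guarantee without an explicit sort.

```python
def distinct_rows(rows, on, keep="first"):
    """Deduplicate `rows` using the columns in `on` as the identity key."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in on)
        groups.setdefault(key, []).append(row)

    result = []
    for members in groups.values():
        if keep == "first":
            result.append(members[0])
        elif keep == "last":
            result.append(members[-1])
        elif keep is None:
            # Every row that has a duplicate is dropped, including the first.
            if len(members) == 1:
                result.append(members[0])
        else:
            raise ValueError(f"unsupported keep value: {keep!r}")
    return result


rows = [
    {"species": "Adelie", "island": "Biscoe", "year": 2007},
    {"species": "Adelie", "island": "Biscoe", "year": 2009},
    {"species": "Gentoo", "island": "Biscoe", "year": 2008},
]
on = ["species", "island"]

print(distinct_rows(rows, on, keep="first"))
print(distinct_rows(rows, on, keep="last"))
print(distinct_rows(rows, on, keep=None))
```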
Examples
>>> import xorq as xo
>>> import xorq.examples as ex
>>> import xorq.selectors as s
>>> xo.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Compute the distinct rows of a subset of columns
>>> t[["species", "island"]].distinct().order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ island ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ string │
├───────────┼───────────┤
│ Adelie │ Biscoe │
│ Adelie │ Dream │
│ Adelie │ Torgersen │
│ Chinstrap │ Dream │
│ Gentoo │ Biscoe │
└───────────┴───────────┘
Drop all duplicate rows except the first
>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 37.8 │ 18.3 │ 174.0 │ 3400.0 │ female │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Chinstrap │ Dream │ 46.5 │ 17.9 │ 192.0 │ 3500.0 │ female │ 2007 │
│ Gentoo │ Biscoe │ 46.1 │ 13.2 │ 211.0 │ 4500.0 │ female │ 2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicate rows except the last
>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 42.7 │ 18.3 │ 196.0 │ 4075.0 │ male │ 2009 │
│ Adelie │ Dream │ 41.5 │ 18.5 │ 201.0 │ 4000.0 │ male │ 2009 │
│ Adelie │ Torgersen │ 43.1 │ 19.2 │ 197.0 │ 3500.0 │ male │ 2009 │
│ Chinstrap │ Dream │ 50.2 │ 18.7 │ 198.0 │ 3775.0 │ female │ 2009 │
│ Gentoo │ Biscoe │ 49.9 │ 16.1 │ 213.0 │ 5400.0 │ male │ 2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicated rows
>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()
You can pass selectors to on
>>> t.distinct(on=~s.numeric())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Chinstrap │ Dream │ 46.5 │ 17.9 │ 192.0 │ 3500.0 │ female │ 2007 │
│ Chinstrap │ Dream │ 50.0 │ 19.5 │ 196.0 │ 3900.0 │ male │ 2007 │
│ Adelie │ Biscoe │ 37.8 │ 18.3 │ 174.0 │ 3400.0 │ female │ 2007 │
│ Adelie │ Biscoe │ 37.7 │ 18.7 │ 180.0 │ 3600.0 │ male │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Dream │ 37.2 │ 18.1 │ 178.0 │ 3900.0 │ male │ 2007 │
│ Adelie │ Dream │ 37.5 │ 18.9 │ 179.0 │ 2975.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
The only valid values of keep are "first", "last" and None.
>>> t.distinct(on="species", keep="second")
XorqError: Invalid value for `keep`: 'second', must be 'first', 'last' or None
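The effect of keep can be modelled in plain Python. The sketch below only illustrates the semantics described above; it is not how xorq evaluates distinct. The helper distinct_on and the sample rows are hypothetical, it raises ValueError rather than XorqError, and a real backend may not guarantee which row counts as first or last unless the input is ordered.

```python
from collections import Counter


def distinct_on(rows, on, keep="first"):
    """Plain-Python model of distinct(on=..., keep=...) over a list of dicts."""
    if keep not in ("first", "last", None):
        # The real method raises XorqError; ValueError is used in this sketch.
        raise ValueError(
            f"Invalid value for `keep`: {keep!r}, must be 'first', 'last' or None"
        )

    def key(row):
        return tuple(row[c] for c in on)

    if keep is None:
        # Drop every row whose key occurs more than once.
        counts = Counter(key(r) for r in rows)
        return [r for r in rows if counts[key(r)] == 1]

    chosen = {}
    for row in rows:
        k = key(row)
        # "first": keep the earliest row per key; "last": overwrite with later rows.
        if keep == "last" or k not in chosen:
            chosen[k] = row
    return list(chosen.values())


rows = [
    {"species": "Adelie", "island": "Biscoe", "year": 2007},
    {"species": "Adelie", "island": "Biscoe", "year": 2009},
    {"species": "Gentoo", "island": "Biscoe", "year": 2008},
]
first = distinct_on(rows, ["species", "island"], keep="first")
last = distinct_on(rows, ["species", "island"], keep="last")
unique_only = distinct_on(rows, ["species", "island"], keep=None)
```

With keep=None only the Gentoo row survives, because the two Adelie rows share the same key.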
drop
Remove fields from a table.
Parameters
fields
str | Selector
Fields to drop. Strings and selectors are accepted.
()
Returns
Table
A table with all columns matching fields
removed.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop one or more columns
>>> t.drop("species").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.drop("species", "bill_length_mm").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Torgersen │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Torgersen │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
└───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop with selectors, mix and match
>>> import xorq.selectors as s
>>> t.drop("species", s.startswith("bill_")).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ float64 │ string │ int64 │
├───────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 181.0 │ 3750.0 │ male │ 2007 │
│ Torgersen │ 186.0 │ 3800.0 │ female │ 2007 │
│ Torgersen │ 195.0 │ 3250.0 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 193.0 │ 3450.0 │ female │ 2007 │
└───────────┴───────────────────┴─────────────┴────────┴───────┘
drop_null
drop_null(subset=None, how='any')
Remove rows with null values from the table.
Parameters
subset
Sequence[str] | str | None
Column names to consider when dropping nulls. By default all columns are considered.
None
how
Literal['any', 'all']
Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all').
'any'
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.drop_null(["bill_length_mm", "body_mass_g"]).count()
>>> t.drop_null(how="all").count()  # no rows where all columns are null
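The difference between how='any' and how='all' can be sketched in plain Python. This models the documented behaviour only; drop_null_rows and the sample rows are hypothetical names, and None stands in for NULL.

```python
def drop_null_rows(rows, subset=None, how="any"):
    """Plain-Python model of drop_null over a list of dicts."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    kept = []
    for row in rows:
        # By default every column is considered; a single name is also accepted.
        cols = list(row) if subset is None else subset
        if isinstance(cols, str):
            cols = [cols]
        if not test(row[c] is None for c in cols):
            kept.append(row)
    return kept


rows = [
    {"bill_length_mm": 39.1, "sex": "male"},
    {"bill_length_mm": None, "sex": None},
    {"bill_length_mm": 34.1, "sex": None},
]
any_count = len(drop_null_rows(rows))                       # rows with no nulls at all
all_count = len(drop_null_rows(rows, how="all"))            # rows that are not entirely null
subset_count = len(drop_null_rows(rows, "bill_length_mm"))  # only one column checked
```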
dropna
dropna(subset=None, how='any')
Deprecated - use drop_null instead.
equals
Return whether this expression is structurally equivalent to other
.
If you want to produce an equality expression, use ==
syntax.
Parameters
other
Another expression
required
Examples
>>> import xorq as xo
>>> t1 = xo.table(dict(a="int"), name="t")
>>> t2 = xo.table(dict(a="int"), name="t")
>>> t1.equals(t2)
>>> v = xo.table(dict(a="string"), name="v")
>>> t1.equals(v)
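The distinction between equals and == can be illustrated with a toy class. This is a hypothetical model, not xorq's expression classes: equals compares structure and returns a plain bool, while == is overloaded to build a deferred comparison.

```python
class Col:
    """Toy stand-in for a column expression (hypothetical, for illustration)."""

    def __init__(self, table, name, dtype):
        self.table, self.name, self.dtype = table, name, dtype

    def equals(self, other):
        # Structural comparison: evaluated immediately, returns True or False.
        return (self.table, self.name, self.dtype) == (
            other.table,
            other.name,
            other.dtype,
        )

    def __eq__(self, other):
        # Builds a description of the comparison instead of evaluating it.
        return ("Equals", self, other)


a1 = Col("t", "a", "int")
a2 = Col("t", "a", "int")
v = Col("v", "a", "string")

same = a1.equals(a2)
different = a1.equals(v)
expr = a1 == a2  # an expression-like object, not a bool
```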
execute
Execute an expression against its backend if one exists.
Parameters
kwargs
Any
Keyword arguments
{}
Examples
>>> import xorq as xo
>>> t = xo.examples.penguins.fetch()
>>> t.execute()
       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0       Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1       Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2       Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3       Adelie  Torgersen             NaN            NaN                NaN          NaN    None  2007
4       Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007
...        ...        ...             ...            ...                ...          ...     ...   ...
339  Chinstrap      Dream            55.8           19.8              207.0       4000.0    male  2009
340  Chinstrap      Dream            43.5           18.1              202.0       3400.0  female  2009
341  Chinstrap      Dream            49.6           18.2              193.0       3775.0    male  2009
342  Chinstrap      Dream            50.8           19.0              210.0       4100.0    male  2009
343  Chinstrap      Dream            50.2           18.7              198.0       3775.0  female  2009
344 rows × 8 columns
Scalar parameters can be supplied dynamically during execution.
>>> species = xo.param("string" )
>>> expr = t.filter(t.species == species).order_by(t.bill_length_mm)
>>> expr.execute(limit=3, params={species: "Gentoo"})
   species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Gentoo  Biscoe            40.9           13.7              214.0       4650.0  female  2007
1   Gentoo  Biscoe            41.7           14.7              210.0       4700.0  female  2009
2   Gentoo  Biscoe            42.0           13.5              210.0       4150.0  female  2007
fill_null
Fill null values in a table expression.
Note that the result types of fill_null may not be stable. For example, different library versions may impact whether a given backend promotes integer replacement values to floats.
Parameters
replacements
ir.Scalar | Mapping[str, ir.Scalar]
Value with which to fill nulls. If replacements is a mapping, the keys are column names that map to their replacement value. If passed as a scalar, all columns are filled with that value.
required
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.sex
┏━━━━━━━━┓
┃ sex ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male │
│ female │
│ female │
│ NULL │
│ female │
│ male │
│ female │
│ male │
│ NULL │
│ NULL │
│ … │
└────────┘
>>> t.fill_null({"sex": "unrecorded"}).sex
┏━━━━━━━━━━━━┓
┃ sex ┃
┡━━━━━━━━━━━━┩
│ string │
├────────────┤
│ male │
│ female │
│ female │
│ unrecorded │
│ female │
│ male │
│ female │
│ male │
│ unrecorded │
│ unrecorded │
│ … │
└────────────┘
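The two accepted forms of replacements (a mapping per column, or a single scalar for all columns) can be modelled in plain Python. This is a sketch of the documented behaviour under the assumption that None stands in for NULL; fill_null_rows is a hypothetical helper, not part of xorq.

```python
def fill_null_rows(rows, replacements):
    """Plain-Python model of fill_null over a list of dicts."""
    filled = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if value is not None:
                new_row[col] = value
            elif isinstance(replacements, dict):
                # Mapping: only the listed columns are filled.
                new_row[col] = replacements.get(col)
            else:
                # Scalar: every column is filled with the same value.
                new_row[col] = replacements
        filled.append(new_row)
    return filled


rows = [
    {"sex": "male", "island": "Torgersen"},
    {"sex": None, "island": None},
]
by_mapping = fill_null_rows(rows, {"sex": "unrecorded"})
by_scalar = fill_null_rows(rows, "unknown")
```

In the mapping form the island column keeps its null, since it is not named in the mapping.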
fillna
Deprecated - use fill_null
instead.
filter
Select rows from table
based on predicates
.
Returns
Table
Filtered table expression
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().drop_null(
...     "sex"
... ).order_by("sex")
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex ┃ sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64 │
├────────┼───────────┤
│ female │ 22 │
│ male │ 68 │
└────────┴───────────┘
get_name
Return the fully qualified name of the table.
group_by
group_by(*by, **key_exprs)
Create a grouped table expression.
Similar to SQL’s GROUP BY statement, or pandas .groupby() method.
Examples
>>> import xorq as xo
>>> from xorq import _
>>> xo.options.interactive = True
>>> t = xo.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit ┃ price ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple │ 0.50 │
│ apple │ 0.50 │
│ banana │ 0.25 │
│ orange │ 0.33 │
└────────┴─────────┘
>>> t.group_by("fruit").agg(total_cost=_.price.sum(), avg_cost=_.price.mean()).order_by(
...     "fruit"
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼────────────┼──────────┤
│ apple │ 1.00 │ 0.50 │
│ banana │ 0.25 │ 0.25 │
│ orange │ 0.33 │ 0.33 │
└────────┴────────────┴──────────┘
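For readers more familiar with plain Python than with SQL's GROUP BY, the aggregation above corresponds roughly to the following sketch. It uses only the standard library and the same sample data; it illustrates the grouping idea and says nothing about how xorq executes the query.

```python
from collections import defaultdict

rows = [("apple", 0.5), ("apple", 0.5), ("banana", 0.25), ("orange", 0.33)]

# Collect the prices belonging to each fruit.
groups = defaultdict(list)
for fruit, price in rows:
    groups[fruit].append(price)

# One output row per group, ordered by the grouping key.
result = {
    fruit: {"total_cost": sum(prices), "avg_cost": sum(prices) / len(prices)}
    for fruit, prices in sorted(groups.items())
}
```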
has_name
Check whether this expression has an explicit name.
head
Select the first n
rows of a table.
Parameters
n
int
Number of rows to include
5
Returns
Table
self
limited to n
rows
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
>>> t.head(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
└───────┴────────┘
info
Return summary information about a table.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False)
>>> t.info()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ name ┃ type ┃ nullable ┃ nulls ┃ non_nulls ┃ null_frac ┃ pos ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ boolean │ int64 │ int64 │ float64 │ int16 │
├───────────────────┼─────────┼──────────┼───────┼───────────┼───────────┼───────┤
│ species │ string │ True │ 0 │ 344 │ 0.000000 │ 0 │
│ island │ string │ True │ 0 │ 344 │ 0.000000 │ 1 │
│ bill_length_mm │ float64 │ True │ 2 │ 342 │ 0.005814 │ 2 │
│ bill_depth_mm │ float64 │ True │ 2 │ 342 │ 0.005814 │ 3 │
│ flipper_length_mm │ float64 │ True │ 2 │ 342 │ 0.005814 │ 4 │
│ body_mass_g │ float64 │ True │ 2 │ 342 │ 0.005814 │ 5 │
│ sex │ string │ True │ 11 │ 333 │ 0.031977 │ 6 │
│ year │ int64 │ True │ 0 │ 344 │ 0.000000 │ 7 │
└───────────────────┴─────────┴──────────┴───────┴───────────┴───────────┴───────┘
intersect
intersect(table, *rest, distinct=True)
Compute the set intersection of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
True
Returns
Table
A new table containing the intersection of all input tables.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> t1.intersect(t2)
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
└───────┘
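The distinct flag can be pictured with a small plain-Python model. It assumes the usual SQL meaning of INTERSECT versus INTERSECT ALL (with distinct=False a row is kept as many times as it appears in both inputs); intersect_rows is a hypothetical helper, and the exact duplicate handling of a given backend is not confirmed here.

```python
from collections import Counter


def intersect_rows(a, b, distinct=True):
    """Plain-Python model of a two-table intersect on single-column rows."""
    if distinct:
        # Set semantics: each common row appears once.
        return sorted(set(a) & set(b))
    # Multiset semantics: keep min(count in a, count in b) copies of each row.
    common = Counter(a) & Counter(b)
    return sorted(common.elements())


doc_example = intersect_rows([1, 2], [2, 3])
with_dupes_distinct = intersect_rows([1, 2, 2], [2, 2, 3])
with_dupes_all = intersect_rows([1, 2, 2], [2, 2, 3], distinct=False)
```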
into_backend
into_backend(con, name=None)
Converts the Expr to a table in the given backend con
with an optional table name name
.
The table is backed by a PyArrow RecordBatchReader. The RecordBatchReader is teed so it can safely be reused without spilling to disk.
Parameters
con
The backend where the table should be created
required
name
The name of the table
None
Examples
>>> import xorq as xo
>>> from xorq import _
>>> xo.options.interactive = True
>>> ls_con = xo.connect()
>>> pg_con = xo.postgres.connect_examples()
>>> t = pg_con.table("batting").into_backend(ls_con, "ls_batting")
>>> expr = (
...     t.join(t, "playerID")
...     .order_by("playerID", "yearID")
...     .limit(15)
...     .select(player_id="playerID", year_id="yearID_right")
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━┓
┃ player_id ┃ year_id ┃
┡━━━━━━━━━━━╇━━━━━━━━━┩
│ string │ int64 │
├───────────┼─────────┤
│ aardsda01 │ 2015 │
│ aardsda01 │ 2004 │
│ aardsda01 │ 2006 │
│ aardsda01 │ 2009 │
│ aardsda01 │ 2008 │
│ aardsda01 │ 2007 │
│ aardsda01 │ 2012 │
│ aardsda01 │ 2013 │
│ aardsda01 │ 2010 │
│ aardsda01 │ 2008 │
│ … │ … │
└───────────┴─────────┘
join
join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')
Perform a join between two tables.
Parameters
left
Table
Left table to join
required
right
Table
Right table to join
required
predicates
str | Sequence[str | ir.BooleanColumn | Literal[True] | Literal[False] | tuple[str | ir.Column | ir.Deferred, str | ir.Column | ir.Deferred]]
Condition(s) to join on. See examples for details.
()
how
JoinKind
Join method, e.g. "inner"
or "left"
.
'inner'
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
Examples
>>> import xorq as xo
>>> from xorq import _
>>> xo.options.interactive = True
>>> movies = xo.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ title ┃ genres ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ string │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 2 │ Jumanji (1995) │ Adventure|Children|Fantasy │
│ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 4 │ Waiting to Exhale (1995) │ Comedy|Drama|Romance │
│ 5 │ Father of the Bride Part II (1995) │ Comedy │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = xo.examples.ml_latest_small_ratings.fetch().drop("timestamp")
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ float64 │
├────────┼─────────┼─────────┤
│ 1 │ 1 │ 4.0 │
│ 1 │ 3 │ 4.0 │
│ 1 │ 6 │ 4.0 │
│ 1 │ 47 │ 5.0 │
│ 1 │ 50 │ 5.0 │
└────────┴─────────┴─────────┘
Equality left join on the shared movieId
column. Note the _right
suffix added to all overlapping columns from the right table (in this case only the “movieId” column).
>>> ratings.join(movies, "movieId", how="left").head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ movieId_right ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ int64 │ string │ string │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ 6 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ 47 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ 50 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘
Explicit equality join using the default how
value of "inner"
. Note how there is no _right
suffix added to the movieId
column since this is an inner join and the movieId
column is part of the join condition.
>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ string │ string │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = xo.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │
├────────┼─────────┼─────────────────┼────────────┤
│ 2 │ 60756 │ funny │ 1445714994 │
│ 2 │ 60756 │ Highly quotable │ 1445714996 │
│ 2 │ 60756 │ will ferrell │ 1445714992 │
│ 2 │ 89774 │ Boxing story │ 1445715207 │
│ 2 │ 89774 │ MMA │ 1445715200 │
└────────┴─────────┴─────────────────┴────────────┘
You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:
>>> tags.join(ratings, ["userId", "movieId"]).head(5).order_by("userId")
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │ float64 │
├────────┼─────────┼────────────────┼────────────┼─────────┤
│ 62 │ 2 │ Robin Williams │ 1528843907 │ 4.0 │
│ 62 │ 110 │ sword fight │ 1528152535 │ 4.5 │
│ 62 │ 410 │ gothic │ 1525636609 │ 4.5 │
│ 62 │ 2023 │ mafia │ 1525636733 │ 5.0 │
│ 62 │ 2124 │ quirky │ 1525636846 │ 5.0 │
└────────┴─────────┴────────────────┴────────────┴─────────┘
To join a table with itself, call .view() on one of the arguments so the two tables are distinct from each other.
For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key}), where each key can be a Column, a Deferred expression, or a lambda of the form (Table) -> Column.
For example, to find all movie pairings that received the same (ignoring case) tags:
>>> movie_tags = tags["movieId", "tag"]
>>> view = movie_tags.view()
>>> movie_tags.join(
...     view,
...     [
...         movie_tags.movieId != view.movieId,
...         (_.tag.lower(), lambda t: t.tag.lower()),
...     ],
... ).head().order_by(("movieId", "movieId_right"))
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ tag ┃ movieId_right ┃ tag_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │ string │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│ 60756 │ funny │ 1732 │ funny │
│ 60756 │ Highly quotable │ 1732 │ Highly quotable │
│ 89774 │ Tom Hardy │ 139385 │ tom hardy │
│ 106782 │ drugs │ 1732 │ drugs │
│ 106782 │ Leonardo DiCaprio │ 5989 │ Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘
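The lname and rname parameters are ordinary Python format strings with a {name} placeholder. The sketch below shows how such templates would rename overlapping columns; rename_overlaps is a hypothetical helper written for illustration, and it models only the renaming step (it does not reproduce the rule that an inner equality join drops the duplicated key column).

```python
def rename_overlaps(left_cols, right_cols, lname="", rname="{name}_right"):
    """Apply lname/rname templates to column names present on both sides."""
    overlap = set(left_cols) & set(right_cols)

    def apply(cols, template):
        # An empty template means "leave overlapping names unchanged".
        return [
            template.format(name=c) if template and c in overlap else c
            for c in cols
        ]

    return apply(left_cols, lname), apply(right_cols, rname)


ratings_cols = ["userId", "movieId", "rating"]
movies_cols = ["movieId", "title", "genres"]

default_left, default_right = rename_overlaps(ratings_cols, movies_cols)
custom_left, custom_right = rename_overlaps(
    ratings_cols, movies_cols, lname="left_{name}", rname="right_{name}"
)
```

With the defaults only the right-hand movieId is renamed, which matches the movieId_right column in the left-join example.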
limit
Select n
rows from self
starting at offset
.
Parameters
n
int | None
Number of rows to include. If None
, the entire table is selected starting from offset
.
required
offset
int
Number of rows to skip first
0
Returns
Table
The first n
rows of self
starting at offset
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
>>> t.limit(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
└───────┴────────┘
You can use None
with offset
to slice starting from a particular row
>>> t.limit(None, offset=1)
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
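The interaction of n and offset is analogous to list slicing in Python. The following sketch uses the same three rows as the example; limit_rows is a hypothetical helper, and unlike a Python list, a table has no guaranteed row order unless it is sorted first.

```python
def limit_rows(rows, n, offset=0):
    """Plain-Python model of limit(n, offset=...) on an ordered list of rows."""
    # n=None means "everything from offset onward".
    end = None if n is None else offset + n
    return rows[offset:end]


rows = [(1, "c"), (1, "a"), (2, "a")]
first_two = limit_rows(rows, 2)
skip_one = limit_rows(rows, None, offset=1)
one_after_one = limit_rows(rows, 1, offset=1)
```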
mutate
mutate(*exprs, **mutations)
Add columns to a table expression.
Parameters
exprs
Sequence[ir.Expr] | None
List of named expressions to add as columns
()
mutations
ir.Value
Named expressions using keyword arguments
{}
Returns
Table
Table expression with additional columns
Examples
>>> import xorq as xo
>>> import xorq.selectors as s
>>> from xorq import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred=False).select("species", "year", "bill_length_mm")
>>> t
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ int64 │ float64 │
├─────────┼───────┼────────────────┤
│ Adelie │ 2007 │ 39.1 │
│ Adelie │ 2007 │ 39.5 │
│ Adelie │ 2007 │ 40.3 │
│ Adelie │ 2007 │ NULL │
│ Adelie │ 2007 │ 36.7 │
│ Adelie │ 2007 │ 39.3 │
│ Adelie │ 2007 │ 38.9 │
│ Adelie │ 2007 │ 39.2 │
│ Adelie │ 2007 │ 34.1 │
│ Adelie │ 2007 │ 42.0 │
│ … │ … │ … │
└─────────┴───────┴────────────────┘
Add a new column from a per-element expression
>>> t.mutate(next_year=_.year + 1).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃ next_year ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64 │ float64 │ int64 │
├─────────┼───────┼────────────────┼───────────┤
│ Adelie │ 2007 │ 39.1 │ 2008 │
│ Adelie │ 2007 │ 39.5 │ 2008 │
│ Adelie │ 2007 │ 40.3 │ 2008 │
│ Adelie │ 2007 │ NULL │ 2008 │
│ Adelie │ 2007 │ 36.7 │ 2008 │
└─────────┴───────┴────────────────┴───────────┘
Add a new column based on an aggregation. Note the automatic broadcasting.
>>> t.select("species", bill_demean=_.bill_length_mm - _.bill_length_mm.mean()).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ species ┃ bill_demean ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ float64 │
├─────────┼─────────────┤
│ Adelie │ -4.82193 │
│ Adelie │ -4.42193 │
│ Adelie │ -3.62193 │
│ Adelie │ NULL │
│ Adelie │ -7.22193 │
└─────────┴─────────────┘
Mutate across multiple columns
>>> t.mutate(s.across(s.numeric() & ~s.cols("year"), _ - _.mean())).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ int64 │ float64 │
├─────────┼───────┼────────────────┤
│ Adelie │ 2007 │ -4.82193 │
│ Adelie │ 2007 │ -4.42193 │
│ Adelie │ 2007 │ -3.62193 │
│ Adelie │ 2007 │ NULL │
│ Adelie │ 2007 │ -7.22193 │
└─────────┴───────┴────────────────┘
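The "automatic broadcasting" in the demeaning examples means that a single aggregate value (the column mean) is combined with every row. A plain-Python sketch of that idea follows, assuming None stands in for NULL and that nulls are skipped when computing the mean. It uses only the five visible sample values, so its mean differs from the full-table mean behind the output above.

```python
values = [39.1, 39.5, 40.3, None, 36.7]

# The aggregate is computed once over the non-null values...
non_null = [v for v in values if v is not None]
mean = sum(non_null) / len(non_null)

# ...and then broadcast against each row; nulls stay null.
demeaned = [None if v is None else v - mean for v in values]
```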
nunique
Compute the number of unique rows in the table.
Parameters
where
ir.BooleanValue | None
Optional boolean expression to filter rows when counting.
None
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"a": ["foo", "bar", "bar"]})
>>> t
┏━━━━━━━━┓
┃ a ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo │
│ bar │
│ bar │
└────────┘
>>> t.nunique()
Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'CountDistinctStar' operation is not defined")
Expression repr follows:
r0 := InMemoryTable
data:
PandasDataFrameProxy:
a
0 foo
1 bar
2 bar
CountDistinctStar(ibis_pandas_memtable_37jhql67pbfoxccwrfgwgyigte): CountDistinctStar(r0)
>>> t.nunique(t.a != "foo")
Translation to backend failed
Error message: OperationNotDefinedError("Compilation rule for 'CountDistinctStar' operation is not defined")
Expression repr follows:
r0 := InMemoryTable
data:
PandasDataFrameProxy:
a
0 foo
1 bar
2 bar
CountDistinctStar(ibis_pandas_memtable_37jhql67pbfoxccwrfgwgyigte, NotEquals(a, 'foo')): CountDistinctStar(r0, wher
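Because the rendered examples above fail on this backend, the intended semantics may be easier to see in plain Python. The sketch below counts distinct rows, optionally after applying a filter; nunique_rows is a hypothetical helper and says nothing about which backends support the operation.

```python
def nunique_rows(rows, where=None):
    """Count distinct rows, optionally restricted to rows matching `where`."""
    return len({row for row in rows if where is None or where(row)})


rows = [("foo",), ("bar",), ("bar",)]
total_unique = nunique_rows(rows)
unique_not_foo = nunique_rows(rows, where=lambda row: row[0] != "foo")
```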
order_by
Sort a table by one or more expressions.
Similar to pandas.DataFrame.sort_values()
.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.memtable(
...     {
...         "a": [3, 2, 1, 3],
...         "b": ["a", "B", "c", "D"],
...         "c": [4, 6, 5, 7],
...     }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
Sort by b. Default is ascending. Note how capital letters come before lowercase
>>> t.order_by("b")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
│ 1 │ c │ 5 │
└───────┴────────┴───────┘
Sort in descending order
>>> t.order_by(xo.desc("b"))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
You can also use the deferred API to get the same result
>>> from xorq import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
Sort by multiple columns/expressions
>>> t.order_by(["a", _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
└───────┴────────┴───────┘
You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b
>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
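The case-sensitivity behaviour shown above mirrors Python's own string ordering, where uppercase letters sort before lowercase ones. A minimal standard-library comparison, using the values of column b (the collation of a particular backend may differ):

```python
values = ["a", "B", "c", "D"]

default_order = sorted(values)                   # uppercase before lowercase
descending_order = sorted(values, reverse=True)  # the reverse of the above
ignore_case = sorted(values, key=str.lower)      # sort key is the lowercased string
```

Passing key=str.lower plays the same role as sorting by _.b.lower() in the example.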
This means that shuffling a Table is super simple
>>> t.order_by(xo.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
└───────┴────────┴───────┘
Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria
>>> import xorq.selectors as s
>>> penguins = xo.examples.penguins.fetch(deferred=False)
>>> penguins[["year", "island"]].value_counts().order_by(s.startswith("year"))
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2007 │ Torgersen │ 20 │
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Dream │ 46 │
│ 2008 │ Torgersen │ 16 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Biscoe │ 64 │
│ 2009 │ Torgersen │ 16 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Biscoe │ 60 │
└───────┴───────────┴───────────────────┘
Use the across selector to apply a specific order to multiple columns
>>> penguins[["year", "island"]].value_counts().order_by(
...     s.across(s.startswith("year"), _.desc())
... )
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2009 │ Biscoe │ 60 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Torgersen │ 16 │
│ 2008 │ Biscoe │ 64 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Torgersen │ 16 │
│ 2007 │ Dream │ 46 │
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Torgersen │ 20 │
└───────┴───────────┴───────────────────┘
pipe
Compose f with self.
Parameters
f
Function to apply to the expression. If the expression needs to be passed as anything other than the first argument to the function, pass a tuple with the argument name. For example, (f, 'data') if the function f expects a 'data' keyword.
required
args
Any
Positional arguments to f
()
kwargs
Any
Keyword arguments to f
{}
Examples
>>> import xorq as xo
>>> xo.options.interactive = False
>>> t = xo.table([("a", "int64"), ("b", "string")], name="t")
>>> f = lambda a: (a + 1).name("a")
>>> g = lambda a: (a * 2).name("a")
>>> result1 = t.a.pipe(f).pipe(g)
>>> result1
r0 := UnboundTable: t
a int64
b string
a: r0.a + 1 * 2
>>> result2 = g(f(t.a)) # equivalent to the above
>>> result1.equals(result2)
True
Returns
Expr
Result type of passed function
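The two calling conventions described for f (a plain callable, or a `(function, keyword)` tuple) can be sketched in plain Python. The `pipe` helper and the `scale` function below are hypothetical stand-ins written for illustration; they are not the library's code and operate on lists rather than expressions.

```python
def pipe(expr, f, *args, **kwargs):
    """Hypothetical stand-in for Expr.pipe, for illustration only."""
    if isinstance(f, tuple):
        # (function, keyword) form: pass the piped value under that keyword.
        func, keyword = f
        return func(*args, **{**kwargs, keyword: expr})
    # Plain callable: the piped value becomes the first positional argument.
    return f(expr, *args, **kwargs)


def scale(factor, data):
    # Expects the piped value as the 'data' keyword, not as the first argument.
    return [factor * x for x in data]


doubled = pipe([1, 2, 3], (scale, "data"), 2)
incremented = pipe([1, 2, 3], lambda xs, step: [x + step for x in xs], 1)
print(doubled, incremented)
```

The tuple form is useful when an existing function already takes something else as its first argument and cannot be changed.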
pivot_longer
pivot_longer(
    col,
    *,
    names_to='name',
    names_pattern='(.+)',
    names_transform=None,
    values_to='value',
    values_transform=None,
)
Transform a table from wider to longer.
Parameters
col
str | s.Selector
String column name or selector.
required
names_to
str | Iterable[str]
A string or iterable of strings indicating how to name the new pivoted columns.
'name'
names_pattern
str | re.Pattern
Pattern to use to extract column names from the input. By default the entire column name is extracted.
'(.+)'
names_transform
Callable[[str], ir.Value] | Mapping[str, Callable[[str], ir.Value]] | None
Function or mapping of a name in names_to to a function to transform a column name to a value.
None
values_to
str
Name of the pivoted value column.
'value'
values_transform
Callable[[ir.Value], ir.Value] | Deferred | None
Apply a function to the value column. This can be a lambda or deferred expression.
None
Examples
Basic usage
>>> import xorq as xo
>>> import xorq.selectors as s
>>> from xorq import _
>>> xo.options.interactive = True
>>> relig_income = xo.examples.relig_income_raw.fetch()
>>> relig_income
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ religion ┃ <$10k ┃ $10-20k ┃ $20-30k ┃ $30-40k ┃ $40-50k ┃ $50-75k ┃ $75-100k ┃ $100-150k ┃ >150k ┃ Don't know/refused ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├─────────────────────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┼───────┼────────────────────┤
│ Agnostic │ 27 │ 34 │ 60 │ 81 │ 76 │ 137 │ 122 │ 109 │ 84 │ 96 │
│ Atheist │ 12 │ 27 │ 37 │ 52 │ 35 │ 70 │ 73 │ 59 │ 74 │ 76 │
│ Buddhist │ 27 │ 21 │ 30 │ 34 │ 33 │ 58 │ 62 │ 39 │ 53 │ 54 │
│ Catholic │ 418 │ 617 │ 732 │ 670 │ 638 │ 1116 │ 949 │ 792 │ 633 │ 1489 │
│ Don’t know/refused │ 15 │ 14 │ 15 │ 11 │ 10 │ 35 │ 21 │ 17 │ 18 │ 116 │
│ Evangelical Prot │ 575 │ 869 │ 1064 │ 982 │ 881 │ 1486 │ 949 │ 723 │ 414 │ 1529 │
│ Hindu │ 1 │ 9 │ 7 │ 9 │ 11 │ 34 │ 47 │ 48 │ 54 │ 37 │
│ Historically Black Prot │ 228 │ 244 │ 236 │ 238 │ 197 │ 223 │ 131 │ 81 │ 78 │ 339 │
│ Jehovah's Witness │ 20 │ 27 │ 24 │ 24 │ 21 │ 30 │ 15 │ 11 │ 6 │ 37 │
│ Jewish │ 19 │ 19 │ 25 │ 25 │ 30 │ 95 │ 69 │ 87 │ 151 │ 162 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────────────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴──────────┴───────────┴───────┴────────────────────┘
Here we pivot every column except religion, converting those column names into values of a new income column
>>> relig_income.pivot_longer(~s.cols("religion"), names_to="income", values_to="count")
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ religion ┃ income ┃ count ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │
├──────────┼────────────────────┼───────┤
│ Agnostic │ <$10k │ 27 │
│ Agnostic │ $10-20k │ 34 │
│ Agnostic │ $20-30k │ 60 │
│ Agnostic │ $30-40k │ 81 │
│ Agnostic │ $40-50k │ 76 │
│ Agnostic │ $50-75k │ 137 │
│ Agnostic │ $75-100k │ 122 │
│ Agnostic │ $100-150k │ 109 │
│ Agnostic │ >150k │ 84 │
│ Agnostic │ Don't know/refused │ 96 │
│ … │ … │ … │
└──────────┴────────────────────┴───────┘
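To make the wide-to-long reshaping concrete, here is a conceptual sketch on plain dictionaries. It is not the library's implementation; `pivot_longer_rows` and its `keep` argument are hypothetical names, and the data is two rows from the relig_income preview above, trimmed to three income columns.

```python
# Two rows from the preview above, trimmed to three income columns.
wide = [
    {"religion": "Agnostic", "<$10k": 27, "$10-20k": 34, "$20-30k": 60},
    {"religion": "Atheist", "<$10k": 12, "$10-20k": 27, "$20-30k": 37},
]


def pivot_longer_rows(rows, keep, names_to="name", values_to="value"):
    """Emit one output row per (input row, pivoted column) pair."""
    long_rows = []
    for row in rows:
        kept = {k: row[k] for k in keep}
        for column, value in row.items():
            if column in keep:
                continue  # identifier columns are repeated, not pivoted
            long_rows.append({**kept, names_to: column, values_to: value})
    return long_rows


long_rows = pivot_longer_rows(
    wide, keep=["religion"], names_to="income", values_to="count"
)
for row in long_rows:
    print(row)
```

Each input row with three pivoted columns yields three output rows, which is why the longer table has many more rows and far fewer columns.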
Similarly, for a different example dataset, we convert names to values using a different selector and the default values_to value.
>>> world_bank_pop = xo.examples.world_bank_pop_raw.fetch()
>>> world_bank_pop.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ country ┃ indicator ┃ 2000 ┃ 2001 ┃ 2002 ┃ 2003 ┃ 2004 ┃ 2005 ┃ 2006 ┃ 2007 ┃ 2008 ┃ 2009 ┃ 2010 ┃ 2011 ┃ 2012 ┃ 2013 ┃ 2014 ┃ 2015 ┃ 2016 ┃ 2017 ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │
├─────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ ABW │ SP.URB.TOTL │ 4.162500e+04 │ 4.202500e+04 │ 4.219400e+04 │ 4.227700e+04 │ 4.231700e+04 │ 4.239900e+04 │ 4.255500e+04 │ 4.272900e+04 │ 4.290600e+04 │ 4.307900e+04 │ 4.320600e+04 │ 4.349300e+04 │ 4.386400e+04 │ 4.422800e+04 │ 4.458800e+04 │ 4.494300e+04 │ 4.529700e+04 │ 4.564800e+04 │
│ ABW │ SP.URB.GROW │ 1.664222e+00 │ 9.563731e-01 │ 4.013352e-01 │ 1.965172e-01 │ 9.456936e-02 │ 1.935880e-01 │ 3.672580e-01 │ 4.080490e-01 │ 4.133830e-01 │ 4.023963e-01 │ 2.943735e-01 │ 6.620631e-01 │ 8.493932e-01 │ 8.264135e-01 │ 8.106692e-01 │ 7.930256e-01 │ 7.845785e-01 │ 7.718989e-01 │
│ ABW │ SP.POP.TOTL │ 8.910100e+04 │ 9.069100e+04 │ 9.178100e+04 │ 9.270100e+04 │ 9.354000e+04 │ 9.448300e+04 │ 9.560600e+04 │ 9.678700e+04 │ 9.799600e+04 │ 9.921200e+04 │ 1.003410e+05 │ 1.012880e+05 │ 1.021120e+05 │ 1.028800e+05 │ 1.035940e+05 │ 1.042570e+05 │ 1.048740e+05 │ 1.054390e+05 │
│ ABW │ SP.POP.GROW │ 2.539234e+00 │ 1.768757e+00 │ 1.194718e+00 │ 9.973955e-01 │ 9.009892e-01 │ 1.003077e+00 │ 1.181566e+00 │ 1.227711e+00 │ 1.241397e+00 │ 1.233231e+00 │ 1.131541e+00 │ 9.393559e-01 │ 8.102306e-01 │ 7.493010e-01 │ 6.916153e-01 │ 6.379592e-01 │ 5.900625e-01 │ 5.372957e-01 │
│ AFE │ SP.URB.TOTL │ 1.155517e+08 │ 1.197755e+08 │ 1.242275e+08 │ 1.288340e+08 │ 1.336475e+08 │ 1.387456e+08 │ 1.440267e+08 │ 1.492313e+08 │ 1.553838e+08 │ 1.617762e+08 │ 1.684561e+08 │ 1.754157e+08 │ 1.825587e+08 │ 1.901087e+08 │ 1.980733e+08 │ 2.065563e+08 │ 2.150833e+08 │ 2.237321e+08 │
└─────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
>>> world_bank_pop.pivot_longer(s.matches(r"\d{4}"), names_to="year").head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ country ┃ indicator ┃ year ┃ value ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ string │ string │ string │ float64 │
├─────────┼─────────────┼────────┼─────────┤
│ ABW │ SP.URB.TOTL │ 2000 │ 41625.0 │
│ ABW │ SP.URB.TOTL │ 2001 │ 42025.0 │
│ ABW │ SP.URB.TOTL │ 2002 │ 42194.0 │
│ ABW │ SP.URB.TOTL │ 2003 │ 42277.0 │
│ ABW │ SP.URB.TOTL │ 2004 │ 42317.0 │
└─────────┴─────────────┴────────┴─────────┘
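A regex-based selector for four-digit year columns can be approximated with the standard `re` module. This sketch assumes the selector keeps any column whose name contains a match for the pattern; the column list is a shortened, illustrative subset of the table above.

```python
import re

# Shortened subset of the world_bank_pop column names shown above.
columns = ["country", "indicator", "2000", "2001", "2017"]

# Four consecutive digits, i.e. a year-like column name.
year_pattern = re.compile(r"\d{4}")

# Assumption: a column is selected when the pattern is found in its name.
year_columns = [name for name in columns if year_pattern.search(name)]
print(year_columns)
```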
pivot_longer has some preprocessing capabilities, such as stripping a prefix and applying a function to column names
>>> billboard = xo.examples.billboard.fetch()
>>> billboard
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ artist ┃ track ┃ date_entered ┃ wk1 ┃ wk2 ┃ wk3 ┃ wk4 ┃ wk5 ┃ wk6 ┃ wk7 ┃ wk8 ┃ wk9 ┃ wk10 ┃ wk11 ┃ wk12 ┃ wk13 ┃ wk14 ┃ wk15 ┃ wk16 ┃ wk17 ┃ wk18 ┃ wk19 ┃ wk20 ┃ wk21 ┃ wk22 ┃ wk23 ┃ wk24 ┃ wk25 ┃ wk26 ┃ wk27 ┃ wk28 ┃ wk29 ┃ wk30 ┃ wk31 ┃ wk32 ┃ wk33 ┃ wk34 ┃ wk35 ┃ wk36 ┃ wk37 ┃ wk38 ┃ wk39 ┃ wk40 ┃ wk41 ┃ wk42 ┃ wk43 ┃ wk44 ┃ wk45 ┃ wk46 ┃ wk47 ┃ wk48 ┃ wk49 ┃ wk50 ┃ wk51 ┃ wk52 ┃ wk53 ┃ wk54 ┃ wk55 ┃ wk56 ┃ wk57 ┃ wk58 ┃ wk59 ┃ wk60 ┃ wk61 ┃ wk62 ┃ wk63 ┃ wk64 ┃ wk65 ┃ wk66 ┃ wk67 ┃ wk68 ┃ wk69 ┃ wk70 ┃ wk71 ┃ wk72 ┃ wk73 ┃ wk74 ┃ wk75 ┃ wk76 ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ string │ date │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │
├────────────────┼─────────────────────────┼──────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 87 │ 82 │ 72 │ 77 │ 87 │ 94 │ 99 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 91 │ 87 │ 92 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 3 Doors Down │ Kryptonite │ 2000-04-08 │ 81 │ 70 │ 68 │ 67 │ 66 │ 57 │ 54 │ 53 │ 51 │ 51 │ 51 │ 51 │ 47 │ 44 │ 38 │ 28 │ 22 │ 18 │ 18 │ 14 │ 12 │ 7 │ 6 │ 6 │ 6 │ 5 │ 5 │ 4 │ 4 │ 4 │ 4 │ 3 │ 3 │ 3 │ 4 │ 5 │ 5 │ 9 │ 9 │ 15 │ 14 │ 13 │ 14 │ 16 │ 17 │ 21 │ 22 │ 24 │ 28 │ 33 │ 42 │ 42 │ 49 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 3 Doors Down │ Loser │ 2000-10-21 │ 76 │ 76 │ 72 │ 69 │ 67 │ 65 │ 55 │ 59 │ 62 │ 61 │ 61 │ 59 │ 61 │ 66 │ 72 │ 76 │ 75 │ 67 │ 73 │ 70 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 504 Boyz │ Wobble Wobble │ 2000-04-15 │ 57 │ 34 │ 25 │ 17 │ 17 │ 31 │ 36 │ 49 │ 53 │ 57 │ 64 │ 70 │ 75 │ 76 │ 78 │ 85 │ 92 │ 96 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 98^0 │ Give Me Just One Nig... │ 2000-08-19 │ 51 │ 39 │ 34 │ 26 │ 26 │ 19 │ 2 │ 2 │ 3 │ 6 │ 7 │ 22 │ 29 │ 36 │ 47 │ 67 │ 66 │ 84 │ 93 │ 94 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ A*Teens │ Dancing Queen │ 2000-07-08 │ 97 │ 97 │ 96 │ 95 │ 100 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Aaliyah │ I Don't Wanna │ 2000-01-29 │ 84 │ 62 │ 51 │ 41 │ 38 │ 35 │ 35 │ 38 │ 38 │ 36 │ 37 │ 37 │ 38 │ 49 │ 61 │ 63 │ 62 │ 67 │ 83 │ 86 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Aaliyah │ Try Again │ 2000-03-18 │ 59 │ 53 │ 38 │ 28 │ 21 │ 18 │ 16 │ 14 │ 12 │ 10 │ 9 │ 8 │ 6 │ 1 │ 2 │ 2 │ 2 │ 2 │ 3 │ 4 │ 5 │ 5 │ 6 │ 9 │ 13 │ 14 │ 16 │ 23 │ 22 │ 33 │ 36 │ 43 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Adams, Yolanda │ Open My Heart │ 2000-08-26 │ 76 │ 76 │ 74 │ 69 │ 68 │ 67 │ 61 │ 58 │ 57 │ 59 │ 66 │ 68 │ 61 │ 67 │ 59 │ 63 │ 67 │ 71 │ 79 │ 89 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└────────────────┴─────────────────────────┴──────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
>>> billboard.pivot_longer(
...     s.startswith("wk"),
...     names_to="week",
...     names_pattern=r"wk(.+)",
...     names_transform=int,
...     values_to="rank",
...     values_transform=_.cast("int"),
... ).drop_null("rank")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ artist ┃ track ┃ date_entered ┃ week ┃ rank ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string │ string │ date │ int8 │ int64 │
├─────────┼─────────────────────────┼──────────────┼──────┼───────┤
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 1 │ 87 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 2 │ 82 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 3 │ 72 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 4 │ 77 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 5 │ 87 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 6 │ 94 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 7 │ 99 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 1 │ 91 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 2 │ 87 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 3 │ 92 │
│ … │ … │ … │ … │ … │
└─────────┴─────────────────────────┴──────────────┴──────┴───────┘
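The prefix stripping and conversion used in this example can be tried in isolation with `re` and `int`. This is a sketch of the idea only: it assumes the capture group of names_pattern is extracted from each column name and then passed to names_transform, and the sample names are a subset of the billboard columns.

```python
import re

# Same pattern as in the example: capture everything after the "wk" prefix.
week_pattern = re.compile(r"wk(.+)")

sample_columns = ["wk1", "wk2", "wk10", "wk76"]

# Extract the captured text, then convert it with int (the names_transform).
weeks = {
    name: int(week_pattern.match(name).group(1)) for name in sample_columns
}
print(weeks)
```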
You can use regular expression capture groups to extract multiple variables stored in column names
>>> who = xo.examples.who.fetch()
>>> who
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ new_sp_m014 ┃ new_sp_m1524 ┃ new_sp_m2534 ┃ new_sp_m3544 ┃ new_sp_m4554 ┃ new_sp_m5564 ┃ new_sp_m65 ┃ new_sp_f014 ┃ new_sp_f1524 ┃ new_sp_f2534 ┃ new_sp_f3544 ┃ new_sp_f4554 ┃ new_sp_f5564 ┃ new_sp_f65 ┃ new_sn_m014 ┃ new_sn_m1524 ┃ new_sn_m2534 ┃ new_sn_m3544 ┃ new_sn_m4554 ┃ new_sn_m5564 ┃ new_sn_m65 ┃ new_sn_f014 ┃ new_sn_f1524 ┃ new_sn_f2534 ┃ new_sn_f3544 ┃ new_sn_f4554 ┃ new_sn_f5564 ┃ new_sn_f65 ┃ new_ep_m014 ┃ new_ep_m1524 ┃ new_ep_m2534 ┃ new_ep_m3544 ┃ new_ep_m4554 ┃ new_ep_m5564 ┃ new_ep_m65 ┃ new_ep_f014 ┃ new_ep_f1524 ┃ new_ep_f2534 ┃ new_ep_f3544 ┃ new_ep_f4554 ┃ new_ep_f5564 ┃ new_ep_f65 ┃ newrel_m014 ┃ newrel_m1524 ┃ newrel_m2534 ┃ newrel_m3544 ┃ newrel_m4554 ┃ newrel_m5564 ┃ newrel_m65 ┃ newrel_f014 ┃ newrel_f1524 ┃ newrel_f2534 ┃ newrel_f3544 ┃ newrel_f4554 ┃ newrel_f5564 ┃ newrel_f65 ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├─────────────┼────────┼────────┼───────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┤
│ Afghanistan │ AF │ AFG │ 1980 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1981 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1982 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1983 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1984 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1985 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1986 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1987 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1988 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1989 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┘
>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     values_to="count",
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ diagnosis ┃ gender ┃ age ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ string │ string │ string │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼────────┼───────┤
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 014 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 1524 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 2534 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 3544 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 4554 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 5564 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 65 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 014 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 1524 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 2534 │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴────────┴───────┘
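To see how three capture groups split one column name into three variables, the pattern can be applied directly with `re`. This sketch assumes the groups are paired with names_to in order; `split_name` is a hypothetical helper, not part of the library.

```python
import re

# Same pattern and target names as in the example above.
pattern = re.compile(r"new_?(.*)_(.)(.*)")
names_to = ["diagnosis", "gender", "age"]


def split_name(column):
    """Pair each capture group with the corresponding entry of names_to."""
    return dict(zip(names_to, pattern.match(column).groups()))


print(split_name("new_sp_m014"))
print(split_name("newrel_f65"))
```

The optional underscore in `new_?` is what lets the same pattern handle both the `new_sp_...` and `newrel_...` naming styles.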
names_transform is flexible, and can be:
1. A mapping of one or more names in `names_to` to a callable
2. A callable that will be applied to every name
Let’s recode gender and age to numeric values using a mapping
>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     names_transform=dict(
...         gender={"m": 1, "f": 2}.get,
...         age=dict(
...             zip(
...                 ["014", "1524", "2534", "3544", "4554", "5564", "65"],
...                 range(7),
...             )
...         ).get,
...     ),
...     values_to="count",
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ diagnosis ┃ gender ┃ age ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ string │ int8 │ int8 │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼──────┼───────┤
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 0 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 1 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 2 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 3 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 4 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 5 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 6 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 0 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 1 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 2 │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴──────┴───────┘
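The callables in that mapping are ordinary bound `dict.get` methods, so their behavior can be checked on their own. The names `gender_code` and `age_code` are introduced here for illustration only.

```python
# Lookup callables equivalent to the ones passed as names_transform above.
gender_code = {"m": 1, "f": 2}.get
age_code = dict(
    zip(["014", "1524", "2534", "3544", "4554", "5564", "65"], range(7))
).get

print(gender_code("f"), age_code("2534"), age_code("65"))

# dict.get returns None for keys that are missing from the mapping.
print(age_code("unknown"))
```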
The number of match groups in names_pattern must match the length of names_to
>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_.(.*)",
... )
XorqInputError: Number of match groups in `names_pattern`'new_?(.*)_.(.*)' (2 groups) doesn't match the length of `names_to` ['diagnosis', 'gender', 'age'] (length 3)
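A mismatch like this can be caught before calling pivot_longer by comparing the pattern's group count with the length of names_to. The check below uses only the standard `re` module; `groups_match` is a hypothetical helper and does not reflect how the library performs its own validation.

```python
import re

names_to = ["diagnosis", "gender", "age"]


def groups_match(names_pattern, names_to):
    """Return True when the pattern has one capture group per target name."""
    return re.compile(names_pattern).groups == len(names_to)


print(groups_match("new_?(.*)_(.)(.*)", names_to))  # three groups
print(groups_match("new_?(.*)_.(.*)", names_to))    # only two groups
```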
names_transform must be a mapping or callable
>>> who.pivot_longer(
...     s.index["new_sp_m014":"newrel_f65"], names_transform="upper"
... )
XorqTypeError: `names_transform` must be a mapping or callable. Got <class 'str'>
pivot_wider
pivot_wider(
    id_cols=None,
    names_from='name',
    names_prefix='',
    names_sep='_',
    names_sort=False,
    names=None,
    values_from='value',
    values_fill=None,
    values_agg='arbitrary',
)
Pivot a table to a wider format.
Parameters
id_cols
s .Selector | None
A set of columns that uniquely identify each observation.
None
names_from
str | Iterable [str ] | s .Selector
An argument describing which column or columns to use to get the name of the output columns.
'name'
names_prefix
str
String added to the start of every column name.
''
names_sep
str
If names_from
or values_from
contains multiple columns, this argument will be used to join their values together into a single string to use as a column name.
'_'
names_sort
bool
If True, columns are sorted. If False, column names are ordered by appearance.
False
names
Iterable [str ] | None
An explicit sequence of values to look for in columns matching names_from.
* When this value is None, the values will be computed from names_from.
* When this value is not None, each element’s length must match the length of names_from.
See examples below for more detail.
None
values_from
str | Iterable [str ] | s .Selector
An argument describing which column or columns to get the cell values from.
'value'
values_fill
int | float | str | ir .Scalar | None
A scalar value that specifies what each value should be filled with when missing.
None
values_agg
str | Callable [[ir .Value ], ir .Scalar ] | Deferred
A function applied to the value in each cell in the output.
'arbitrary'
Returns
Table
Wider pivoted table
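To make the reshaping concrete, here is a minimal plain-Python sketch of a wide pivot over a list of dicts. The function `pivot_wider_rows` is hypothetical and simplifies the real method: it handles a single names_from and values_from column, and when an id/name pair repeats it keeps the last value instead of applying values_agg.

```python
def pivot_wider_rows(rows, id_cols, names_from, values_from, values_fill=None):
    """Spread the values of one column into one output column per distinct name."""
    # Output column names, in order of first appearance.
    names = []
    for row in rows:
        if row[names_from] not in names:
            names.append(row[names_from])

    wide = {}
    for row in rows:
        key = tuple(row[col] for col in id_cols)
        if key not in wide:
            # Start every new output row with the fill value in each pivoted column.
            wide[key] = {**dict(zip(id_cols, key)), **{n: values_fill for n in names}}
        wide[key][row[names_from]] = row[values_from]
    return list(wide.values())


rows = [
    {"fish": 4842, "station": "Release", "seen": 1},
    {"fish": 4842, "station": "I80_1", "seen": 1},
    {"fish": 4845, "station": "Release", "seen": 1},
]
wide = pivot_wider_rows(rows, ["fish"], "station", "seen", values_fill=0)
```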
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
Basic usage
>>> fish_encounters = ibis.examples.fish_encounters.fetch()
>>> fish_encounters
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish ┃ station ┃ seen ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼─────────┼───────┤
│ 4842 │ Release │ 1 │
│ 4842 │ I80_1 │ 1 │
│ 4842 │ Lisbon │ 1 │
│ 4842 │ Rstr │ 1 │
│ 4842 │ Base_TD │ 1 │
│ 4842 │ BCE │ 1 │
│ 4842 │ BCW │ 1 │
│ 4842 │ BCE2 │ 1 │
│ 4842 │ BCW2 │ 1 │
│ 4842 │ MAE │ 1 │
│ … │ … │ … │
└───────┴─────────┴───────┘
>>> fish_encounters.pivot_wider(names_from= "station" , values_from= "seen" )
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish ┃ Release ┃ Rstr ┃ BCW ┃ BCE2 ┃ I80_1 ┃ MAW ┃ BCE ┃ MAE ┃ Lisbon ┃ Base_TD ┃ BCW2 ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├───────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┼─────────┼───────┤
│ 4844 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4845 │ 1 │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │ 1 │ 1 │ NULL │
│ 4849 │ 1 │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 4859 │ 1 │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │ 1 │ 1 │ NULL │
│ 4861 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4842 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4847 │ 1 │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │
│ 4850 │ 1 │ 1 │ 1 │ NULL │ 1 │ NULL │ 1 │ NULL │ NULL │ 1 │ NULL │
│ 4843 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4848 │ 1 │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ NULL │ 1 │ NULL │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└───────┴─────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────┴─────────┴───────┘
You can do simple transpose-like operations using pivot_wider
>>> t = ibis.memtable(dict (outcome= ["yes" , "no" ], counted= [3 , 4 ]))
>>> t
┏━━━━━━━━━┳━━━━━━━━━┓
┃ outcome ┃ counted ┃
┡━━━━━━━━━╇━━━━━━━━━┩
│ string │ int64 │
├─────────┼─────────┤
│ yes │ 3 │
│ no │ 4 │
└─────────┴─────────┘
>>> t.pivot_wider(names_from= "outcome" , values_from= "counted" , names_sort= True )
┏━━━━━━━┳━━━━━━━┓
┃ no ┃ yes ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│ 4 │ 3 │
└───────┴───────┘
Fill missing pivoted values using values_fill
>>> fish_encounters.pivot_wider(
... names_from= "station" , values_from= "seen" , values_fill= 0
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish ┃ I80_1 ┃ MAW ┃ Lisbon ┃ Base_TD ┃ BCW2 ┃ Release ┃ Rstr ┃ BCW ┃ BCE2 ┃ BCE ┃ MAE ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├───────┼───────┼───────┼────────┼─────────┼───────┼─────────┼───────┼───────┼───────┼───────┼───────┤
│ 4843 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4848 │ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │ 0 │ 0 │
│ 4865 │ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4844 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4845 │ 1 │ 0 │ 1 │ 1 │ 0 │ 1 │ 1 │ 0 │ 0 │ 0 │ 0 │
│ 4849 │ 1 │ 0 │ 0 │ 0 │ 0 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4859 │ 1 │ 0 │ 1 │ 1 │ 0 │ 1 │ 1 │ 0 │ 0 │ 0 │ 0 │
│ 4861 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4842 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4847 │ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└───────┴───────┴───────┴────────┴─────────┴───────┴─────────┴───────┴───────┴───────┴───────┴───────┘
Compute multiple values columns
>>> us_rent_income = ibis.examples.us_rent_income.fetch()
>>> us_rent_income
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ geoid ┃ name ┃ variable ┃ estimate ┃ moe ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │
├────────┼────────────┼──────────┼──────────┼───────┤
│ 01 │ Alabama │ income │ 24476 │ 136 │
│ 01 │ Alabama │ rent │ 747 │ 3 │
│ 02 │ Alaska │ income │ 32940 │ 508 │
│ 02 │ Alaska │ rent │ 1200 │ 13 │
│ 04 │ Arizona │ income │ 27517 │ 148 │
│ 04 │ Arizona │ rent │ 972 │ 4 │
│ 05 │ Arkansas │ income │ 23789 │ 165 │
│ 05 │ Arkansas │ rent │ 709 │ 5 │
│ 06 │ California │ income │ 29454 │ 109 │
│ 06 │ California │ rent │ 1358 │ 3 │
│ … │ … │ … │ … │ … │
└────────┴────────────┴──────────┴──────────┴───────┘
>>> us_rent_income.pivot_wider(
... names_from= "variable" , values_from= ["estimate" , "moe" ]
... )
┏━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid ┃ name ┃ estimate_income ┃ moe_income ┃ estimate_rent ┃ moe_rent ┃
┡━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string │ int64 │ int64 │ int64 │ int64 │
├────────┼──────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05 │ Arkansas │ 23789 │ 165 │ 709 │ 5 │
│ 12 │ Florida │ 25952 │ 70 │ 1077 │ 3 │
│ 13 │ Georgia │ 27024 │ 106 │ 927 │ 3 │
│ 19 │ Iowa │ 30002 │ 143 │ 740 │ 4 │
│ 21 │ Kentucky │ 24702 │ 159 │ 713 │ 4 │
│ 23 │ Maine │ 26841 │ 187 │ 808 │ 7 │
│ 24 │ Maryland │ 37147 │ 152 │ 1311 │ 5 │
│ 26 │ Michigan │ 26987 │ 82 │ 824 │ 3 │
│ 29 │ Missouri │ 26999 │ 113 │ 784 │ 4 │
│ 31 │ Nebraska │ 30020 │ 146 │ 773 │ 4 │
│ … │ … │ … │ … │ … │ … │
└────────┴──────────┴─────────────────┴────────────┴───────────────┴──────────┘
The column name separator can be changed using the names_sep
parameter
>>> us_rent_income.pivot_wider(
... names_from= "variable" ,
... names_sep= "." ,
... values_from= ("estimate" , "moe" ),
... )
┏━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid ┃ name ┃ estimate.income ┃ moe.income ┃ estimate.rent ┃ moe.rent ┃
┡━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string │ int64 │ int64 │ int64 │ int64 │
├────────┼──────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05 │ Arkansas │ 23789 │ 165 │ 709 │ 5 │
│ 12 │ Florida │ 25952 │ 70 │ 1077 │ 3 │
│ 13 │ Georgia │ 27024 │ 106 │ 927 │ 3 │
│ 19 │ Iowa │ 30002 │ 143 │ 740 │ 4 │
│ 21 │ Kentucky │ 24702 │ 159 │ 713 │ 4 │
│ 23 │ Maine │ 26841 │ 187 │ 808 │ 7 │
│ 24 │ Maryland │ 37147 │ 152 │ 1311 │ 5 │
│ 26 │ Michigan │ 26987 │ 82 │ 824 │ 3 │
│ 29 │ Missouri │ 26999 │ 113 │ 784 │ 4 │
│ 31 │ Nebraska │ 30020 │ 146 │ 773 │ 4 │
│ … │ … │ … │ … │ … │ … │
└────────┴──────────┴─────────────────┴────────────┴───────────────┴──────────┘
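The naming rule visible in these outputs can be sketched in plain Python. The helper `wide_column_names` is hypothetical, and the rule that the values_from name is included only when there is more than one values column is inferred from the outputs shown in these examples, not taken from the library source.

```python
def wide_column_names(name_keys, value_cols, names_sep="_"):
    """Build output column names from name tuples and value column names."""
    names = []
    for key in name_keys:
        for value_col in value_cols:
            # Assumption: the value column name is only part of the output name
            # when several value columns are pivoted at once.
            prefix = (value_col,) if len(value_cols) > 1 else ()
            names.append(names_sep.join(prefix + tuple(key)))
    return names


default_sep = wide_column_names([("income",), ("rent",)], ["estimate", "moe"])
dotted = wide_column_names([("income",), ("rent",)], ["estimate", "moe"], names_sep=".")
multi_key = wide_column_names([("A", "AI"), ("B", "AI")], ["production"])
```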
Supply an alternative function to summarize values
>>> warpbreaks = ibis.examples.warpbreaks.fetch().select("wool" , "tension" , "breaks" )
>>> warpbreaks
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ wool ┃ tension ┃ breaks ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ string │ string │ int64 │
├────────┼─────────┼────────┤
│ A │ L │ 26 │
│ A │ L │ 30 │
│ A │ L │ 54 │
│ A │ L │ 25 │
│ A │ L │ 70 │
│ A │ L │ 52 │
│ A │ L │ 51 │
│ A │ L │ 26 │
│ A │ L │ 67 │
│ A │ M │ 18 │
│ … │ … │ … │
└────────┴─────────┴────────┘
>>> warpbreaks.pivot_wider(
... names_from= "wool" , values_from= "breaks" , values_agg= "mean"
... ).select("tension" , "A" , "B" ).order_by("tension" )
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ tension ┃ A ┃ B ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │ float64 │
├─────────┼───────────┼───────────┤
│ H │ 24.555556 │ 18.777778 │
│ L │ 44.555556 │ 28.222222 │
│ M │ 24.000000 │ 28.777778 │
└─────────┴───────────┴───────────┘
Passing Deferred
objects to values_agg
is supported
>>> warpbreaks.pivot_wider(
... names_from= "tension" ,
... values_from= "breaks" ,
... values_agg= _.sum (),
... ).select("wool" , "H" , "L" , "M" ).order_by(s.all ())
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ wool ┃ H ┃ L ┃ M ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ int64 │ int64 │ int64 │
├────────┼───────┼───────┼───────┤
│ A │ 221 │ 401 │ 216 │
│ B │ 169 │ 254 │ 259 │
└────────┴───────┴───────┴───────┘
Use a custom aggregate function
>>> warpbreaks.pivot_wider(
... names_from= "wool" ,
... values_from= "breaks" ,
... values_agg= lambda col: col.std() / col.mean(),
... ).select("tension" , "A" , "B" ).order_by("tension" )
┏━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ tension ┃ A ┃ B ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├─────────┼──────────┼──────────┤
│ H │ 0.418344 │ 0.260590 │
│ L │ 0.406183 │ 0.349325 │
│ M │ 0.360844 │ 0.327719 │
└─────────┴──────────┴──────────┘
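The custom aggregate above is the coefficient of variation (standard deviation divided by mean). As a sanity check, the sketch below recomputes it with the standard library for the nine wool "A", tension "L" values listed in the warpbreaks preview. It assumes the sample standard deviation, which is what reproduces the value displayed for that cell.

```python
import statistics

# Breaks for wool "A", tension "L", copied from the table preview above.
breaks_a_low = [26, 30, 54, 25, 70, 52, 51, 26, 67]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


cv = coefficient_of_variation(breaks_a_low)
```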
Generate some random data, setting the random seed for reproducibility
>>> import random
>>> random.seed(0 )
>>> raw = ibis.memtable(
... [
... dict (
... product= product,
... country= country,
... year= year,
... production= random.random(),
... )
... for product in "AB"
... for country in ["AI" , "EI" ]
... for year in range (2000 , 2015 )
... ]
... )
>>> production = raw.filter (((_.product == "A" ) & (_.country == "AI" )) | (_.product == "B" ))
>>> production.order_by(s.all ())
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ product ┃ country ┃ year ┃ production ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ string │ string │ int64 │ float64 │
├─────────┼─────────┼───────┼────────────┤
│ A │ AI │ 2000 │ 0.844422 │
│ A │ AI │ 2001 │ 0.757954 │
│ A │ AI │ 2002 │ 0.420572 │
│ A │ AI │ 2003 │ 0.258917 │
│ A │ AI │ 2004 │ 0.511275 │
│ A │ AI │ 2005 │ 0.404934 │
│ A │ AI │ 2006 │ 0.783799 │
│ A │ AI │ 2007 │ 0.303313 │
│ A │ AI │ 2008 │ 0.476597 │
│ A │ AI │ 2009 │ 0.583382 │
│ … │ … │ … │ … │
└─────────┴─────────┴───────┴────────────┘
Pivoting with multiple name columns
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... values_from= "production" ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ B_AI ┃ A_AI ┃ B_EI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │ float64 │
├───────┼──────────┼──────────┼──────────┤
│ 2002 │ 0.260492 │ 0.420572 │ 0.567511 │
│ 2005 │ 0.014042 │ 0.404934 │ 0.803179 │
│ 2013 │ 0.243911 │ 0.755804 │ 0.706561 │
│ 2004 │ 0.548699 │ 0.511275 │ 0.967540 │
│ 2006 │ 0.719705 │ 0.783799 │ 0.447970 │
│ 2007 │ 0.398824 │ 0.303313 │ 0.080446 │
│ 2008 │ 0.824845 │ 0.476597 │ 0.320055 │
│ 2001 │ 0.865310 │ 0.757954 │ 0.191067 │
│ 2003 │ 0.805028 │ 0.258917 │ 0.238616 │
│ 2009 │ 0.668153 │ 0.583382 │ 0.507941 │
│ … │ … │ … │ … │
└───────┴──────────┴──────────┴──────────┘
Select a subset of names. This call incurs no computation when constructing the expression.
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... names= [("A" , "AI" ), ("B" , "AI" )],
... values_from= "production" ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ A_AI ┃ B_AI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │
├───────┼──────────┼──────────┤
│ 2004 │ 0.511275 │ 0.548699 │
│ 2006 │ 0.783799 │ 0.719705 │
│ 2007 │ 0.303313 │ 0.398824 │
│ 2008 │ 0.476597 │ 0.824845 │
│ 2011 │ 0.504687 │ 0.493578 │
│ 2002 │ 0.420572 │ 0.260492 │
│ 2005 │ 0.404934 │ 0.014042 │
│ 2013 │ 0.755804 │ 0.243911 │
│ 2000 │ 0.844422 │ 0.477010 │
│ 2010 │ 0.908113 │ 0.001143 │
│ … │ … │ … │
└───────┴──────────┴──────────┘
Sort the new columns’ names
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... values_from= "production" ,
... names_sort= True ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ A_AI ┃ B_AI ┃ B_EI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │ float64 │
├───────┼──────────┼──────────┼──────────┤
│ 2002 │ 0.420572 │ 0.260492 │ 0.567511 │
│ 2005 │ 0.404934 │ 0.014042 │ 0.803179 │
│ 2013 │ 0.755804 │ 0.243911 │ 0.706561 │
│ 2001 │ 0.757954 │ 0.865310 │ 0.191067 │
│ 2003 │ 0.258917 │ 0.805028 │ 0.238616 │
│ 2009 │ 0.583382 │ 0.668153 │ 0.507941 │
│ 2012 │ 0.281838 │ 0.867603 │ 0.551267 │
│ 2004 │ 0.511275 │ 0.548699 │ 0.967540 │
│ 2006 │ 0.783799 │ 0.719705 │ 0.447970 │
│ 2007 │ 0.303313 │ 0.398824 │ 0.080446 │
│ … │ … │ … │ … │
└───────┴──────────┴──────────┴──────────┘
preview
preview(
max_rows= None ,
max_columns= None ,
max_length= None ,
max_string= None ,
max_depth= None ,
console_width= None ,
)
Return a subset as a Rich Table.
This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive
.
Parameters
max_rows
int | None
Maximum number of rows to display
None
max_columns
int | None
Maximum number of columns to display
None
max_length
int | None
Maximum length for pretty-printed arrays and maps
None
max_string
int | None
Maximum length for pretty-printed strings
None
max_depth
int | None
Maximum depth for nested data types
None
console_width
int | float | None
Width of the console in characters. If not specified, the width will be inferred from the console.
None
Examples
>>> import xorq as xo
>>> t = xo.examples.penguins.fetch(deferred= False )
Because the console_width is too small, only 2 columns are shown even though we specified up to 3.
>>> t.preview(
... max_rows= 3 ,
... max_columns= 3 ,
... max_string= 8 ,
... console_width= 30 ,
... )
┏━━━━━━━━━┳━━━━━━━━━━┳━━━┓
┃ species ┃ island ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━┩
│ string │ string │ … │
├─────────┼──────────┼───┤
│ Adelie │ Torgers… │ … │
│ Adelie │ Torgers… │ … │
│ Adelie │ Torgers… │ … │
│ … │ … │ … │
└─────────┴──────────┴───┘
relabel
Deprecated in favor of Table.rename
.
relocate
relocate(* columns, before= None , after= None , ** kwargs)
Relocate columns
before or after other specified columns.
Parameters
columns
str | s .Selector
Columns to relocate. Selectors are accepted.
()
before
str | s .Selector | None
A column name or selector to insert the new columns before.
None
after
str | s .Selector | None
A column name or selector. Columns in columns
are relocated after the last column selected in after
.
None
kwargs
str
Additional column names to relocate, renaming argument values to keyword argument names.
{}
Returns
Table
A table with the columns relocated.
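The ordering rule can be sketched on a plain list of column names. The function `relocate_names` is hypothetical; it ignores selectors and renaming, and raising `ValueError` when both before and after are given is a choice made for this sketch.

```python
def relocate_names(columns, to_move, before=None, after=None):
    """Return the column order after moving `to_move` as a block."""
    if before is not None and after is not None:
        raise ValueError("specify at most one of `before` and `after`")
    rest = [col for col in columns if col not in to_move]
    if before is not None:
        index = rest.index(before)
    elif after is not None:
        index = rest.index(after) + 1
    else:
        # With neither argument, the columns move to the front.
        index = 0
    return rest[:index] + list(to_move) + rest[index:]


columns = list("abcdef")
to_front = relocate_names(columns, ["f"])
after_c = relocate_names(columns, ["a"], after="c")
before_b = relocate_names(columns, ["f"], before="b")
to_end = relocate_names(columns, ["a"], after="f")
```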
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> import xorq.selectors as s
>>> t = xo.memtable(dict (a= [1 ], b= [1 ], c= [1 ], d= ["a" ], e= ["a" ], f= ["a" ]))
>>> t
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃ c ┃ d ┃ e ┃ f ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│ 1 │ 1 │ 1 │ a │ a │ a │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ f ┃ a ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a │ 1 │ 1 │ 1 │ a │ a │
└────────┴───────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a" , after= "c" )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ b ┃ c ┃ a ┃ d ┃ e ┃ f ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│ 1 │ 1 │ 1 │ a │ a │ a │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f" , before= "b" )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ f ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │ string │ string │
├───────┼────────┼───────┼───────┼────────┼────────┤
│ 1 │ a │ 1 │ 1 │ a │ a │
└───────┴────────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a" , after= s.last())
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ b ┃ c ┃ d ┃ e ┃ f ┃ a ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ string │ string │ string │ int64 │
├───────┼───────┼────────┼────────┼────────┼───────┤
│ 1 │ 1 │ a │ a │ a │ 1 │
└───────┴───────┴────────┴────────┴────────┴───────┘
Relocate allows renaming
>>> t.relocate(ff="f")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ ff ┃ a ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a │ 1 │ 1 │ 1 │ a │ a │
└────────┴───────┴───────┴───────┴────────┴────────┘
You can relocate based on any predicate selector, such as of_type
>>> t.relocate(s.of_type("string" ))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d ┃ e ┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a │ a │ a │ 1 │ 1 │ 1 │
└────────┴────────┴────────┴───────┴───────┴───────┘
>>> t.relocate(s.numeric(), after= s.last())
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d ┃ e ┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a │ a │ a │ 1 │ 1 │ 1 │
└────────┴────────┴────────┴───────┴───────┴───────┘
When a selector passed to before or after matches multiple columns, the relocated columns are placed before the first matched column or after the last matched column, respectively
>>> t = xo.memtable(dict (a= [1 ], b= ["a" ], c= [1 ], d= ["a" ]))
>>> t.relocate(s.numeric(), after= s.of_type("string" ))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b ┃ d ┃ a ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a │ a │ 1 │ 1 │
└────────┴────────┴───────┴───────┘
>>> t.relocate(s.numeric(), before= s.of_type("string" ))
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ c ┃ b ┃ d ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ string │ string │
├───────┼───────┼────────┼────────┤
│ 1 │ 1 │ a │ a │
└───────┴───────┴────────┴────────┘
When there are duplicate renames in a call to relocate, the last one is preserved
>>> t.relocate(e= "d" , f= "d" )
┏━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ int64 │ string │ int64 │
├────────┼───────┼────────┼───────┤
│ a │ 1 │ a │ 1 │
└────────┴───────┴────────┴───────┘
However, if there are duplicates that are not part of a rename, the order specified in the relocate call is preserved
>>> t.relocate(
... "b" ,
... s.of_type("string" ), # "b" is a string column, so the selector matches
... )
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b ┃ d ┃ a ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a │ a │ 1 │ 1 │
└────────┴────────┴───────┴───────┘
rename
rename(method= None , / , ** substitutions)
Rename columns in the table.
Parameters
method
str | Callable [[str ], str | None] | Literal ['snake_case', 'ALL_CAPS'] | Mapping [str , str ] | None
An optional method for renaming columns. May be one of:
- A format string to use to rename all columns, like "prefix_{name}".
- A function from old name to new name. If the function returns None, the old name is used.
- The literal strings "snake_case" or "ALL_CAPS" to rename all columns using a snake_case or ALL_CAPS naming convention, respectively.
- A mapping from new name to old name. Existing columns not present in the mapping pass through with their original name.
None
substitutions
str
Columns to be explicitly renamed, expressed as new_name=old_name keyword arguments.
{}
Returns
Table
A renamed table expression
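The different forms of method can be illustrated on a plain list of column names. The function `rename_columns` is a hypothetical sketch, not the library code: it omits the "snake_case" and "ALL_CAPS" literals, and letting explicit substitutions take precedence over method is an assumption.

```python
from collections.abc import Mapping


def rename_columns(columns, method=None, **substitutions):
    """Return the new column names for a list of old column names."""
    # Explicit substitutions are written as new_name=old_name.
    old_to_new = {old: new for new, old in substitutions.items()}
    if isinstance(method, Mapping):
        # A mapping is also keyed by new name, with the old name as the value.
        old_to_new.update({old: new for new, old in method.items()})
        method = None

    def rename_one(name):
        if name in old_to_new:
            return old_to_new[name]
        if method is None:
            return name
        new = method.format(name=name) if isinstance(method, str) else method(name)
        # A function returning None keeps the old name.
        return name if new is None else new

    return [rename_one(name) for name in columns]


with_format = rename_columns(["a", "b"], "col_{name}")
with_function = rename_columns(["a", "b"], lambda n: n.upper() if n == "a" else None)
with_mapping = rename_columns(["a", "b"], {"x": "a"})
with_kwargs = rename_columns(["a", "b"], y="b")
```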
rowid
A unique integer per row.
Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, SQLite and DuckDB’s rowid
).
For a monotonically increasing row number, see ibis.row_number
.
sample
sample(fraction, * , method= 'row' , seed= None )
Sample a fraction of rows from a table.
Sampling is by definition a random operation. Some backends support specifying a seed
for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.
In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed
set.
Parameters
fraction
float
The fraction of rows to include in the sample, expressed as a float between 0 and 1.
required
method
Literal ['row', 'block']
The sampling method to use. The default is “row”, which includes each row with a probability of fraction
. If method is “block”, some backends may instead sample a fraction of blocks of rows (where “block” is a backend-dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause.
'row'
seed
int | None
An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]
). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases.
None
Returns
Table
The input table, with fraction
of rows selected.
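The "row" method can be pictured as an independent coin flip per row. The sketch below uses Python's `random` module to show that idea. The function `sample_rows` is hypothetical, and its output for a given seed says nothing about which rows any backend would return for that seed.

```python
import random


def sample_rows(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = list(zip([1, 2, 3, 4], "abcd"))
first = sample_rows(rows, 0.5, seed=1234)
second = sample_rows(rows, 0.5, seed=1234)
```

Because each row is decided independently, the number of rows returned only approximates `fraction * len(rows)`, which is why the result is described as a fraction of rows rather than an exact count.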
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.memtable({"x" : [1 , 2 , 3 , 4 ], "y" : ["a" , "b" , "c" , "d" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
│ 4 │ d │
└───────┴────────┘
Sample approximately half the rows, with a seed specified for reproducibility.
>>> t.sample(0.5 , seed= 1234 )
Translation to backend failed
Error message: UnsupportedOperationError('`Table.sample` with a random seed is unsupported')
Expression repr follows:
r0 := InMemoryTable
data:
PandasDataFrameProxy:
x y
0 1 a
1 2 b
2 3 c
3 4 d
Sample[r0, fraction=0.5, method='row', seed=1234]
schema
Return the Schema for this table.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred= False )
>>> t.schema()
ibis.Schema {
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm float64
body_mass_g float64
sex string
year int64
}
select
select(* exprs, ** named_exprs)
Compute a new table expression using exprs
and named_exprs
.
Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically construct a window function expression. See the examples section for more details.
For backwards compatibility the keyword argument exprs
is reserved and cannot be used to name an expression. This behavior will be removed in v4.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(deferred= False )
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Simple projection
>>> t.select("island" , "bill_length_mm" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
In that simple case, you could also just use Python’s indexing syntax
>>> t[["island" , "bill_length_mm" ]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
Projection by zero-indexed column position
>>> t.select(t[0 ], t[4 ]).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├─────────┼───────────────────┤
│ Adelie │ 181.0 │
│ Adelie │ 186.0 │
│ Adelie │ 195.0 │
│ Adelie │ NULL │
│ Adelie │ 193.0 │
└─────────┴───────────────────┘
Projection with renaming and compute in one call
>>> t.select(next_year= t.year + 1 ).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
You can do the same thing with a named expression, and using the deferred API
>>> from xorq import _
>>> t.select((_.year + 1 ).name("next_year" )).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
Projection with aggregation expressions
>>> t.select("island" , bill_mean= t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island ┃ bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼───────────┤
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
└───────────┴───────────┘
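The broadcasting seen here can be mimicked in plain Python: compute the aggregate once, then repeat it on every row. The sketch uses only the first five preview rows, so its mean (about 38.9) differs from the full-table value shown above. Skipping nulls mirrors how SQL averages treat NULL.

```python
import statistics

# First five rows of the penguins preview (island, bill_length_mm).
rows = [
    {"island": "Torgersen", "bill_length_mm": 39.1},
    {"island": "Torgersen", "bill_length_mm": 39.5},
    {"island": "Torgersen", "bill_length_mm": 40.3},
    {"island": "Torgersen", "bill_length_mm": None},
    {"island": "Torgersen", "bill_length_mm": 36.7},
]

# Aggregate once over the non-null values ...
non_null = [r["bill_length_mm"] for r in rows if r["bill_length_mm"] is not None]
bill_mean = statistics.mean(non_null)

# ... then broadcast the single value over every row of the projection.
projected = [{"island": r["island"], "bill_mean": bill_mean} for r in rows]
```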
Projection with a selector
>>> import xorq.selectors as s
>>> t.select(s.numeric() & ~ s.cols("year" )).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 39.1 │ 18.7 │ 181.0 │ 3750.0 │
│ 39.5 │ 17.4 │ 186.0 │ 3800.0 │
│ 40.3 │ 18.0 │ 195.0 │ 3250.0 │
│ NULL │ NULL │ NULL │ NULL │
│ 36.7 │ 19.3 │ 193.0 │ 3450.0 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
Projection + aggregation across multiple columns
>>> from xorq import _
>>> t.select(s.across(s.numeric() & ~ s.cols("year" ), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
sql
Run a SQL query against a table expression.
Parameters
query
str
Query string
required
dialect
str | None
Optional string indicating the dialect of query
. Defaults to the backend’s native dialect.
None
Returns
Table
An opaque table expression
Examples
>>> import xorq as xo
>>> from xorq import _
>>> xo.options.interactive = True
>>> t = xo.examples.penguins.fetch(table_name= "penguins" , deferred= False )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ Biscoe │ 45.257485 │
│ Dream │ 44.167742 │
│ Torgersen │ 38.950980 │
└───────────┴─────────────────┘
Mix and match ibis expressions with SQL queries
>>> t = xo.examples.penguins.fetch(table_name= "penguins" , deferred= False )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr = expr.mutate(
... island= _.island.lower(),
... avg_bill_length= _.avg_bill_length.round (1 ),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ torgersen │ 39.0 │
│ biscoe │ 45.3 │
│ dream │ 44.2 │
└───────────┴─────────────────┘
Because ibis expressions aren’t named, they aren’t visible to subsequent .sql
calls. Use the alias
method to assign a name to an expression.
>>> expr.alias("b" ).sql("SELECT * FROM b WHERE avg_bill_length > 40" )
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────────────┤
│ biscoe │ 45.3 │
│ dream │ 44.2 │
└────────┴─────────────────┘
to_array
View a single column table as an array.
Returns
Value
A single column view of a table
to_csv
to_csv(path, * , params= None , ** kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
path
str | Path
A string or Path where the CSV file will be written.
required
params
Mapping [ir .Scalar , Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.csv.CSVWriter
{}
to_json
to_json(path, * , params= None , ** kwargs)
Write the results of expr
to an NDJSON file.
This method is eager and will execute the associated expression immediately.
Parameters
path
str | Path
A string or Path where the NDJSON file will be written.
required
**kwargs
Any
Additional, backend-specific keyword arguments.
{}
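For readers unfamiliar with the format, NDJSON (newline-delimited JSON) stores one JSON object per line. The sketch below writes rows in that layout with the standard `json` module. The helper `write_ndjson` is hypothetical and only shows the file format, not how to_json produces it.

```python
import io
import json


def write_ndjson(rows, stream):
    """Write each row as one JSON object on its own line."""
    for row in rows:
        stream.write(json.dumps(row) + "\n")


rows = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
buffer = io.StringIO()
write_ndjson(rows, buffer)
lines = buffer.getvalue().splitlines()
```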
to_pandas
Convert a table expression to a pandas DataFrame.
Parameters
kwargs
Same as keyword arguments to execute
{}
to_parquet
to_parquet(path, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html for details.
Parameters
path
str | Path
A string or Path where the Parquet file will be written.
required
params
Mapping [ir .Scalar , Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.parquet.ParquetWriter
{}
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins.to_parquet(tempfile.mktemp())
to_pyarrow
Execute expression and return results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
kwargs
Any
Keyword arguments
{}
Returns
Table
A pyarrow table holding the results of the executed expression.
to_pyarrow_batches
to_pyarrow_batches(chunk_size=1000000, **kwargs)
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters
chunk_size
int
Maximum number of rows in each returned record batch.
1000000
kwargs
Any
Keyword arguments
{}
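The meaning of chunk_size as an upper bound on rows per batch can be sketched in plain Python. The generator below is a hypothetical stand-in (iter_batches is not part of the library) that groups any iterable of rows into lists of at most chunk_size items; real record batches may be smaller than the maximum depending on the backend.

```python
from itertools import islice


def iter_batches(rows, chunk_size=1_000_000):
    """Yield lists of at most chunk_size rows until the input is exhausted."""
    it = iter(rows)
    while True:
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        yield batch


# Ten rows with chunk_size=4: two full batches, then the remainder.
batch_sizes = [len(batch) for batch in iter_batches(range(10), chunk_size=4)]
print(batch_sizes)
```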
try_cast
Cast the columns of a table.
If the cast fails for a row, the value is returned as NULL or NaN depending on backend behavior.
Parameters
schema
SchemaLike
Mapping, schema or iterable of pairs to use for casting
required
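The difference from a strict cast can be modeled in a few lines of plain Python: attempt the conversion and substitute a null when it fails instead of raising. The helpers try_cast_value and try_cast_rows are hypothetical names for this sketch, Python types stand in for the schema's data types, and None stands in for NULL.

```python
def try_cast_value(value, target_type):
    """Return value converted to target_type, or None if conversion fails."""
    try:
        return target_type(value)
    except (TypeError, ValueError):
        return None


def try_cast_rows(rows, schema):
    """Apply try_cast_value to the columns named in schema; leave others as-is."""
    return [
        {
            name: try_cast_value(value, schema[name]) if name in schema else value
            for name, value in row.items()
        }
        for row in rows
    ]


rows = [{"a": "1", "b": "x"}, {"a": "oops", "b": "y"}]
casted = try_cast_rows(rows, {"a": int})
print(casted)
```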
unbind
Return an expression built on UnboundTable instead of backend-specific objects.
union
union(table, *rest, distinct=False)
Compute the set union of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
False
Returns
Table
A new table containing the union of all input tables.
Examples
>>> import xorq as xo
>>> xo.options.interactive = True
>>> t1 = xo.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = xo.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> t1.union(t2)  # union all by default  # doctest: +SKIP
>>> t1.union(t2, distinct=True).order_by("a")
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 3 │
└───────┘
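The effect of the distinct flag can be modeled with plain Python lists of row tuples. This is a sketch of the set semantics only (union_rows is a hypothetical helper, not the library's implementation): with distinct=False every input row is kept, and with distinct=True duplicates are dropped.

```python
def union_rows(table, *rest, distinct=False):
    """Concatenate row lists; optionally drop duplicate rows."""
    combined = [row for t in (table, *rest) for row in t]
    if not distinct:
        return combined
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(combined))


t1 = [(1,), (2,)]
t2 = [(2,), (3,)]

union_all = union_rows(t1, t2)  # the row (2,) appears twice
union_distinct = sorted(union_rows(t1, t2, distinct=True))
print(union_all, union_distinct)
```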
unnest
unnest(column, offset=None, keep_empty=False)
Unnest an array column from a table.
When unnesting an existing column the newly unnested column replaces the existing column.
Parameters
column
Array column to unnest.
required
offset
str | None
Name of the resulting index column.
None
keep_empty
bool
Keep empty array values as NULL in the output table, as well as existing NULL values.
False
Returns
Table
Table with the array column column unnested.
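How offset and keep_empty interact can be sketched with rows represented as dicts. The helper unnest_rows is hypothetical, and two details are assumptions of this sketch rather than statements from the text above: the index column is zero-based, and rows kept by keep_empty get a null index.

```python
def unnest_rows(rows, column, offset=None, keep_empty=False):
    """Expand each element of row[column] into its own output row."""
    out = []
    for row in rows:
        values = row[column]
        if not values:  # None (NULL) or an empty list
            if keep_empty:
                new_row = {**row, column: None}
                if offset is not None:
                    new_row[offset] = None  # assumption: no index for empty rows
                out.append(new_row)
            continue
        for i, value in enumerate(values):  # assumption: zero-based index
            new_row = {**row, column: value}
            if offset is not None:
                new_row[offset] = i
            out.append(new_row)
    return out


rows = [{"id": 1, "x": [10, 20]}, {"id": 2, "x": []}, {"id": 3, "x": None}]
dropped = unnest_rows(rows, "x")
kept = unnest_rows(rows, "x", offset="idx", keep_empty=True)
print(dropped)
print(kept)
```

Note that the unnested values replace the original array column under the same name, matching the description above.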
unpack
Project the struct fields of each of columns into self.
Existing fields are retained in the projection.
Parameters
columns
str
String column names to project into self.
()
Returns
Table
The child table with struct fields of each of columns projected.
value_counts
Compute a frequency table of this table’s values.
Returns
Table
Frequency table of this table’s values.
Examples
>>> import xorq as xo
>>> from xorq import examples
>>> xo.options.interactive = True
>>> t = examples.penguins.fetch()
>>> t.head()
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.year.value_counts().order_by("year")
┏━━━━━━━┳━━━━━━━━━━━━┓
┃ year ┃ year_count ┃
┡━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64 │
├───────┼────────────┤
│ 2007 │ 110 │
│ 2008 │ 114 │
│ 2009 │ 120 │
└───────┴────────────┘
>>> t[["year", "island"]].value_counts().order_by("year", "island")
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Dream │ 46 │
│ 2007 │ Torgersen │ 20 │
│ 2008 │ Biscoe │ 64 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Torgersen │ 16 │
│ 2009 │ Biscoe │ 60 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Torgersen │ 16 │
└───────┴───────────┴───────────────────┘
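The same frequency-table idea can be sketched with collections.Counter. The helper below is hypothetical; the count column name, built by joining the column names and appending _count, is inferred from the outputs shown above (year_count, year_island_count), and the sample rows are invented.

```python
from collections import Counter


def value_counts(rows, columns):
    """Count occurrences of each distinct combination of the given columns."""
    count_name = "_".join(columns) + "_count"
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [
        {**dict(zip(columns, key)), count_name: n}
        for key, n in sorted(counts.items())
    ]


# Invented rows for illustration only.
rows = [
    {"year": 2007, "island": "Biscoe"},
    {"year": 2007, "island": "Biscoe"},
    {"year": 2008, "island": "Dream"},
]
result = value_counts(rows, ["year", "island"])
print(result)
```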
view
Create a new table expression distinct from the current one.
Use this API for any self-referencing operations like a self-join.
visualize
visualize(
    format='svg',
    *,
    label_edges=False,
    verbose=False,
    node_attr=None,
    node_attr_getter=None,
    edge_attr=None,
    edge_attr_getter=None,
)
Visualize an expression as a GraphViz graph in the browser.
Parameters
format
str
Image output format. These are specified by the graphviz Python library.
'svg'
label_edges
bool
Show operation input names as edge labels
False
verbose
bool
Print the graphviz DOT code to stderr if True
False
node_attr
Mapping [str , str ] | None
Mapping of (attribute, value) pairs set for all nodes. Options are specified by the graphviz Python library.
None
node_attr_getter
NodeAttributeGetter | None
Callback taking a node and returning a mapping of (attribute, value) pairs for that node. Options are specified by the graphviz Python library.
None
edge_attr
Mapping [str , str ] | None
Mapping of (attribute, value) pairs set for all edges. Options are specified by the graphviz Python library.
None
edge_attr_getter
EdgeAttributeGetter | None
Callback taking two adjacent nodes and returning a mapping of (attribute, value) pairs for the edge between those nodes. Options are specified by the graphviz Python library.
None
Examples
Open the visualization of an expression in the default browser:
>>> import xorq as xo
>>> import xorq.vendor.ibis.expr.operations as ops
>>> left = xo.table(dict(a="int64", b="string"), name="left")
>>> right = xo.table(dict(b="string", c="int64", d="string"), name="right")
>>> expr = left.inner_join(right, "b").select(left.a, b=right.c, c=right.d)
>>> expr.visualize(
...     format="svg",
...     label_edges=True,
...     node_attr={"fontname": "Roboto Mono", "fontsize": "10"},
...     node_attr_getter=lambda node: isinstance(node, ops.Field) and {"shape": "oval"},
...     edge_attr={"fontsize": "8"},
...     edge_attr_getter=lambda u, v: isinstance(u, ops.Field) and {"color": "red"},
... )  # quartodoc: +SKIP