Python API
Core Operations
APIs for reading and returning data
| Name | Description |
|------|-------------|
| read_csv | Lazily load a CSV or set of CSVs. |
| read_parquet | Lazily load a Parquet file or set of Parquet files. |
| memtable | Construct an ibis table expression from in-memory data. |
| to_sql | Return the formatted SQL string for an expression. |
| execute | Execute an expression against its backend, if one exists. |
| to_pyarrow_batches | Execute an expression and return a RecordBatchReader. |
| to_pyarrow | Execute an expression and return the results as a PyArrow table. |
| to_parquet | Write the results of executing the given expression to a Parquet file. |
| to_csv | Write the results of executing the given expression to a CSV file. |
| to_json | Write the results of executing the given expression to an NDJSON file. |
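
These entry points follow the ibis deferred-execution model: tables are registered lazily, composed into expressions, and only materialized when executed or written out. A minimal sketch of that cycle, written against plain ibis (file paths and column names are made up; the `ibis` module stands in for this package's top-level namespace, which the table above mirrors):

```python
import ibis

# Lazily register a CSV; nothing is read until the expression is executed.
t = ibis.read_csv("data/events.csv")

# Build a lazy expression.
filtered = t.filter(t.value > 0)
expr = filtered.group_by("category").agg(total=filtered.value.sum())

# Inspect the SQL without running it.
print(ibis.to_sql(expr))

# Materialize: pandas DataFrame, PyArrow table, or streamed record batches.
df = expr.execute()
tbl = expr.to_pyarrow()
batches = expr.to_pyarrow_batches()

# Write the result set back out.
expr.to_parquet("out/summary.parquet")
expr.to_csv("out/summary.csv")
```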
Data Operations
| Name | Description |
|------|-------------|
| Table | An immutable and lazy dataframe. |
| GroupedTable | An intermediate table expression that holds grouping information. |
| Value | Base class for a data-generating expression with a known type. |
| Scalar | An expression that evaluates to a single value. |
| Column | An expression that evaluates to a column of values. |
| NumericColumn | A column of numeric values. |
| IntegerColumn | A column of integer values. |
| FloatingColumn | A column of floating-point values. |
| StringValue | A string value or column expression. |
| TimeValue | A time-of-day value or column expression. |
| DateValue | A date value or column expression. |
| DayOfWeek | A namespace of methods for extracting day-of-week information. |
| TimestampValue | A timestamp value or column expression. |
| IntervalValue | An interval value or column expression. |
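
Table is the lazy, immutable dataframe at the center of the API; selecting a column yields a typed Value/Column expression (numeric, string, temporal, interval) whose methods build further expressions. A small ibis-style sketch of that expression-building flow (column names are invented):

```python
import ibis
from ibis import _

# memtable turns in-memory rows into a table expression.
t = ibis.memtable(
    {
        "city": ["ann arbor", "boston", "chicago"],
        "amount": [10.5, None, 7.25],
        "created_at": ["2024-01-01", "2024-02-15", "2024-03-30"],
    }
)

expr = (
    t.mutate(
        city=t.city.upper(),                        # StringValue method
        created_at=t.created_at.cast("timestamp"),  # now a TimestampValue
    )
    .filter(_.amount.notnull())                     # Column predicate
    .mutate(weekday=_.created_at.day_of_week.full_name())  # DayOfWeek namespace
    .order_by(_.amount.desc())
)

print(expr.execute())
```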
Caching
Caching Storage
| Name | Description |
|------|-------------|
| ParquetStorage | Storage that caches expressions as Parquet files using a modification time strategy. |
| ParquetSnapshotStorage | Storage that caches expressions as Parquet files using a snapshot invalidation strategy. |
| SourceStorage | Storage that caches expressions within the source backend using a modification time strategy. |
| SourceSnapshotStorage | Storage that caches expressions within the source backend using a snapshot invalidation strategy. |
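
All four storages plug into the same pattern: wrap an expensive expression so repeated executions reuse a cached result, choosing where the cache lives (Parquet files or the source backend) and how it is invalidated (modification time or snapshot). The sketch below is hypothetical; the import path, the constructor arguments, and the `expr.cache(storage=...)` keyword are assumptions inferred from the class names above, so check the individual class pages for the real signatures.

```python
# Hypothetical sketch -- the cache() call and the constructor arguments shown
# in the docstring are assumptions, not confirmed signatures.
# from <this_package>.caching import ParquetStorage, SourceSnapshotStorage  # placeholder import

def with_cache(expr, storage):
    """Return a cached version of `expr`.

    `storage` is one of the storage classes above, e.g. (assumed constructors):
      ParquetStorage(source=con, path="cache/")   # Parquet files, mtime invalidation
      SourceSnapshotStorage(source=con)           # source backend, snapshot invalidation
    """
    # First execution computes and persists the result; later executions
    # reuse it until the storage's invalidation strategy says otherwise.
    return expr.cache(storage=storage)
```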
Machine Learning Operations
Machine Learning Functions and Helpers
| Name | Description |
|------|-------------|
| train_test_splits | Generate multiple train/test splits of an ibis table for different test sizes. |
| Step | A single step in a machine learning pipeline that wraps a scikit-learn estimator. |
| Pipeline | A machine learning pipeline that chains multiple processing steps together. |
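
A deferred table can be split without materializing it by hashing a unique key into buckets and assigning buckets to the train or test side, which keeps the split reproducible. The snippet below illustrates that idea in plain ibis with made-up columns; it is not this library's implementation, and the real train_test_splits also supports multiple test sizes, so see its page for the actual signature.

```python
import ibis

# Plain-ibis illustration of deterministic train/test splitting by hashing
# a unique key into buckets (a sketch of the idea, not this library's code).
t = ibis.memtable({"row_id": list(range(1000)), "x": [i * 0.5 for i in range(1000)]})

num_buckets = 100
test_fraction = 0.25
cutoff = int(num_buckets * test_fraction)

# The same row_id always hashes to the same bucket, so the split is reproducible.
bucket = (t.row_id.hash() % num_buckets).abs()

train = t.filter(bucket >= cutoff)
test = t.filter(bucket < cutoff)

print(train.count().execute(), test.count().execute())
```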
Type System
Data types and schemas
| Name | Description |
|------|-------------|
| Data types | Scalar and column data types. |
| Schemas | Table schemas. |
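
Types and schemas appear to follow the ibis dt/schema conventions: a schema maps column names to data type objects, and every table and column expression carries its type. A brief sketch using ibis directly (column names are made up):

```python
import ibis
import ibis.expr.datatypes as dt

# Declare a schema, mixing string aliases and DataType objects.
schema = ibis.schema(
    {
        "id": dt.int64,
        "name": "string",
        "amount": dt.Decimal(12, 2),
        "created_at": dt.timestamp,
    }
)

# Schemas and dtypes are attached to every table expression.
t = ibis.table(schema, name="orders")
print(t.schema())       # the Schema object
print(t.amount.type())  # the column's DataType (decimal(12, 2))
```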
UDF System
Functions for creating UDFs
| Name | Description |
|------|-------------|
| make_pandas_udf | Create a scalar User-Defined Function (UDF) that operates on pandas DataFrames. |
| make_pandas_expr_udf | Create an expression-based scalar UDF that incorporates pre-computed values. |
| pyarrow_udwf | Create a User-Defined Window Function (UDWF) using PyArrow. |
| agg.pyarrow | Decorator for creating PyArrow-based aggregation functions. |
| agg.pandas_df | Create a pandas DataFrame-based aggregation function. |
| flight_udxf | Create a User-Defined Exchange Function (UDXF) that executes a pandas DataFrame transformation. |
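
Each helper takes an ordinary Python function plus enough type information (an input schema and a return type) for the engine to call it over pandas or PyArrow batches. The sketch below is hypothetical: the import path, the `make_pandas_udf` argument names, and the attachment call are assumptions inferred from the descriptions above, so consult the individual function pages for the real signatures.

```python
import pandas as pd

# Hypothetical sketch -- import path and argument names are assumptions.
# from <this_package>.expr.udf import make_pandas_udf

def row_total(df: pd.DataFrame) -> pd.Series:
    # Receives a pandas DataFrame with the declared input columns and
    # returns one value per row.
    return df["price"] * df["quantity"]

# udf = make_pandas_udf(
#     fn=row_total,                                        # assumed argument name
#     schema=ibis.schema({"price": "float64",
#                         "quantity": "int64"}),           # assumed argument name
#     return_type=dt.float64,                              # assumed argument name
#     name="row_total",
# )
# expr = t.mutate(total=udf.on_expr(t))                    # attachment call is also an assumption
```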