Python API
Core Operations
APIs for reading and returning data
| Function | Description |
|---|---|
| connect | Create a xorq backend. |
| execute | Execute an expression against its backend if one exists. |
| memtable | Construct an ibis table expression from in-memory data. |
| read_csv | Lazily load a CSV or set of CSVs. |
| read_parquet | Lazily load a Parquet file or set of Parquet files. |
| deferred_read_csv | Create a deferred read operation for CSV files that executes only when needed. |
| deferred_read_parquet | Create a deferred read operation for Parquet files that executes only when needed. |
| to_csv | Write the results of executing the given expression to a CSV file. |
| to_json | Write the results of executing the given expression to an NDJSON file. |
| to_parquet | Write the results of executing the given expression to a Parquet file. |
| to_pyarrow | Execute an expression and return the results as a PyArrow table. |
| to_pyarrow_batches | Execute an expression and return a PyArrow RecordBatchReader. |
| to_sql | Return the formatted SQL string for an expression. |
| register | |
| get_plans | |
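The `deferred_read_*` helpers register a data source without touching it; I/O happens only when the expression is executed. The idea (a plain-Python sketch, not xorq's actual implementation) looks like this:

```python
import csv
import io

class DeferredCSV:
    """Sketch of a deferred read: construction only records the source;
    parsing happens when execute() is called."""

    def __init__(self, get_file):
        self.get_file = get_file   # callable returning a file-like object
        self.reads = 0             # track how often the source is touched

    def execute(self):
        self.reads += 1
        return list(csv.DictReader(self.get_file()))

data = "id,name\n1,alice\n2,bob\n"
deferred = DeferredCSV(lambda: io.StringIO(data))
assert deferred.reads == 0          # nothing read yet
rows = deferred.execute()           # I/O happens here
assert rows[0]["name"] == "alice" and deferred.reads == 1
```

Deferring the read lets the engine push filters and projections down to the file scan instead of materializing everything up front.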
Data Operations
| Type | Description |
|---|---|
| Table | An immutable and lazy dataframe. |
| GroupedTable | An intermediate table expression to hold grouping information. |
| Value | Base class for a data generating expression having a known type. |
| Scalar | |
| Column | |
| NumericColumn | |
| IntegerColumn | |
| FloatingColumn | |
| StringValue | |
| TimeValue | |
| DateValue | |
| DayOfWeek | A namespace of methods for extracting day of week information. |
| TimestampValue | |
| IntervalValue | |
Caching
Cache classes and invalidation strategies
| Class | Description |
|---|---|
| ParquetCache | Cache expressions as Parquet files using a source invalidation strategy. |
| ParquetSnapshotCache | Cache expressions as Parquet files using a snapshot invalidation strategy. |
| SourceCache | |
| SourceSnapshotCache | |
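The difference between the two strategies: a source-invalidated cache recomputes when the upstream data's identity token changes, while a snapshot cache keeps its first result regardless. A stdlib sketch of the idea (illustrative only, not xorq's implementation):

```python
class Cache:
    def __init__(self, compute, token, snapshot=False):
        self.compute = compute    # function: source -> result
        self.token = token        # function: source -> invalidation token
        self.snapshot = snapshot  # snapshot caches ignore later source changes
        self.entry = None         # (token, result)

    def get(self, source):
        tok = self.token(source)
        if self.entry is None or (not self.snapshot and self.entry[0] != tok):
            self.entry = (tok, self.compute(source))
        return self.entry[1]

data = [1, 2, 3]
source_cache = Cache(sum, token=lambda s: tuple(s))
snap_cache = Cache(sum, token=lambda s: tuple(s), snapshot=True)
assert source_cache.get(data) == 6 and snap_cache.get(data) == 6
data.append(4)
assert source_cache.get(data) == 10  # source strategy: recomputed
assert snap_cache.get(data) == 6     # snapshot strategy: first result kept
```

Snapshot caching suits expensive, effectively-immutable inputs; source invalidation suits data that changes under you.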
Data Operations
Table, column, and value types
| Function | Description |
|---|---|
| window | Create a window clause for use with window functions. |
| selectors | Convenient column selectors. |
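A window clause pairs an aggregating function with an ordered frame of rows around each row. The core mechanics, sketched over a plain list (illustrative only, not xorq's API):

```python
def window_mean(values, preceding):
    """Mean over a frame of `preceding` rows plus the current row,
    in order -- the shape of a rows-based moving-average window."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]
        out.append(sum(frame) / len(frame))
    return out

# frame = 1 preceding row + current row
assert window_mean([2, 4, 6, 8], preceding=1) == [2.0, 3.0, 5.0, 7.0]
```

Unlike a group-by aggregation, a window function returns one value per input row, so the result aligns with the original table.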
Machine Learning Operations
Machine Learning Functions and Helpers
| Function | Description |
|---|---|
| train_test_splits | Generate multiple train/test splits of an Ibis table for different test sizes. |
| Step | A single step in a machine learning pipeline that wraps a scikit-learn estimator. |
| Pipeline | A machine learning pipeline that chains multiple processing steps together. |
| FittedPipeline | |
| deferred_fit_predict | |
| deferred_fit_transform | |
| calc_split_column | |
| make_quickgrove_udf | Create a UDF from a quickgrove model. |
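Deterministic train/test splitting in a lazy engine is commonly done by hashing a unique key column into a bucket, so a row's split membership is stable across runs. A stdlib sketch of that general idea (not xorq's exact algorithm; the function name is hypothetical):

```python
import zlib

def split_assignment(key, test_sizes):
    """Assign a row to a split by hashing its unique key into [0, 1)
    and comparing against cumulative test sizes; deterministic and
    reproducible across runs."""
    bucket = zlib.crc32(str(key).encode()) % 10_000 / 10_000
    cum = 0.0
    for i, size in enumerate(test_sizes):
        cum += size
        if bucket < cum:
            return i          # falls into test split i
    return len(test_sizes)    # remainder -> train

sizes = [0.2, 0.2]  # two 20% test splits; the remaining 60% is train
assignments = [split_assignment(k, sizes) for k in range(10_000)]
# fully deterministic, and roughly proportional to the requested sizes
assert assignments == [split_assignment(k, sizes) for k in range(10_000)]
assert 0.15 < assignments.count(0) / len(assignments) < 0.25
```

Hash-based assignment avoids shuffling the whole table and keeps a given row in the same split even as new rows arrive.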
Lineage
Data lineage tracking utilities
| Function | Description |
|---|---|
| build_column_trees | Build a lineage tree for each column in the expression. |
| build_tree | |
| print_tree | |
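Column lineage walks an expression's inputs and records, for each output column, which upstream columns feed into it. A minimal dict-based sketch of the idea (not xorq's node types; `build_tree`/`print_tree` here are simplified stand-ins):

```python
def build_tree(column, deps):
    """Recursively expand a column into its tree of upstream columns.
    `deps` maps column -> list of direct input columns (assumed shape)."""
    return {column: [build_tree(c, deps) for c in deps.get(column, [])]}

def print_tree(tree, indent=0):
    for col, children in tree.items():
        print("  " * indent + col)
        for child in children:
            print_tree(child, indent + 1)

# revenue = price * quantity; margin = revenue - cost
deps = {"margin": ["revenue", "cost"], "revenue": ["price", "quantity"]}
tree = build_tree("margin", deps)
assert tree == {"margin": [{"revenue": [{"price": []}, {"quantity": []}]},
                           {"cost": []}]}
print_tree(tree)
```

Trees like this answer "which raw columns does this output depend on?", which is what makes impact analysis and debugging of derived columns tractable.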
Flight Operations
Apache Arrow Flight server and client operations
| Name | Description |
|---|---|
| FlightServer | |
| FlightUrl | |
| make_udxf | |
Catalog Operations
Compute catalog management
| Class | Description |
|---|---|
| XorqCatalog | Xorq Catalog container. |
| Build | Build information. |
| Alias | |
| CatalogMetadata | Catalog metadata. |
Type System
Data types and schemas
| Section | Description |
|---|---|
| Data types | Scalar and column data types. |
| Schemas | Table schemas. |
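A schema is an ordered mapping of column names to data types, used to validate data before execution. A stdlib sketch of the concept (illustrative; real schemas use backend-specific dtypes, not Python classes):

```python
class Schema:
    """Ordered name -> type mapping with row validation (illustrative)."""

    def __init__(self, **fields):
        self.fields = dict(fields)  # dicts preserve insertion order

    def validate(self, row):
        if set(row) != set(self.fields):
            raise ValueError(f"columns {set(row)} != {set(self.fields)}")
        for name, typ in self.fields.items():
            if not isinstance(row[name], typ):
                raise TypeError(f"{name}: expected {typ.__name__}")
        return row

schema = Schema(id=int, name=str, score=float)
assert schema.validate({"id": 1, "name": "a", "score": 0.5})
try:
    schema.validate({"id": "1", "name": "a", "score": 0.5})
except TypeError as err:
    assert "id" in str(err)
```

Declaring the schema up front is what lets lazy expressions type-check before any data is read.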
UDF System
Functions for creating UDFs
| Function | Description |
|---|---|
| make_pandas_udf | Create a scalar User-Defined Function (UDF) that operates on pandas DataFrames. |
| make_pandas_expr_udf | Create an expression-based scalar UDF that incorporates pre-computed values. |
| pyarrow_udwf | Create a User-Defined Window Function (UDWF) using PyArrow. |
| agg.pyarrow | Decorator for creating PyArrow-based aggregation functions. |
| agg.pandas_df | Create a pandas DataFrame-based aggregation function. |
| flight_udxf | Create a User-Defined Exchange Function (UDXF) that processes pandas DataFrames via Arrow Flight. |
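The contract behind a scalar UDF: it receives the input columns as equal-length batches and must return exactly one output value per input row. A stdlib sketch of that contract (the engine really passes pandas/PyArrow batches; plain lists stand in here):

```python
def make_scalar_udf(fn, schema):
    """Wrap a per-batch function as a scalar UDF: `fn` gets the input
    columns named in `schema` as equal-length lists and must return a
    list of the same length (the length-preserving UDF contract)."""
    def udf(batch):
        cols = [batch[name] for name in schema]
        n = len(cols[0])
        out = fn(*cols)
        assert len(out) == n, "scalar UDF must be length-preserving"
        return out
    return udf

add_cols = make_scalar_udf(
    lambda a, b: [x + y for x, y in zip(a, b)], schema=["a", "b"])
assert add_cols({"a": [1, 2], "b": [10, 20]}) == [11, 22]
```

Aggregation UDFs (`agg.*`) differ only in the contract's output side: they may reduce a batch to a single value instead of preserving its length.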