Xorq I/O Operations
This guide covers the essential I/O operations in Xorq, including reading various file formats, working with dataframes, and handling PyArrow tables.
Reading Files
Reading Parquet Files
The most common way to read Parquet files in Xorq is with the read_parquet() method:
import xorq as xo
# Create a connection
con = xo.connect()
# Read a Parquet file
path = xo.options.pins.get_path("penguins")  # replace with the path to the parquet file
table = con.read_parquet(path, table_name="my_parquet_table")
# You can also read directly without a connection
table = xo.read_parquet(path)
Parameters:
- path: Path to the Parquet file
- table_name: Optional name for the table in the backend
Reading CSV Files
CSV files can be read using the read_csv() method:
# Read CSV file
csv_path = xo.options.pins.get_path("iris")  # replace with the path to the csv file
table = con.read_csv(csv_path, table_name="iris_csv_table")
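If no CSV file is available, a small one can be generated with the Python standard library and then used as csv_path. This is only a sketch: the column names and values are invented for illustration, and nothing here calls Xorq.

```python
import csv
import os
import tempfile

# Hypothetical rows; the field names are placeholders.
rows = [
    {"sepal_length": 5.1, "species": "setosa"},
    {"sepal_length": 7.0, "species": "versicolor"},
]

# Write a header line followed by one line per row.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sepal_length", "species"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header and the row count.
with open(csv_path, newline="") as f:
    read_back = list(csv.DictReader(f))
print(len(read_back), list(read_back[0]))
```

Note that the standard csv module returns every value as a string; type inference is left to whichever reader loads the file afterwards.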
Deferred Operations
Xorq supports deferred operations, which build a computation graph and delay execution until it is explicitly requested:
from xorq.common.utils.defer_utils import deferred_read_parquet
# Create a deferred read operation
deferred_table = deferred_read_parquet(
    path,
    con,
    "deferred_penguins",
)
# Execute when needed
result = deferred_table.execute()
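The idea of deferral can be illustrated with a toy class in plain Python. This is a conceptual sketch only and not how Xorq implements it: DeferredPipeline, load_rows, and the filter step are hypothetical names invented here. The point is that recording a step does no work, and the data source is touched only when execute() is called.

```python
class DeferredPipeline:
    """Toy illustration of deferred execution; not Xorq's implementation."""

    def __init__(self, load):
        self._load = load   # zero-argument callable that produces the rows
        self._steps = []    # transformations recorded but not yet run

    def filter(self, predicate):
        # Record the step; nothing is evaluated here.
        self._steps.append(lambda rows: [r for r in rows if predicate(r)])
        return self

    def execute(self):
        # Only now is the source loaded and each recorded step applied.
        rows = self._load()
        for step in self._steps:
            rows = step(rows)
        return rows


calls = []


def load_rows():
    calls.append("loaded")
    return [{"id": 1, "value": 10.5}, {"id": 2, "value": 20.3}]


pipeline = DeferredPipeline(load_rows).filter(lambda r: r["value"] > 15)
loaded_before = list(calls)   # snapshot taken before execution
result = pipeline.execute()
print(loaded_before, calls, result)
```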
Working with DataFrames
Registering Pandas DataFrames
You can register existing pandas DataFrames with Xorq backends:
import pandas as pd
import xorq as xo
# Create a pandas DataFrame
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "value": [10.5, 20.3, 30.1]
})
# Register with Xorq backend
con = xo.connect()
table = con.register(df, table_name="my_dataframe")
# Now you can use it in Xorq operations
result = table.filter(table.value > 15).execute()
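As a sanity check, the same filter can be written directly in pandas and compared with the executed result. The sketch below uses pandas only; it assumes the backend applies the same value > 15 predicate, and it makes no claim about the row order a backend returns.

```python
import pandas as pd

# Same data as the example above.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "value": [10.5, 20.3, 30.1],
})

# Boolean-mask filter: keep the rows whose value exceeds 15.
expected = df[df["value"] > 15].reset_index(drop=True)
print(expected["name"].tolist())
```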
Converting Back to DataFrames
All Xorq table operations return pandas DataFrames when executed:
# Execute operations to get pandas DataFrame
df_result = table.select(['id', 'name']).execute()
print(type(df_result))  # <class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
PyArrow Tables
Creating Tables from PyArrow
Xorq can work directly with PyArrow tables:
import pyarrow as pa
import xorq as xo
# Create a PyArrow table
pa_table = pa.table({
    "a": [1, 2, 3],
    "b": ["x", "y", "z"]
})
# Register with backend
con = xo.connect()
table = con.create_table("pa_table", pa_table)
# Execute to get results
result = table.execute()
Working with RecordBatchReaders
You can also work with PyArrow RecordBatchReaders:
import pyarrow as pa
# Create a RecordBatchReader
pa_table = pa.table({"x": [10, 20], "y": [True, False]})
reader = pa.RecordBatchReader.from_batches(
    pa_table.schema,
    pa_table.to_batches()
)
# Create table from RecordBatchReader
expr = con.read_record_batches(reader, "rbr_table")
result = expr.execute()
Output Formats
Saving Data
Xorq supports multiple output formats:
# Save as Parquet (default)
expr.to_parquet("output.parquet")
# Save as CSV
expr.to_csv("output.csv")
# Save as JSON
expr.to_json("output.json")
This guide covers the essential I/O operations in Xorq. For more advanced usage patterns, refer to the specific backend documentation and the Xorq examples in the project.