User-Defined Exchange Functions
Understanding the concept and applications of User-Defined Exchange Functions in Xorq
What are UDXFs?
User-Defined Exchange Functions (UDXFs) are a specialized type of user-defined function in Xorq that enable distributed data processing using the Apache Arrow Flight protocol. Unlike traditional UDFs that operate within a single process, UDXFs execute custom Python logic in separate processes or even remote services, making them ideal for:
- External API integrations (calling REST APIs, databases, or third-party services)
- Resource-intensive computations (ML model inference, heavy transformations)
- Microservice architectures (deploying models as standalone services)
- Process isolation (running untrusted or memory-intensive code safely)
Key Components
- Process Function: Your custom Python logic that transforms pandas DataFrames
- Schema Validation: Input/output schema specifications for type safety
- Flight Server: Hosts your processing function as a network service
- Flight Client: Transfers data to/from the server automatically
Creating UDXFs
Basic Syntax
import pandas as pd
import xorq as xo
from toolz import curry
from xorq.expr.relations import flight_udxf

# Define your processing function
@curry
def my_transform(df: pd.DataFrame, param1, param2):
    # Your custom logic here
    return transformed_df

# Create the UDXF
my_udxf = flight_udxf(
    process_df=my_transform(param1=value1, param2=value2),
    maybe_schema_in=input_schema.to_pyarrow(),
    maybe_schema_out=output_schema.to_pyarrow(),
    name="MyTransformation",
)

# Apply to data
result = input_expr.pipe(my_udxf)
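Because the process function is an ordinary DataFrame-to-DataFrame callable, it can be exercised locally with pandas before it is handed to flight_udxf. The following is a minimal sketch under stated assumptions: add_sentiment, its keyword rule, and the sample data are hypothetical, and functools.partial stands in for @curry to bind parameters ahead of time. It does not start a Flight server or call any Xorq API.

```python
from functools import partial

import pandas as pd


def add_sentiment(df: pd.DataFrame, positive_words: frozenset, text_col: str = "text") -> pd.DataFrame:
    """Hypothetical process function: return a copy of df with a 'sentiment' column."""

    def label(text: str) -> str:
        # Toy rule: any overlap with the positive word list counts as positive.
        words = set(text.lower().split())
        return "positive" if words & positive_words else "neutral"

    out = df.copy()  # leave the caller's DataFrame untouched
    out["sentiment"] = out[text_col].map(label)
    return out


# Bind the parameters up front, leaving a one-argument DataFrame -> DataFrame callable.
# (functools.partial is used here in place of @curry; this is a swap made for the sketch.)
process_df = partial(add_sentiment, positive_words=frozenset({"great", "good"}))

sample = pd.DataFrame({"text": ["Great product", "Arrived on Tuesday"], "id": [1, 2]})
result = process_df(sample)
```

Testing the function this way keeps the transformation logic separate from the transport, so failures in the custom logic are not confused with schema or connection problems.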
Schema Specifications
UDXFs require explicit schema definitions for type safety and optimization:
# Define input requirements
input_schema = xo.schema({"text": "string", "id": "int64"})

# Define output schema
output_schema = xo.schema({
    "text": "string",
    "id": "int64",
    "sentiment": "string"
})

# Schema validation functions
maybe_schema_in = schema_contains(input_schema)  # Validates required columns
maybe_schema_out = schema_concat(output_schema)  # Adds new columns
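The two validation roles described in the comments above (checking that required columns are present, and adding new columns to the result) can be illustrated with plain dictionaries. This is a conceptual sketch only: contains_columns and concat_columns are hypothetical stand-ins, not Xorq's schema_contains and schema_concat, and the choice to compare types as well as names is an assumption of the sketch.

```python
def contains_columns(required: dict):
    """Hypothetical stand-in: build a checker for an incoming schema (name -> type)."""

    def check(incoming: dict) -> bool:
        # Accept only if every required column is present with the required type.
        # Extra columns in the incoming schema are allowed.
        return all(incoming.get(name) == dtype for name, dtype in required.items())

    return check


def concat_columns(extra: dict):
    """Hypothetical stand-in: build a function that derives the output schema."""

    def derive(incoming: dict) -> dict:
        # Keep the incoming columns in order and append the new ones.
        return {**incoming, **extra}

    return derive


check_in = contains_columns({"text": "string", "id": "int64"})
derive_out = concat_columns({"sentiment": "string"})

incoming = {"text": "string", "id": "int64", "source": "string"}
accepted = check_in(incoming)            # required columns present
rejected = check_in({"id": "int64"})     # "text" is missing
output = derive_out(incoming)            # incoming columns plus "sentiment"
```

Expressing both sides as functions of the incoming schema lets the input check tolerate extra columns while the output schema is still fully determined before any data is transferred.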