Metadata-Driven Data Platform

Build Data Pipelines Once. Run Them Local Or Cloud.

OpenPX is one Rust engine with a visual designer, 18 composable operators, distributed execution, and a portable .opx dataset format. Design fast on a laptop, then scale to on-prem clusters or cloud object storage — same engine, same compiler, no rewrite.


18 composable operators

4 UI apps in the suite

3 run modes: laptop → cloud

0 migration rewrites

Built For Real Data Teams

OpenPX is designed for organizations that need repeatable, portable pipeline execution across development, on-prem, and cloud environments while keeping one coherent architecture.

One Unified Runtime

A single Rust engine and operator model run in every edition, eliminating drift between dev, on-prem, and production.

Portable `.opx` Format

Datasets are Parquet parts plus a metadata descriptor with relative paths — the same files move across environments untouched.
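Here is a minimal Rust sketch of why relative paths keep a dataset portable. `OpxDescriptor` and its `parts` field are hypothetical stand-ins for the real descriptor. The sketch shows only that resolving relative part paths against wherever the dataset currently lives lets the same files move between environments while the descriptor stays unchanged.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical in-memory view of an .opx metadata descriptor.
/// Part paths are stored relative to the dataset root.
#[derive(Debug, Clone)]
pub struct OpxDescriptor {
    pub parts: Vec<PathBuf>,
}

impl OpxDescriptor {
    /// Accept only relative part paths; an absolute path would pin the
    /// dataset to one machine and defeat portability.
    pub fn new(parts: Vec<PathBuf>) -> Option<Self> {
        if parts.iter().all(|p| p.is_relative()) {
            Some(Self { parts })
        } else {
            None
        }
    }

    /// Resolve every part against the dataset's current root directory.
    /// Moving the dataset only changes `root`, never the descriptor.
    pub fn resolve(&self, root: &Path) -> Vec<PathBuf> {
        self.parts.iter().map(|p| root.join(p)).collect()
    }
}
```

Copying the directory from a laptop to a shared mount only changes the `root` argument passed to `resolve`.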

Object Storage Ready

Read and write directly to S3, MinIO, Azure Blob, or GCS, with distributed workers for multi-node processing at scale.
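The engine section describes one `StorageProvider` trait that dispatches by URI scheme. It does not describe the trait's methods, so this sketch shows only the selection step. The `Backend` enum and the scheme strings (`s3`, `az`/`abfs`, `gs`) are illustrative assumptions. MinIO is mapped to the S3 variant on the assumption that it is reached through its S3-compatible API.

```rust
/// Hypothetical backend kinds; a real provider would wrap a client for each.
#[derive(Debug, PartialEq)]
enum Backend {
    LocalFs,
    S3Compatible, // assumed to cover both AWS S3 and MinIO
    AzureBlob,
    Gcs,
}

/// Choose a backend from the URI scheme (assumed scheme names).
/// Paths without a scheme fall back to the local filesystem.
fn backend_for(uri: &str) -> Result<Backend, String> {
    let scheme = match uri.split_once("://") {
        Some((scheme, _)) => scheme.to_ascii_lowercase(),
        None => return Ok(Backend::LocalFs),
    };
    match scheme.as_str() {
        "file" => Ok(Backend::LocalFs),
        "s3" => Ok(Backend::S3Compatible),
        "az" | "abfs" => Ok(Backend::AzureBlob),
        "gs" => Ok(Backend::Gcs),
        other => Err(format!("unsupported storage scheme: {other}")),
    }
}
```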

Visual, API & CLI Control

Drive pipelines from the Designer canvas, the HTTP API, or the openpx CLI — whichever fits your team and automation.

Hybrid Deployment

Start on a laptop, stay air-gapped on-prem if you must, or move to cloud-native profiles with minimal process changes.

Deterministic By Design

No hidden shuffles and no unseeded randomness — equal keys always co-locate and output is byte-identical across worker counts.
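Equal keys co-locate because partitioning uses a fixed, unseeded hash. The sketch below uses 64-bit FNV-1a as an illustrative choice. The source does not say which hash OpenPX uses, and `hash_partition` is a hypothetical name. Rust's `DefaultHasher` is avoided here because its algorithm is not guaranteed to stay the same across Rust releases.

```rust
/// 64-bit FNV-1a: no random seed, so identical key bytes hash identically
/// on every host and in every run.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Hypothetical helper: map a key to one of `num_partitions` partitions.
fn hash_partition(key: &str, num_partitions: u64) -> u64 {
    assert!(num_partitions > 0, "need at least one partition");
    fnv1a64(key.as_bytes()) % num_partitions
}
```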

One Engine, Built In Rust

At the core is a pull-based execution engine on Apache Arrow, with an explicit compiler, pluggable storage, and a gRPC-based distributed transport. Everything the platform does is metadata-driven and inspectable before a single row moves.
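In a pull-based engine, control flows backwards. The consumer asks its input for the next batch, and each operator pulls from its own child only when it needs more data. This sketch uses plain `Vec<i64>` batches in place of Arrow record batches. The `Operator` trait, `Scan`, `Filter`, and `drain` are hypothetical names, not the engine's API.

```rust
/// Stand-in for an Arrow RecordBatch in this sketch.
type Batch = Vec<i64>;

/// Hypothetical pull interface: callers request batches one at a time.
trait Operator {
    fn next_batch(&mut self) -> Option<Batch>;
}

/// Source operator that hands out pre-built batches.
struct Scan {
    batches: std::vec::IntoIter<Batch>,
}

impl Operator for Scan {
    fn next_batch(&mut self) -> Option<Batch> {
        self.batches.next()
    }
}

/// Filter that pulls from its child lazily and keeps matching rows.
struct Filter<P: Fn(i64) -> bool> {
    child: Box<dyn Operator>,
    predicate: P,
}

impl<P: Fn(i64) -> bool> Operator for Filter<P> {
    fn next_batch(&mut self) -> Option<Batch> {
        // Keep pulling until a non-empty batch survives or the child is exhausted.
        while let Some(batch) = self.child.next_batch() {
            let kept: Batch = batch.into_iter().filter(|&v| (self.predicate)(v)).collect();
            if !kept.is_empty() {
                return Some(kept);
            }
        }
        None
    }
}

/// Sink side: drive the whole plan by repeatedly pulling from the root.
fn drain(mut root: Box<dyn Operator>) -> Vec<i64> {
    let mut out = Vec::new();
    while let Some(batch) = root.next_batch() {
        out.extend(batch);
    }
    out
}
```

No work happens until `drain` starts pulling, and a batch with no surviving rows never reaches the sink.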

Arrow-native core. Apache Arrow in memory and Parquet on disk, with a deterministic type system across every operator.
Pluggable backends. A native compute backend plus an optional Polars lazy backend; consecutive same-backend stages are fused into one plan.
Inspectable compiler. YAML jobs and Designer graphs lower to a LogicalPlan then PhysicalPlan — with schema inference, cycle detection, and per-node diagnostics.
Explicit partitioning. HASH, RANGE, ROUND_ROBIN, ENTIRE, and SAME partitioners — the compiler inserts shuffles only where contracts require them.
Distributed transport. A conductor drives gRPC workers across hosts for partition-parallel execution, verified byte-identical from 1 to N workers.
Pluggable storage. One StorageProvider trait dispatches by URI scheme — local filesystem or S3 / MinIO / Azure / GCS object stores.
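Compilers commonly detect cycles with a topological sort. Below is a sketch using Kahn's algorithm over stage indices. The function name and the choice to report the stuck stages as diagnostics are illustrative assumptions, not a description of OpenPX's compiler internals.

```rust
use std::collections::VecDeque;

/// Return stages in topological order, or Err with every stage that is on a
/// cycle or downstream of one. `edges` are (from, to) stage-index pairs.
fn topo_order(num_stages: usize, edges: &[(usize, usize)]) -> Result<Vec<usize>, Vec<usize>> {
    let mut indegree = vec![0usize; num_stages];
    let mut downstream = vec![Vec::new(); num_stages];
    for &(from, to) in edges {
        downstream[from].push(to);
        indegree[to] += 1;
    }
    // Begin with stages that have no inputs (sources).
    let mut ready: VecDeque<usize> = (0..num_stages).filter(|&s| indegree[s] == 0).collect();
    let mut order = Vec::with_capacity(num_stages);
    while let Some(stage) = ready.pop_front() {
        order.push(stage);
        for &next in &downstream[stage] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                ready.push_back(next);
            }
        }
    }
    if order.len() == num_stages {
        Ok(order)
    } else {
        // Stages that still have unresolved inputs could not be ordered.
        Err((0..num_stages).filter(|&s| indegree[s] > 0).collect())
    }
}
```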

At A Glance

Language: Rust

In-memory format: Apache Arrow

Persisted format: Parquet · .opx

Execution: Pull-based, partition-parallel

Compute backends: Native · Polars

Distributed transport: gRPC (conductor + workers)

Storage: Local FS · S3 · MinIO · Azure · GCS

Interfaces: Designer · HTTP API · CLI

A Complete Operator Library

Eighteen composable stage types cover the full pipeline — from sources and transforms to joins, change-data-capture, fan-in/fan-out, and sinks. Every operator carries an explicit partition and schema contract, so behavior is predictable at design time.

Sources · Read

Bring data in from files, object stores, and databases.

Parquet · CSV · .opx dataset · S3 / object store · Database

Transforms · Compute

Reshape rows and columns, choosing native or Polars per stage.

Filter · Transform · Project · Modify · Aggregate · Sort · Sample · Remove Duplicates

Joins & Lookups · Combine

Multi-branch equality joins, enrichment, and CDC.

Join (inner / left / right / full; hash, sort-merge, or broadcast) · Lookup · Change Capture · Change Apply
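Change Capture compares two versions of a dataset and classifies each key. A minimal sketch of that classification as a keyed diff follows. The `Change` enum and `change_capture` are hypothetical names, and a production operator would likely carry full rows and change codes rather than just keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical change record: which key changed and how.
#[derive(Debug, PartialEq)]
enum Change {
    Insert(String),
    Update(String),
    Delete(String),
}

/// Diff two keyed snapshots. Inserts and updates come first in key order,
/// followed by deletes in key order (BTreeMap iterates sorted by key).
fn change_capture(before: &BTreeMap<String, String>, after: &BTreeMap<String, String>) -> Vec<Change> {
    let mut changes = Vec::new();
    for (key, new_val) in after {
        match before.get(key) {
            None => changes.push(Change::Insert(key.clone())),
            Some(old_val) if old_val != new_val => changes.push(Change::Update(key.clone())),
            Some(_) => {} // unchanged rows produce no change record
        }
    }
    for key in before.keys() {
        if !after.contains_key(key) {
            changes.push(Change::Delete(key.clone()));
        }
    }
    changes
}
```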

Combine & Route · Fan-in / Fan-out

Merge many inputs or split a stream into many branches.

Funnel · Merge (k-way) · Copy · Switch
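A k-way merge combines several inputs that are each already sorted into one sorted stream. It keeps only one head element per input in a min-heap. This sketch assumes ascending `i64` keys and a hypothetical `kway_merge` function. Breaking ties on input index keeps the output order deterministic.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge inputs that are each sorted ascending into one sorted output.
fn kway_merge(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    // Reverse turns Rust's max-heap into a min-heap on (value, input index).
    let mut heap = BinaryHeap::new();
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, i)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        // Refill from the input that just supplied the smallest value.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}
```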

Partitioning · Shuffle

The one explicit barrier that moves rows across partitions.

HASH · RANGE · ROUND_ROBIN · ENTIRE · SAME
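Several of these partitioners are easiest to compare as functions from a row to its target partitions. This sketch covers RANGE, ROUND_ROBIN, and ENTIRE only. It assumes ENTIRE follows the common convention of giving every partition a full copy of the data. The `Partitioner` enum and `targets` function are hypothetical names.

```rust
/// Hypothetical partitioner kinds (HASH and SAME omitted from this sketch).
enum Partitioner {
    /// Sorted, exclusive upper bounds for every partition except the last.
    Range(Vec<i64>),
    /// Assign rows in rotation by their position in the stream.
    RoundRobin,
    /// Send every row to every partition.
    Entire,
}

/// Target partition(s) for the row at `row_index` with sort key `key`,
/// given `n` output partitions.
fn targets(p: &Partitioner, key: i64, row_index: usize, n: usize) -> Vec<usize> {
    match p {
        // The number of bounds <= key is the index of the partition that holds key.
        Partitioner::Range(bounds) => vec![bounds.partition_point(|&b| b <= key)],
        Partitioner::RoundRobin => vec![row_index % n],
        Partitioner::Entire => (0..n).collect(),
    }
}
```

With bounds `[10, 20]` and three partitions, partition 0 receives keys below 10, partition 1 receives keys from 10 up to 20, and partition 2 receives the rest.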

Sinks · Write

Persist results, load tables, or tap a stream for preview.

Parquet / .opx writer · Database writer · Peek (preview tap)

Four Apps For The Full Lifecycle

A React application suite spans design, administration, operations, and governance — all wired to the same compile, run, and preview APIs.

Designer · Live

Visually build jobs on a canvas, configure stage properties, and compile against the real engine.

Drag-and-drop stage palette with live schema import
Real /compile with node-localized diagnostics
Sampled preview and an ad-hoc SQL query notebook
Shuffle Lens heat-matrix of actual row movement

Admin · Suite

Manage the platform: projects, users and roles, connections, and secrets.

Project and workspace administration
Role-based access and credential management
Connection and secret configuration

Director · Suite

Operate and monitor runs: phases, shuffles, logs, and schedules in one place.

Run detail with phase and shuffle breakdowns
Execution logs and worker assignment
Schedule and operations visibility

Quality · Suite

Govern data: profiling, quality rules, a glossary, and a lineage explorer.

Data profiling and quality rule definitions
Column-level lineage graph explorer
Business glossary and governance workflows

Governance & Observability, Built In

Because plans are metadata-driven, OpenPX can explain what a pipeline does — and what actually happened — without guesswork.

Column-Level Lineage

Track how every column is derived, propagated, or dropped through the expression graph.

Shuffle Lens

See a source-to-destination row matrix, bytes moved, and per-partition skew for every run.
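A source-to-destination row matrix and a skew figure can be derived from the list of row moves alone. In this sketch, skew is the busiest destination's row count divided by the mean rows per destination. That definition and the `shuffle_matrix` name are assumptions, not necessarily what Shuffle Lens reports. Bytes moved would need per-row sizes and are left out.

```rust
/// Build a row-count matrix from (source partition, destination partition)
/// moves, plus a skew ratio: max destination load / mean destination load.
fn shuffle_matrix(moves: &[(usize, usize)], n_src: usize, n_dst: usize) -> (Vec<Vec<u64>>, f64) {
    let mut matrix = vec![vec![0u64; n_dst]; n_src];
    for &(src, dst) in moves {
        matrix[src][dst] += 1;
    }
    // Column sums: rows received by each destination partition.
    let received: Vec<u64> = (0..n_dst)
        .map(|d| matrix.iter().map(|row| row[d]).sum::<u64>())
        .collect();
    let total: u64 = received.iter().sum();
    let max = received.iter().copied().max().unwrap_or(0);
    // A perfectly balanced shuffle gives 1.0; an empty shuffle is treated as balanced.
    let skew = if total == 0 {
        1.0
    } else {
        max as f64 / (total as f64 / n_dst as f64)
    };
    (matrix, skew)
}
```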

Run Reports

Trace logs, phases, worker assignments, and per-stage metrics for full run transparency.

Schema Inference

Infer Parquet schemas and validate them fail-closed at compile time, so mismatches surface before execution.

Query Notebook

Run read-only ad-hoc SQL over any dataset to inspect and validate results in place.

Job Registry

Submit and track jobs and runs, optionally backed by Postgres for durable history.

Connectors & Formats

Read from and write to the sources your data already lives in.

Parquet · CSV · .opx datasets · Amazon S3 · MinIO · Azure Blob · Google Cloud Storage · PostgreSQL · MySQL · SQLite · Local filesystem

Choose Your Operating Model

OpenPX editions are intentionally compatible so teams can match regulatory, operational, and scale requirements without re-platforming.

OpenPX Local

Self-managed edition for developer velocity, on-prem operations, and air-gapped environments.

Disk-based .opx datasets for controlled environments
On-prem multi-node gRPC clusters over shared storage
Docker Compose stack: API, control plane, and metadata Postgres
Ideal for regulated and private infrastructure

OpenPX Cloud

Cloud-native edition for distributed execution and object-storage-centric data operations.

S3, MinIO, Azure, and GCS object storage support
Distributed conductor + worker pools over gRPC
Terraform IaC for AWS ECS Fargate (ARM64), ECR, and CloudWatch
Same .opx format — datasets interchange with Local

From Pilot To Production

OpenPX helps teams move from initial proof-of-concept to production rollout through a staged but consistent execution model.

Phase 1: Design jobs and validate outcomes locally.

Phase 2: Standardize workflows with shared dataset contracts.

Phase 3: Adopt distributed execution as throughput grows.

Phase 4: Operationalize governance, monitoring, and scale.

Plan Your OpenPX Rollout

Whether you are modernizing ETL, launching internal data products, or standardizing hybrid data operations, OpenPX can provide a single execution foundation across environments.

