One Unified Runtime
A single Rust engine and operator model run in every edition, eliminating drift between dev, on-prem, and production.
OpenPX is one Rust engine with a visual designer, 18 composable operators, distributed
execution, and a portable .opx dataset format. Design fast on a laptop, then
scale to on-prem clusters or cloud object storage: same engine, same compiler, no rewrite.
OpenPX is designed for organizations that need repeatable, portable pipeline execution across development, on-prem, and cloud environments while keeping one coherent architecture.
.opx Format
Datasets are Parquet parts plus a metadata descriptor with relative paths, so the same files move across environments untouched.
Read and write directly to S3, MinIO, Azure Blob, or GCS, with distributed workers for multi-node processing at scale.
Drive pipelines from the Designer canvas, the HTTP API, or the openpx CLI, whichever
fits your team and automation.
Start on a laptop, stay air-gapped on-prem if you must, or move to cloud-native profiles with minimal process changes.
No hidden shuffles and no unseeded randomness: equal keys always co-locate and output is byte-identical across worker counts.
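The determinism claim above can be illustrated with a small sketch. This is not OpenPX code: the function names (`partition_for`, `shuffle`, `run`) and the choice of SHA-256 as the stable hash are assumptions made for illustration only. It shows the general idea that a seed-free, stable hash sends equal keys to the same partition, so a per-partition aggregation gives the same sorted result for any partition count.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def shuffle(rows, num_partitions):
    """Group (key, value) rows by partition, keeping input order inside each one."""
    parts = defaultdict(list)
    for key, value in rows:
        parts[partition_for(key, num_partitions)].append((key, value))
    return parts


def run(rows, num_partitions):
    """Sum values per key independently in each partition, then sort the output."""
    results = []
    for part in shuffle(rows, num_partitions).values():
        totals = {}
        for key, value in part:
            totals[key] = totals.get(key, 0) + value
        # Each key lives in exactly one partition, so no cross-partition merge is needed.
        results.extend(totals.items())
    return sorted(results)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
assert run(rows, 1) == run(rows, 4)  # same result for 1 or 4 partitions
```

A real engine would compare serialized output bytes rather than Python lists; the sketch only demonstrates why key co-location makes the result independent of the partition count.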
At the core is a pull-based execution engine on Apache Arrow, with an explicit compiler, pluggable storage, and a real distributed transport. Everything the platform does is metadata-driven and inspectable before a single row moves.
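As a rough illustration of what "pull-based" means, the sketch below chains three stages as Python generators. It is an assumption-laden toy, not the OpenPX engine: the stage names (`source`, `filter_rows`, `limit`, `build_plan`) are hypothetical, and rows are plain dicts rather than Apache Arrow batches. The point is that building the plan reads nothing, and the downstream consumer decides how many rows the source ever produces.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def source(rows: Iterable[Row], read_log: list) -> Iterator[Row]:
    """Yield rows one at a time, recording each id only when it is actually pulled."""
    for row in rows:
        read_log.append(row["id"])
        yield row


def filter_rows(upstream: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pull from upstream and pass along only the rows that match the predicate."""
    for row in upstream:
        if predicate(row):
            yield row


def limit(upstream: Iterator[Row], n: int) -> Iterator[Row]:
    """Stop pulling from upstream as soon as n rows have been emitted."""
    if n <= 0:
        return
    for count, row in enumerate(upstream, start=1):
        yield row
        if count >= n:
            return


def build_plan(rows: Iterable[Row], read_log: list) -> Iterator[Row]:
    """Wire the stages together; no row is read until the result is iterated."""
    scanned = source(rows, read_log)
    evens = filter_rows(scanned, lambda row: row["id"] % 2 == 0)
    return limit(evens, 2)


read_log = []
plan = build_plan(({"id": i} for i in range(1, 7)), read_log)
# At this point read_log is still empty: the plan exists, but no row has moved.
result = list(plan)
# result holds the rows with ids 2 and 4; read_log is [1, 2, 3, 4],
# so rows 5 and 6 were never read because nothing downstream asked for them.
```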
Eighteen composable stage types cover the full pipeline, from sources and transforms to joins, change-data-capture, fan-in/fan-out, and sinks. Every operator carries an explicit partition and schema contract, so behavior is predictable at design time.
Bring data in from files, object stores, and databases.
Reshape rows and columns, choosing native or Polars per stage.
Multi-branch equality joins, enrichment, and CDC.
Merge many inputs or split a stream into many branches.
The one explicit barrier that moves rows across partitions.
Persist results, load tables, or tap a stream for preview.
A React application suite spans design, administration, operations, and governance, all wired to the same compile, run, and preview APIs.
Visually build jobs on a canvas, configure stage properties, and compile against the real engine.
/compile with node-localized diagnostics
Manage the platform: projects, users and roles, connections, and secrets.
Operate and monitor runs: phases, shuffles, logs, and schedules in one place.
Govern data: profiling, quality rules, a glossary, and a lineage explorer.
Because plans are metadata-driven, OpenPX can explain what a pipeline does, and what actually happened, without guesswork.
Track how every column is derived, propagated, or dropped through the expression graph.
See a source-to-destination row matrix, bytes moved, and per-partition skew for every run.
Trace logs, phases, worker assignments, and per-stage metrics for full run transparency.
Infer Parquet schemas and validate fail-closed at compile time, so mismatches surface early.
Run read-only ad-hoc SQL over any dataset to inspect and validate results in place.
Submit and track jobs and runs, backed optionally by Postgres for durable history.
Read from and write to the sources your data already lives in.
OpenPX editions are intentionally compatible so teams can match regulatory, operational, and scale requirements without re-platforming.
Self-managed edition for developer velocity, on-prem operations, and air-gapped environments.
.opx datasets for controlled environments
Cloud-native edition for distributed execution and object-storage-centric data operations.
.opx format: datasets interchange with Local
OpenPX helps teams move from initial proof-of-concept to production rollout through a staged but consistent execution model.
Whether you are modernizing ETL, launching internal data products, or standardizing hybrid data operations, OpenPX can provide a single execution foundation across environments.