Technology

Built like infrastructure. Not a notebook.

Promethean is a Rust-first platform that executes everything server-side: a feature store, a pipeline engine, a model registry, and a collaborative workspace, composed over one declarative SDK. Here's the architecture — and the engineering decisions behind it.


System architecture

One SDK. Four systems. Zero client compute.

You commit declarations from the SDK or the in-browser editor. The control plane compiles them, backfills history, and schedules work. The four systems share lineage and stream data to one another over Arrow Flight — all on a Rust substrate.

Why Rust

The Python you write is a declaration. The Rust we run is the platform.

The SDK is a thin PyO3 binding over a Rust core: no business logic in Python, no client-side compute. Compilation, materialization, scheduling, lineage, and data movement all run in a memory-safe, fearlessly concurrent engine. That's how a feature store, a pipeline runner, and a registry stay correct under multi-tenant load without a garbage collector deciding the latency for you.

thin SDK, heavy core

# the SDK is a thin, typed surface; all logic lives in Rust
@featureset
class Momentum(Featureset):
    ret_5m: float = Feature(expr=close / close.shift(5) - 1)

# .commit() ships the expression to the Rust engine,
# which compiles it to SurrealQL and materializes it.
client.commit(Momentum)
# no feature computation runs in this process: no pandas, no GIL contention, no local CPU work
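
To make "the Python you write is a declaration" concrete, here is a minimal sketch of how an expression like `close / close.shift(5) - 1` can build a serializable tree instead of computing anything. Every class name and the JSON layout below are hypothetical illustrations, not the Promethean SDK's real types or wire format.

```python
import json
from dataclasses import dataclass


class Expr:
    """Base node: operators return new tree nodes instead of computing values."""

    def __truediv__(self, other):
        return BinOp("div", self, wrap(other))

    def __sub__(self, other):
        return BinOp("sub", self, wrap(other))

    def shift(self, periods):
        return Shift(self, periods)


@dataclass(frozen=True)
class Column(Expr):
    name: str

    def to_dict(self):
        return {"op": "col", "name": self.name}


@dataclass(frozen=True)
class Literal(Expr):
    value: float

    def to_dict(self):
        return {"op": "lit", "value": self.value}


@dataclass(frozen=True)
class Shift(Expr):
    source: Expr
    periods: int

    def to_dict(self):
        return {"op": "shift", "periods": self.periods, "source": self.source.to_dict()}


@dataclass(frozen=True)
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr

    def to_dict(self):
        return {"op": self.op, "left": self.left.to_dict(), "right": self.right.to_dict()}


def wrap(value):
    """Turn plain numbers into Literal nodes so they can sit in the tree."""
    return value if isinstance(value, Expr) else Literal(float(value))


# Hypothetical stand-in for the `close` column used in the snippet above.
close = Column("close")
ret_5m = close / close.shift(5) - 1

# The declaration is plain data; a server could receive and compile this payload.
payload = json.dumps(ret_5m.to_dict(), sort_keys=True)
```

No price data is read and no arithmetic on prices happens here; the only output is a description of the computation, which is the property the thin-SDK design relies on.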

Engineering decisions

The parts that earn a quant's trust.

Every choice below exists to make results correct, reproducible, and fast — the table stakes for running other people's capital.

DATA

Arrow Flight tensor streaming

Training reads stream as columnar Arrow record batches over Flight — zero-copy, typed, and fast. No CSV exports, no client-side recompute, no silent dtype drift between research and production.

Hybrid BM25 + vector search

Feature and strategy discovery fuses lexical BM25 recall with dense vector similarity, so you find an asset by exact name or by intent. Built into Ember, not bolted on.
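
One common way to fuse a lexical ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; this page does not say which fusion method Ember uses, and the feature names and the constant `k=60` are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; each appearance adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two retrievers.
bm25_ranking = ["ret_5m", "ret_1h", "vol_20d"]        # exact-name matches
vector_ranking = ["momentum_z", "ret_5m", "ret_1h"]   # matches by intent

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Items that both retrievers agree on rise to the top, while an item found by only one retriever still appears in the fused list.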

DATA

Git-style data branching

Fork a dataset, mutate features, validate point-in-time correctness, then merge. Production reads are isolated from experiments until a branch is explicitly promoted.
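
The fork, validate, and promote flow can be sketched as a copy-on-write overlay. This is a toy in-memory model under stated assumptions: `DatasetStore`, `point_in_time_ok`, and the row fields `feature_ts` and `source_ts` are hypothetical names, not the platform's API.

```python
class DatasetStore:
    """Toy model: branches hold only their own changes on top of production."""

    def __init__(self, production):
        self._production = dict(production)
        self._branches = {}

    def fork(self, branch):
        if branch in self._branches:
            raise ValueError(f"branch {branch!r} already exists")
        self._branches[branch] = {}  # empty overlay; nothing is copied

    def write(self, branch, feature, version):
        self._branches[branch][feature] = version

    def read(self, feature, branch=None):
        # Branch readers see their overlay first; production readers never do.
        if branch is not None and feature in self._branches[branch]:
            return self._branches[branch][feature]
        return self._production[feature]

    def promote(self, branch, validated):
        if not validated:
            raise PermissionError("branch must pass validation before promotion")
        self._production.update(self._branches.pop(branch))


def point_in_time_ok(rows):
    """True when no row was built from data newer than its own timestamp."""
    return all(row["source_ts"] <= row["feature_ts"] for row in rows)


store = DatasetStore({"ret_5m": "v1"})
store.fork("experiment")
store.write("experiment", "ret_5m", "v2")

# Production is untouched while the experiment is in flight.
before = (store.read("ret_5m"), store.read("ret_5m", branch="experiment"))

rows = [{"feature_ts": 10, "source_ts": 9}, {"feature_ts": 11, "source_ts": 11}]
store.promote("experiment", validated=point_in_time_ok(rows))
after = store.read("ret_5m")
```

The design choice worth noting is that promotion is the only path by which branch data reaches production readers, and it is gated on the validation result.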

COMPUTE

Server-side remote execution

Pipelines compile to RayJobs on a shared, quota-managed cluster. The DAG runs remotely with per-step resource requests; the editor terminal just streams the logs back.
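
As a rough illustration of a DAG with per-step resource requests, the sketch below orders steps into waves that could run in parallel, using only the standard library. It does not use or imitate the Ray API; `Step`, `plan_waves`, and the step names are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    name: str
    cpus: float
    memory_gb: float
    depends_on: tuple = ()


def plan_waves(steps):
    """Group steps into waves; every step in a wave has all dependencies done."""
    graph = {step.name: set(step.depends_on) for step in steps}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the DAG has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


steps = [
    Step("load", cpus=2, memory_gb=8),
    Step("features", cpus=4, memory_gb=16, depends_on=("load",)),
    Step("labels", cpus=1, memory_gb=4, depends_on=("load",)),
    Step("train", cpus=8, memory_gb=32, depends_on=("features", "labels")),
]

waves = plan_waves(steps)

# Peak CPU request if each wave runs fully in parallel, useful for quota checks.
by_name = {step.name: step for step in steps}
peak_cpus = max(sum(by_name[name].cpus for name in wave) for wave in waves)
```

A real scheduler would also account for memory, queueing, and retries; the point here is only that resource requests attach to individual steps rather than to the pipeline as a whole.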

STORAGE

Purpose-fit data stores

SurrealDB for relational + graph lineage, Redpanda for the streaming spine, ClickHouse for columnar run metrics. Each chosen for one job, not as a single compromise database.

TRUST

Reproducibility contract

Every run locks its pipeline version, dataset versions, hyperparameters, runtime env, and seeds into one immutable record. Any historical result re-runs byte-for-byte.

SECURITY

Multi-tenancy & passkey auth

Per-org namespaces with database-level isolation, IdP-agnostic OIDC, and passkey (WebAuthn) sign-in. Org/team/user chargeback tracks every CPU-second back to an owner.
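
Org/team/user chargeback amounts to rolling usage up a hierarchy. A minimal sketch, assuming usage arrives as `(org, team, user, cpu_seconds)` tuples; the record shape and all names are hypothetical.

```python
from collections import defaultdict


def roll_up(usage):
    """Attribute each CPU-second to its user, that user's team, and the org."""
    totals = defaultdict(float)
    for org, team, user, cpu_seconds in usage:
        totals[(org,)] += cpu_seconds
        totals[(org, team)] += cpu_seconds
        totals[(org, team, user)] += cpu_seconds
    return dict(totals)


usage = [
    ("acme", "research", "ana", 120.0),
    ("acme", "research", "bo", 30.0),
    ("acme", "execution", "cy", 50.0),
]

totals = roll_up(usage)
```

Because every record carries its full owner path, the totals at each level always sum to the level above.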

WORKSPACE

Collaborative workspaces

The editor pairs CodeMirror 6 (CM6) with CRDT-based collaboration and presence. Each workspace runs on a managed Kubernetes pod with the SDK preinstalled and pyright wired up, and auto-pauses when idle to keep it cheap.

AGENTS

MCP-native surfaces

ember-mcp and forge-mcp expose features, pipelines, and runs to LLM agents over the Model Context Protocol — so a copilot can discover and reason about your platform, safely.

The reproducibility contract

If you can't reproduce it, you can't trust it.

Capture

On every run, the engine snapshots the pipeline SHA, each dataset and featureset version, the resolved hyperparameters, the full runtime environment, and all random seeds.

Lock

Those facts are written into one immutable lineage record alongside the model artifact. Nothing is mutable; nothing is implicit; nothing depends on the machine that ran it.

Replay

Re-run any historical result and get the same bytes back. Diff two runs to see exactly what changed. Walk the chain back to the raw source for an audit.
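
The capture, lock, and replay steps above can be sketched as an immutable record with a content fingerprint. This is an illustration under stated assumptions, not the engine's actual schema: `RunRecord`, its field names, and the sample values are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class RunRecord:
    pipeline_sha: str
    dataset_versions: tuple   # sorted (name, version) pairs
    hyperparameters: tuple    # sorted (name, value) pairs
    runtime_env: str
    seeds: tuple

    def fingerprint(self):
        """SHA-256 over a canonical JSON encoding of every captured fact."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_runs(a, b):
    """Return only the fields that differ, as (a_value, b_value) pairs."""
    left, right = asdict(a), asdict(b)
    return {key: (left[key], right[key]) for key in left if left[key] != right[key]}


# Capture and lock: all values below are placeholders.
base = RunRecord(
    pipeline_sha="abc123",
    dataset_versions=(("prices", "v7"),),
    hyperparameters=(("lr", 0.01),),
    runtime_env="image-digest-placeholder",
    seeds=(42,),
)

# Replay: the same facts yield the same fingerprint.
replayed = replace(base)

# Diff: changing one fact is visible and changes the fingerprint.
changed = replace(base, seeds=(7,))
changes = diff_runs(base, changed)
```

The fingerprint only proves that two runs declared the same inputs; getting identical output bytes additionally requires the execution itself to be deterministic, which is why the environment and seeds are part of the record.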

Honest by construction

Built to be auditable.

Promethean is in early access. The infrastructure — feature store, pipelines, registry, sweeps, remote execution, multi-tenancy, and the collaborative editor — is what the marketplace runs on. The architecture above is the real shape of the system, not a roadmap slide.

Because the entire stack is reproducible and lineage-tracked by construction, track records on the marketplace can be verified rather than asserted. That property is the whole point: a quant's edge is only worth subscribing to if its history can be trusted.

What's shown here

Strategy names, metrics, and run identifiers shown are representative examples.

Read the architecture. Then build on it.

If institutional-grade infra behind one declarative SDK is the toolset you've wanted, join as a developer for early SDK and workspace access.
