Technology
Built like infrastructure. Not a notebook.
Promethean is a Rust-first platform that executes everything server-side: a feature store, a pipeline engine, a model registry, and a collaborative workspace, composed over one declarative SDK. Here's the architecture — and the engineering decisions behind it.
System architecture
One SDK. Four systems. Zero client compute.
You commit declarations from the SDK or the in-browser editor. The control plane compiles them, backfills history, and schedules work. The four systems share lineage and stream data to one another over Arrow Flight — all on a Rust substrate.
Why Rust
The Python you write is a declaration. The Rust we run is the platform.
The SDK is a thin PyO3 binding over a Rust core, with no business logic in Python and no client-side compute. Compilation, materialization, scheduling, lineage, and data movement all run in a memory-safe, fearlessly concurrent engine. That's how a feature store, a pipeline runner, and a registry stay correct under multi-tenant load without a garbage collector deciding the latency for you.
thin SDK, heavy core
# the SDK is a thin, typed surface: all logic lives in Rust
@featureset
class Momentum(Featureset):
    ret_5m: float = Feature(expr=close / close.shift(5) - 1)

# .commit() ships the expression to the Rust engine,
# which compiles it to SurrealQL and materializes it.
client.commit(Momentum)
# nothing in this process touches pandas, a GIL, or your CPU
Engineering decisions
The parts that earn a quant's trust.
Every choice below exists to make results correct, reproducible, and fast — the table stakes for running other people's capital.
Arrow Flight tensor streaming
Training reads stream as columnar Arrow record batches over Flight — zero-copy, typed, and fast. No CSV exports, no client-side recompute, no silent dtype drift between research and production.
Hybrid BM25 + vector search
Feature and strategy discovery fuses lexical BM25 recall with dense vector similarity, so you find an asset by exact name or by intent. Built into Ember, not bolted on.
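One common way to fuse a lexical ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: the page does not say how Ember combines the two signals, and the function name, the constant k=60, and the sample asset names are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a
    value taken from the platform.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical results for one query (asset names are made up).
bm25_hits = ["momentum_5m", "momentum_1h", "vol_5m"]                 # lexical
vector_hits = ["mean_reversion_5m", "momentum_5m", "momentum_1h"]    # semantic

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Items that appear in both lists rise to the top, while an item found by only one retriever still survives in the fused list, which is the behavior "by exact name or by intent" implies.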
Git-style data branching
Fork a dataset, mutate features, validate point-in-time correctness, then merge. Production reads are isolated from experiments until a branch is explicitly promoted.
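The fork, mutate, validate, promote cycle can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not the platform's storage layer: `BranchStore` and its methods are hypothetical names, `validate` stands in for the point-in-time checks, and promotion here simply replaces main rather than resolving merge conflicts.

```python
class BranchStore:
    """Toy in-memory model of fork / mutate / validate / promote."""

    def __init__(self, main):
        self._branches = {"main": dict(main)}

    def fork(self, name, source="main"):
        # Copy the source so writes on the branch never touch it.
        self._branches[name] = dict(self._branches[source])

    def write(self, branch, feature, definition):
        if branch == "main":
            raise PermissionError("main changes only through promote()")
        self._branches[branch][feature] = definition

    def read(self, branch="main"):
        # Return a copy so callers cannot mutate stored state.
        return dict(self._branches[branch])

    def promote(self, branch, validate):
        # validate() is a placeholder for point-in-time correctness checks.
        candidate = self._branches[branch]
        if not validate(candidate):
            raise ValueError(f"branch {branch!r} failed validation")
        self._branches["main"] = dict(candidate)


store = BranchStore({"ret_5m": "close / close.shift(5) - 1"})
store.fork("experiment")
store.write("experiment", "ret_15m", "close / close.shift(15) - 1")

before = store.read("main")   # production view, still without ret_15m
store.promote("experiment", validate=lambda features: all(features.values()))
after = store.read("main")    # production view after explicit promotion
```

Readers of main see no change until `promote` succeeds, which is the isolation property described above.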
Server-side remote execution
Pipelines compile to RayJobs on a shared, quota-managed cluster. The DAG runs remotely with per-step resource requests; the editor terminal just streams the logs back.
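To make "a DAG with per-step resource requests" concrete, here is a minimal sketch that orders steps by dependency using the standard library. It does not show how pipelines are actually compiled to RayJobs; the `Step` type, its fields, and the step names are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    name: str
    cpus: float
    memory_gb: float
    depends_on: tuple = ()


def plan(steps):
    """Return steps in an order that respects every dependency."""
    by_name = {step.name: step for step in steps}
    graph = {step.name: set(step.depends_on) for step in steps}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return [by_name[name] for name in TopologicalSorter(graph).static_order()]


# Declared out of order on purpose; the planner sorts them.
steps = [
    Step("train", cpus=8, memory_gb=32, depends_on=("features",)),
    Step("load", cpus=1, memory_gb=2),
    Step("features", cpus=4, memory_gb=16, depends_on=("load",)),
    Step("evaluate", cpus=2, memory_gb=8, depends_on=("train",)),
]

ordered = plan(steps)
peak_cpus = max(step.cpus for step in ordered)  # largest single-step request
```

Because each step carries its own request, a scheduler can size every step independently instead of reserving the peak for the whole run.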
Purpose-fit data stores
SurrealDB for relational + graph lineage, Redpanda for the streaming spine, ClickHouse for columnar run metrics. Each chosen for one job, not as a single compromise database.
Reproducibility contract
Every run locks its pipeline version, dataset versions, hyperparameters, runtime env, and seeds into one immutable record. Any historical result re-runs byte-for-byte.
Multi-tenancy & passkey auth
Per-org namespaces with database-level isolation, IdP-agnostic OIDC, and passkey (WebAuthn) sign-in. Org/team/user chargeback tracks every CPU-second back to an owner.
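Chargeback at three levels amounts to crediting each usage record to its org, its team, and its user. The sketch below assumes usage arrives as (org, team, user, cpu_seconds) tuples; that record shape, the function name, and the sample owners are illustrative rather than the platform's schema.

```python
from collections import defaultdict


def roll_up(usage):
    """Sum CPU-seconds at the org, team, and user level."""
    totals = defaultdict(float)
    for org, team, user, cpu_seconds in usage:
        # Credit the same usage to every level of the hierarchy.
        for owner in ((org,), (org, team), (org, team, user)):
            totals[owner] += cpu_seconds
    return dict(totals)


# Made-up usage records.
usage = [
    ("acme", "research", "ana", 120.0),
    ("acme", "research", "raj", 30.0),
    ("acme", "execution", "lee", 45.5),
]
totals = roll_up(usage)
```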
Collaborative workspaces
A CodeMirror 6 (CM6) editor with CRDT-based collaboration and presence runs on managed Kubernetes pods, with the SDK preinstalled and pyright wired up. Workspaces auto-pause when idle to keep them cheap.
MCP-native surfaces
ember-mcp and forge-mcp expose features, pipelines, and runs to LLM agents over the Model Context Protocol — so a copilot can discover and reason about your platform, safely.
The reproducibility contract
If you can't reproduce it, you can't trust it.
On every run, the engine snapshots the pipeline SHA, each dataset and featureset version, the resolved hyperparameters, the full runtime environment, and all random seeds.
Those facts are written into one immutable lineage record alongside the model artifact. Nothing is mutable; nothing is implicit; nothing depends on the machine that ran it.
Re-run any historical result and get the same bytes back. Diff two runs to see exactly what changed. Walk the chain back to the raw source for an audit.
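The record described above can be pictured as a frozen structure with a content fingerprint. This is a minimal sketch, assuming field names that mirror the list in the text; `RunRecord`, `fingerprint`, and `diff` are hypothetical, and the values are placeholders. It illustrates immutability and run-to-run diffing only, not the deterministic re-execution itself.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RunRecord:
    pipeline_sha: str
    dataset_versions: tuple   # (name, version) pairs
    hyperparameters: tuple    # (name, value) pairs
    runtime_env: str          # for example, a container image digest
    seeds: tuple

    def fingerprint(self):
        # Canonical JSON so equal records always hash identically.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff(a, b):
    """Names of the fields on which two records disagree."""
    left, right = asdict(a), asdict(b)
    return sorted(key for key in left if left[key] != right[key])


base = RunRecord(
    pipeline_sha="abc123",                      # placeholder values throughout
    dataset_versions=(("prices", "v7"),),
    hyperparameters=(("lr", 0.01),),
    runtime_env="image@sha256:placeholder",
    seeds=(42,),
)
rerun = replace(base)                # identical inputs
changed = replace(base, seeds=(7,))  # same run with a different seed

same = base.fingerprint() == rerun.fingerprint()
changed_fields = diff(base, changed)
```

Freezing the dataclass means any attempt to assign to a field after creation raises an error, and the diff pinpoints exactly which locked input differs between two runs.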
Honest by construction
Built to be auditable.
Promethean is in early access. The infrastructure — feature store, pipelines, registry, sweeps, remote execution, multi-tenancy, and the collaborative editor — is what the marketplace runs on. The architecture above is the real shape of the system, not a roadmap slide.
Because the entire stack is reproducible and lineage-tracked by construction, track records on the marketplace can be verified rather than asserted. That property is the whole point: a quant's edge is only worth subscribing to if its history can be trusted.
What's shown here
Strategy names, metrics, and run identifiers shown are representative examples.
Read the architecture. Then build on it.
If institutional-grade infra behind one declarative SDK is the toolset you've wanted, join as a developer for early SDK and workspace access.