Private AI Gateway: verified private AI across fragmented compute

Private AI Gateway turns TEE inference from endpoint-level proof into a verified network: attested E2EE keys, provider route verification, fail-closed forwarding, and signed receipts.

Private AI has a verification problem.

A model can run inside a TEE. A provider can publish an attestation report. A client can check a quote. That proves one thing: a specific workload looked correct at a specific moment.

Real model users need more.

They need to know which key their client encrypted to. They need that key tied to the workload they verified. They need to know which model route served the request. They need proof that routing, fallback, provider refresh, and model updates preserved the private path. They need receipts that security teams can inspect after the call.

Private AI Gateway is built for that layer. It turns private inference from endpoint-level proof into a routed system where encryption, provider verification, and receipts travel with the request.

TEE verification was hard to productize

Figure: architecture overview. Encryption, routing, provider verification, and receipts stay on one request path.

Early TEE inference demos usually prove one endpoint.

The user checks a quote, verifies measurements, confirms a public key, then sends traffic to that endpoint. This works for a single model service. It becomes fragile when a product routes across models, regions, providers, and capacity pools.

High-end users ask for stronger guarantees than a one-time attestation URL. They ask for a live binding between the verified workload and the key or channel handling the prompt. They ask which route served the request. They ask how the system behaves when a provider updates a model, rotates keys, or returns partial evidence.

The verification surface becomes a product problem:

RA-TLS gives strong channel binding, but it requires every client stack to handle custom TLS verification.
Out-of-band attestation with SPKI pinning fits mainstream clients such as the OpenAI SDK, httpx, curl, and browser TLS stacks. The verifier binds the attested workload to the TLS public key.
App-layer E2EE gives a prompt-level boundary. The E2EE public key must be attested and tied to the workload keyset.
Receipts need to explain the actual path: request hash, selected route, upstream verification, provider-facing request, response hash, and final returned response.

Private AI Gateway packages these checks into the request path.
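
As a rough illustration of the SPKI pinning check mentioned above, the sketch below hashes a public key's DER-encoded SubjectPublicKeyInfo and compares it with a pin taken from an attestation report. The helper names (spkiSha256, matchesAttestedPin) are hypothetical and not part of any gateway SDK, and the keys are generated locally as stand-ins; a real client would read the key from the TLS peer certificate and the pin from a verified report.

```typescript
import { createHash, generateKeyPairSync, timingSafeEqual, KeyObject } from "node:crypto";

// Hash the DER-encoded SubjectPublicKeyInfo of a public key with SHA-256.
function spkiSha256(publicKey: KeyObject): Buffer {
  const der = publicKey.export({ type: "spki", format: "der" });
  return createHash("sha256").update(der).digest();
}

// Compare the key a server presented with the pin from a verified attestation report.
// Hypothetical helper: the pin format (hex SHA-256 of the SPKI) is an assumption.
function matchesAttestedPin(presented: KeyObject, attestedPinHex: string): boolean {
  const expected = Buffer.from(attestedPinHex, "hex");
  const actual = spkiSha256(presented);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// Stand-ins for the demo: locally generated keys instead of a real TLS certificate.
const workloadKey = generateKeyPairSync("ec", { namedCurve: "prime256v1" }).publicKey;
const otherKey = generateKeyPairSync("ec", { namedCurve: "prime256v1" }).publicKey;
const attestedPin = spkiSha256(workloadKey).toString("hex");

console.log(matchesAttestedPin(workloadKey, attestedPin)); // true
console.log(matchesAttestedPin(otherKey, attestedPin)); // false
```

The point of the comparison is that the client refuses to talk to any key other than the one the attestation vouched for, while the TLS stack itself stays standard.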

What E2EE means here

E2EE here means application-layer encryption tied to attestation.

The client first verifies the gateway attestation. That report exposes the workload identity and its attested keyset. The client then uses the attested E2EE public key to encrypt sensitive JSON fields before the request leaves the client.

Only the attested gateway workload receives the matching private key. Routing metadata can remain visible for policy, billing, model selection, and rate limits. Sensitive prompt content stays encrypted until it reaches the verified workload boundary.

There are two useful modes:

Attested TLS: verify the workload, bind it to the TLS SPKI, then use standard HTTPS clients.
ACI E2EE: verify the workload keyset, encrypt selected request fields to the attested E2EE key, and keep prompt encryption independent of the transport layer.

ACI E2EE is the product upgrade. The gateway can still route, bill, select models, and produce receipts while protecting the prompt at the application layer.

In code, the older endpoint-level flow looks like this:

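// Older endpoint-level flow: verify a gateway/channel, then send normal JSON over HTTPS.
// TeeVerifier is assumed to come from the gateway's client library; its import is omitted here.
import OpenAI from "openai";
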
const verifier = new TeeVerifier({
  expectedMeasurement: process.env.EXPECTED_MEASUREMENT,
  expectedTlsSpki: process.env.EXPECTED_TLS_SPKI,
});

await verifier.verifyGateway("https://gateway.example.com/.well-known/attestation");

const client = new OpenAI({
  baseURL: "https://gateway.example.com/v1",
  apiKey: process.env.PRIVATE_AI_API_KEY,
});

await client.chat.completions.create({
  model: "llama-3.1-8b",
  messages: [{ role: "user", content: "sensitive prompt" }],
});

The ACI E2EE flow keeps the OpenAI-compatible request shape and adds an attested key step:

// ACI E2EE flow: verify the workload keyset, encrypt sensitive fields, then call the same API shape.
const report = await verifier.verifyGateway("https://gateway.example.com/.well-known/attestation");

const e2ee = await AciE2EE.fromAttestedKeyset({
  gatewayPublicKey: report.e2eePublicKey,
  workloadId: report.workloadId,
});

const encryptedMessages = await e2ee.encryptMessages([
  { role: "user", content: "sensitive prompt" },
]);

await client.chat.completions.create(
  {
    model: "llama-3.1-8b",
    messages: encryptedMessages,
  },
  {
    // The Node SDK takes per-request headers in the options argument.
    headers: {
      "x-client-pub-key": e2ee.clientPublicKey,
      "x-model-pub-key": report.e2eePublicKey,
      "x-e2ee-version": "aci-1",
      "x-e2ee-nonce": e2ee.nonce,
    },
  },
);

The difference is where the guarantee lives. The older flow trusts the verified channel. ACI E2EE encrypts the prompt to the attested workload key, so routing metadata stays visible to the gateway for policy and receipts while prompt content stays inside the verified boundary.
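
The article does not describe what happens inside the encryption step, so the following is only a minimal sketch of one common way to encrypt a field to a recipient's public key: an ephemeral X25519 key agreement, HKDF-SHA-256, and AES-256-GCM, with the workload ID mixed in as context. The construction, the SealedField shape, and the function names are assumptions for illustration and are not the ACI wire format. The gateway key pair is generated locally as a stand-in for the attested key.

```typescript
import {
  createCipheriv, createDecipheriv, createPublicKey, diffieHellman,
  generateKeyPairSync, hkdfSync, randomBytes, KeyObject,
} from "node:crypto";

// Hypothetical envelope for one encrypted field; all values are base64.
interface SealedField {
  ephemeralPublicKey: string; // DER-encoded SPKI of the sender's one-time key
  nonce: string;
  ciphertext: string;
  tag: string;
}

// Derive a 32-byte AES key from the X25519 shared secret, bound to the workload ID.
function deriveKey(privateKey: KeyObject, publicKey: KeyObject, workloadId: string): Buffer {
  const shared = diffieHellman({ privateKey, publicKey });
  return Buffer.from(hkdfSync("sha256", shared, Buffer.alloc(0), Buffer.from(workloadId), 32));
}

// Client side: encrypt one field to the (attested) gateway public key.
function sealField(plaintext: string, gatewayPublicKey: KeyObject, workloadId: string): SealedField {
  const ephemeral = generateKeyPairSync("x25519");
  const key = deriveKey(ephemeral.privateKey, gatewayPublicKey, workloadId);
  const nonce = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  cipher.setAAD(Buffer.from(workloadId));
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    ephemeralPublicKey: ephemeral.publicKey.export({ type: "spki", format: "der" }).toString("base64"),
    nonce: nonce.toString("base64"),
    ciphertext: ciphertext.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
  };
}

// Workload side: decrypt with the private key that never leaves the TEE.
function openField(sealed: SealedField, gatewayPrivateKey: KeyObject, workloadId: string): string {
  const ephemeralPublicKey = createPublicKey({
    key: Buffer.from(sealed.ephemeralPublicKey, "base64"),
    format: "der",
    type: "spki",
  });
  const key = deriveKey(gatewayPrivateKey, ephemeralPublicKey, workloadId);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(sealed.nonce, "base64"));
  decipher.setAAD(Buffer.from(workloadId));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, "base64")),
    decipher.final(), // throws if the key, tag, or workload ID does not match
  ]);
  return plaintext.toString("utf8");
}

// Demo with a locally generated key pair standing in for the attested gateway key.
const gateway = generateKeyPairSync("x25519");
const sealed = sealField("sensitive prompt", gateway.publicKey, "workload-demo");
console.log(openField(sealed, gateway.privateKey, "workload-demo")); // sensitive prompt
```

Binding the workload ID into both the key derivation and the authenticated data means a ciphertext produced for one workload fails to decrypt under another, which mirrors the idea of tying the key to the verified workload.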

Private AI compute is scattered

Figure: fragmented TEE endpoints become one verified private AI network.

TEE model capacity is spread across many providers and proof styles.

Some providers expose model-specific attestation. Some expose router-level verification. Some rely on GPU attestation. Some bind TLS SPKI. Some expose E2EE public keys. Some have strong model-integrity flows, such as dm-verity protected model packs and GPU-attestation-gated boot. Some have strong models and weaker proof UX.

For users, fragmentation becomes operational drag:

each model verifies differently;
each provider exposes a different evidence shape;
one route supports E2EE while another supports attested TLS;
one update can break a subset of models;
security teams inspect provider-specific evidence by hand;
developers need availability, while security wants every fallback to stay verified.

Private AI Gateway aggregates by proof properties. A route becomes usable when it has the right model, price, latency, availability, and verification state. The gateway can pre-verify providers, cache verification leases, enforce channel bindings, and fail closed when required proof is missing.
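
To make "aggregates by proof properties" concrete, here is a small sketch of route selection that only considers routes whose cached verification lease is still valid and that support the binding the policy requires, and that throws instead of falling back when none qualifies. The ProviderRoute and RoutePolicy shapes, the lease representation, and the lowest-latency tie-break are all assumptions for illustration; the article does not specify the gateway's data model.

```typescript
// Hypothetical route record; field names are illustrative only.
type BindingKind = "tls-spki" | "e2ee-key";

interface ProviderRoute {
  id: string;
  model: string;
  binding: BindingKind;
  latencyMs: number;
  // Epoch milliseconds until which the cached verification is valid; null if never verified.
  leaseExpiresAt: number | null;
}

interface RoutePolicy {
  model: string;
  requiredBinding: BindingKind;
}

class NoVerifiedRouteError extends Error {}

// Pick the lowest-latency route that satisfies the policy and holds a live lease.
// Fails closed: an unverified or expired route is never returned as a fallback.
function selectRoute(routes: ProviderRoute[], policy: RoutePolicy, now: number): ProviderRoute {
  const usable = routes.filter(
    (r) =>
      r.model === policy.model &&
      r.binding === policy.requiredBinding &&
      r.leaseExpiresAt !== null &&
      r.leaseExpiresAt > now,
  );
  if (usable.length === 0) {
    throw new NoVerifiedRouteError(`no verified route for ${policy.model}`);
  }
  return usable.reduce((best, r) => (r.latencyMs < best.latencyMs ? r : best));
}

// Demo with a fixed clock so the result is deterministic.
const now = 1_000_000;
const routes: ProviderRoute[] = [
  { id: "provider-a", model: "llama-3.1-8b", binding: "e2ee-key", latencyMs: 120, leaseExpiresAt: now + 60_000 },
  { id: "provider-b", model: "llama-3.1-8b", binding: "e2ee-key", latencyMs: 40, leaseExpiresAt: now - 1 },
  { id: "provider-c", model: "llama-3.1-8b", binding: "tls-spki", latencyMs: 30, leaseExpiresAt: now + 60_000 },
];

const chosen = selectRoute(routes, { model: "llama-3.1-8b", requiredBinding: "e2ee-key" }, now);
console.log(chosen.id); // provider-a
```

In the demo, the faster provider-b is skipped because its lease has expired, and provider-c is skipped because it offers a different binding than the policy requires.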

The gateway path

Figure: proof path. Attested E2EE key, verified provider route, signed receipt.

A request through Private AI Gateway follows a proof-carrying path.

The client verifies the gateway attestation.
The client encrypts sensitive fields to the attested E2EE key or pins the attested TLS SPKI.
The gateway receives the request inside the attested workload.
Middleware processes non-sensitive routing, billing, and policy metadata.
The backend selects a configured route.
Before forwarding, the backend verifies the upstream provider and enforces the right channel binding: TLS SPKI, E2EE key, provider session lease, or provider-specific evidence.
The model response returns through the gateway.
The gateway signs a receipt covering the request path.
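
The upstream check in the steps above (verify the provider and enforce the right channel binding before forwarding) can be sketched as a guard that compares what verification expects with what the live connection presents. The ChannelBinding union and both function names are hypothetical, and only three of the binding types named in the list are modeled.

```typescript
// Hypothetical shapes for three of the bindings named in the step list.
type ChannelBinding =
  | { kind: "tls-spki"; spkiSha256: string }
  | { kind: "e2ee-key"; publicKey: string }
  | { kind: "session-lease"; leaseId: string; expiresAt: number };

// True only if the observed binding has the expected kind and the expected value.
function bindingHolds(expected: ChannelBinding, observed: ChannelBinding, now: number): boolean {
  switch (expected.kind) {
    case "tls-spki":
      return observed.kind === "tls-spki" && observed.spkiSha256 === expected.spkiSha256;
    case "e2ee-key":
      return observed.kind === "e2ee-key" && observed.publicKey === expected.publicKey;
    case "session-lease":
      return (
        observed.kind === "session-lease" &&
        observed.leaseId === expected.leaseId &&
        expected.expiresAt > now
      );
  }
}

// Fail closed: the send callback runs only after the binding check passes.
function forwardFailClosed<T>(
  expected: ChannelBinding,
  observed: ChannelBinding,
  now: number,
  send: () => T,
): T {
  if (!bindingHolds(expected, observed, now)) {
    throw new Error(`upstream ${expected.kind} binding failed; request not forwarded`);
  }
  return send();
}

// Demo: count how many times the request actually leaves the gateway.
let forwarded = 0;
const send = (): string => {
  forwarded += 1;
  return "upstream response";
};
const expectedBinding: ChannelBinding = { kind: "tls-spki", spkiSha256: "ab12" };

console.log(forwardFailClosed(expectedBinding, { kind: "tls-spki", spkiSha256: "ab12" }, 0, send));
// prints "upstream response"
```

Because the send callback is only invoked after the check, a mismatched or missing binding means the prompt never reaches the provider, rather than being sent and flagged afterwards.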

The receipt is the artifact that makes the system auditable. It can record the request hash, middleware-forwarded body, selected route, upstream verification result, provider-facing request hash, provider response hash, and final returned response hash.

That is stronger than a single response signature. It tells the user which route handled the request and which proof was checked before the prompt moved forward.
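
A receipt of this kind can be sketched as a record of hashes plus a signature over a fixed serialization. The example below is an assumption-laden illustration, not the gateway's receipt format: the Receipt fields loosely follow the list above, the encoding is a JSON array in fixed field order, and Ed25519 is chosen only because it is simple to demonstrate with Node's built-in crypto module. The signing key is generated locally as a stand-in for a key from the attested keyset.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical receipt shape, loosely following the fields the article lists.
interface Receipt {
  requestHash: string;
  routeId: string;
  upstreamVerified: boolean;
  providerRequestHash: string;
  providerResponseHash: string;
  responseHash: string;
}

const sha256Hex = (data: string): string => createHash("sha256").update(data).digest("hex");

// Serialize fields in a fixed order so signer and verifier see identical bytes.
function encodeReceipt(r: Receipt): Buffer {
  return Buffer.from(
    JSON.stringify([
      r.requestHash,
      r.routeId,
      r.upstreamVerified,
      r.providerRequestHash,
      r.providerResponseHash,
      r.responseHash,
    ]),
  );
}

// Ed25519 signs the message directly, so the digest algorithm argument is null.
function signReceipt(r: Receipt, privateKey: KeyObject): string {
  return sign(null, encodeReceipt(r), privateKey).toString("base64");
}

function verifyReceipt(r: Receipt, signature: string, publicKey: KeyObject): boolean {
  return verify(null, encodeReceipt(r), publicKey, Buffer.from(signature, "base64"));
}

// Demo: a locally generated key pair stands in for the gateway's receipt key.
const receiptKey = generateKeyPairSync("ed25519");
const receipt: Receipt = {
  requestHash: sha256Hex("client request body"),
  routeId: "provider-a",
  upstreamVerified: true,
  providerRequestHash: sha256Hex("provider-facing request"),
  providerResponseHash: sha256Hex("provider response"),
  responseHash: sha256Hex("final response"),
};
const signature = signReceipt(receipt, receiptKey.privateKey);

console.log(verifyReceipt(receipt, signature, receiptKey.publicKey)); // true
console.log(verifyReceipt({ ...receipt, routeId: "provider-b" }, signature, receiptKey.publicKey)); // false
```

Since the signature covers the route and the verification result along with the hashes, changing any one of them after the fact invalidates the receipt, which is what lets a security team rely on it later.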

What high-end users care about

Advanced users ask concrete questions:

Does the client encrypt to a key that lives inside the attested workload?
Does the TLS key or E2EE key rotate safely?
What happens when a provider updates a model and the expected quote fields change?
What happens when a legacy signature path uses one key and the verifier expects another?
What happens when many models need to be checked after one integration bug?
Can normal API compatibility coexist with serious proof quality?

These are product questions as much as cryptography questions. Private AI Gateway gives them one answer: one attested gateway, one route-verification backend, one receipt format, many verified upstreams.

What dstack adds

dstack is the substrate that makes the gateway trustworthy.

The gateway runs as a measured workload. dstack gives it workload identity, attestation, KMS-backed keys, compose-hash binding, and key release inside the TEE boundary. That is how the gateway can expose an attested E2EE key, sign receipts with a key from the attested keyset, and prove which workload handled the request.

Tinfoil-style systems show what strong single-provider private inference looks like: verified boot, dm-verity protected model packs, GPU attestation, and vLLM behind a clean API. Private AI Gateway takes the next step. It can sit above many private inference providers and turn different evidence formats into a route-level proof system.

That is the upgrade: scattered private compute becomes a verified network.

The product shift

Private AI should feel like a normal model API with better proof.

Developers get a familiar API surface. Security teams get signed receipts. Providers get a way to contribute verified capacity without forcing every customer to learn a custom proof format.

Private AI Gateway points to that product shape:

normal API surface;
attested gateway identity;
E2EE for sensitive fields;
verified upstream routes;
fail-closed forwarding when proof is required;
receipts that explain what happened.

This is how private AI moves from isolated TEE demos to aggregated verified compute.
