Private Inference

Inference without exposing the prompt.

OpenAI-compatible. Signed receipts. No logs, by design.


your thoughts stay yours

Private LLM catalog

Frontier models with private runtime.

OpenAI-compatible models with hardware-backed privacy and verification. Keep your SDK flow, change the endpoint, and copy the real call when you need it.

All models in this catalog are served encrypted.

Z.ai: GLM 5.2 · 1.0M context · $1.40/M input
Qwen: Qwen3.6 27B · 262K context · $0.32/M input
DeepSeek: DeepSeek V4 Flash · 1.0M context · $0.20/M input
Qwen: Qwen3.5-122B-A10B · 262K context · $0.46/M input
Qwen: Qwen3 32B · 41K context · $0.12/M input
Google: Gemma 4 31B · 262K context · $0.15/M input
Qwen: Qwen3.6 35B A3B · 262K context · $0.20/M input
DeepSeek: DeepSeek V4 Pro · 800K context · $1.50/M input
Phala: Gemma-4 26B-A4B Uncensored (Heretic) · 66K context · $0.15/M input
Phala: Qwen3.6 35B-A3B Uncensored (Aggressive) · 131K context · $0.30/M input
MoonshotAI: Kimi K2.6 · 262K context · $1.09/M input
Z.ai: GLM 5.1 · 203K context · $1.21/M input

Model requests are routed through confidential AI providers with TEE support.
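The per-million-token prices in the catalog translate directly into request cost. A minimal sketch, assuming the display labels serve as dictionary keys (they are not necessarily the API model IDs) and reading "41K" and "1.0M" as 41,000 and 1,000,000 tokens; output-token pricing is not listed, so only input cost is estimated.

```python
# Input-side prices and context windows copied from a few catalog entries.
# Keys are display labels, not confirmed API model IDs (assumption).
CATALOG = {
    "DeepSeek: DeepSeek V4 Flash": (1_000_000, 0.20),  # (context tokens, USD per 1M input tokens)
    "Qwen: Qwen3 32B": (41_000, 0.12),                  # "41K" read as 41,000 (approximation)
    "Z.ai: GLM 5.2": (1_000_000, 1.40),
}


def input_cost_usd(model: str, prompt_tokens: int) -> float:
    """Return the input-token cost in USD, refusing prompts that exceed the context window."""
    context, usd_per_million = CATALOG[model]
    if prompt_tokens > context:
        raise ValueError(f"{prompt_tokens} tokens exceed the {context}-token context of {model}")
    return prompt_tokens / 1_000_000 * usd_per_million


def cheapest_fit(prompt_tokens: int) -> str:
    """Pick the lowest input price among models whose context window can hold the prompt."""
    fits = [name for name, (context, _) in CATALOG.items() if prompt_tokens <= context]
    if not fits:
        raise ValueError(f"no catalog model holds {prompt_tokens} tokens")
    return min(fits, key=lambda name: CATALOG[name][1])
```

For example, a 250,000-token prompt to GLM 5.2 costs 250,000 / 1,000,000 × $1.40 = $0.35 on the input side, while a 100,000-token prompt no longer fits Qwen3 32B, so `cheapest_fit` falls back to DeepSeek V4 Flash.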


Private inference, by construction.

What you say to the model stays between your client and an attested CVM. Three primitives — encryption, TEE, no-logs — make that a property of the build, not a promise.

End-to-End Encryption

AES-GCM ciphertext on the wire, both hops
RA-TLS terminates inside the CVM, not at a load balancer
No plaintext intermediary on the host

How it works

Walk through a single request end to end.


Private inference on dstack

Two-hop RA-TLS into a fleet of attested model CVMs: verifiable, and log-free by construction.


Step 1 of 5

Verify the build before you send a single byte

The client SDK fetches the TDX quote of every candidate CVM and runs dcap-qvl locally, confirming that the build matches a no-log entry in DstackApp.sol. The trust decision is made on the client; Phala does not have to vouch for itself.

With dstack: the user holds the root of trust, anchored in Intel's TDX hardware signature.
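The client-side trust decision in this step can be sketched as an allowlist check that runs before any prompt leaves the machine. This assumes each quote has already passed DCAP verification (for example with dcap-qvl) and exposes the attested compose hash. The `VerifiedQuote` fields, the SHA-256-over-compose-file hashing, and the in-memory set standing in for the no-log entries in DstackApp.sol are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class VerifiedQuote:
    """Hypothetical view of a TDX quote that already passed DCAP verification."""
    cvm_id: str
    compose_hash: str  # hex digest of the CVM's app compose file, as attested


def compose_hash(compose_bytes: bytes) -> str:
    """Assumption: the policy identity is SHA-256 over the compose file bytes."""
    return hashlib.sha256(compose_bytes).hexdigest()


def trusted_cvms(quotes: Iterable[VerifiedQuote], approved: Set[str]) -> List[str]:
    """Keep only CVMs whose attested build is on the client's allowlist.

    `approved` stands in for the no-log builds registered in DstackApp.sol.
    Raises if nothing qualifies, so the caller never sends the prompt."""
    chosen = [q.cvm_id for q in quotes if q.compose_hash in approved]
    if not chosen:
        raise PermissionError("no candidate CVM runs an approved no-log build")
    return chosen
```

Failing closed with an exception, rather than returning an empty list, keeps a caller from accidentally falling through to an unattested endpoint.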

Same SDK. Same endpoints. Confidential by default.

cURL · ready to use

Send an OpenAI-shaped request to inference.phala.com/v1/chat/completions. Receipt headers come back on every response, even from curl.

cURL

$ curl -i https://inference.phala.com/v1/chat/completions \
    -H "Authorization: …" \
    -H "Content-Type: application/json" \
    -d '{"model":…}'

x-receipt-id: rcpt-e0ee..
x-aci-keyset-digest: sha256:3eff..

PYTHON

from openai import OpenAI

c = OpenAI(base_url="…phala.com/v1", api_key=PHALA_KEY)
r = c.chat.completions.create(…)

OpenAI Python SDK

Set `base_url="https://inference.phala.com/v1"` and you're done. Existing code keeps working; capture x-receipt-id from the raw response when you need proof.
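Capturing the receipt only needs the response headers. A minimal, library-agnostic sketch; the helper name and the returned dictionary keys are hypothetical, and only the three header names come from this page.

```python
from typing import Dict, Mapping, Optional


def receipt_headers(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect the receipt headers from any header mapping.

    HTTP header names are case-insensitive, so keys are lowercased first.
    A missing x-receipt-id is treated as an error because it is the handle
    used to fetch the signed receipt later."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-receipt-id" not in lowered:
        raise KeyError("response carries no x-receipt-id header")
    return {
        "receipt_id": lowered["x-receipt-id"],
        "aci_identity": lowered.get("x-aci-identity"),
        "aci_keyset_digest": lowered.get("x-aci-keyset-digest"),
    }
```

With the OpenAI Python SDK v1, `raw = client.chat.completions.with_raw_response.create(...)` exposes the HTTP headers as `raw.headers`, which can be passed to this helper, while `raw.parse()` returns the usual completion object.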

One unified verifier

Whether the model runs on Intel TDX + H100 or AMD SEV + B300, the receipt format is identical. One verification path covers your whole TEE-LLM fleet, even when you mix providers.

Unified proof: one verifier, all receipts match.

phala · Llama 3.1
near ai · DeepSeek V3
tinfoil · Qwen2.5
chutes · Mistral

One format · any provider

OpenRouter · Phala · 2026-07-26

2.8B tokens / day

Llama · open · $0.40 / M
Llama · phala · $0.40 / M
DeepSeek · open · $0.27 / M
DeepSeek · phala · $0.27 / M

No premium for privacy

Confidential routes through Phala on OpenRouter are priced the same as the open routes. Privacy is no longer a procurement line item, just a header you opt into.

Two-hop RA-TLS · X.509 with TDX-quote extension · tunneled, no plaintext intermediary

hop 01 · client → gateway · CN=phala-gateway · TDX-quote extension (1.3.6.1.4.1…)
hop 02 · gateway → model CVM · CN=vllm-llama-3.1-70b · TDX+H100 quote extension

Two-hop RA-TLS, all the way to the model

The first TLS hop terminates inside the dstack-gateway CVM, whose certificate carries its TDX quote. The second terminates inside the model CVM. There is no plaintext intermediary, only two confidential VMs whose X.509 certificates are their attestations.
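A certificate can serve as its own attestation because the quote commits to the certificate's key. Here is a minimal sketch of that binding check for both hops. It assumes the quote has already been verified and extracted from the certificate extension, and that the 64-byte TDX report data holds SHA-256 of the DER-encoded public key followed by zero padding. dstack's actual report-data layout may differ, and all function names are hypothetical.

```python
import hashlib
import hmac
from typing import Sequence, Tuple

REPORT_DATA_LEN = 64  # size of the TDX REPORTDATA field in bytes


def expected_report_data(spki_der: bytes) -> bytes:
    """Assumed layout: SHA-256(public key) followed by 32 zero bytes."""
    digest = hashlib.sha256(spki_der).digest()
    return digest + bytes(REPORT_DATA_LEN - len(digest))


def hop_is_bound(spki_der: bytes, report_data: bytes) -> bool:
    """True if the quote's report data commits to this certificate's public key."""
    if len(report_data) != REPORT_DATA_LEN:
        return False
    return hmac.compare_digest(expected_report_data(spki_der), report_data)


def both_hops_bound(hops: Sequence[Tuple[bytes, bytes]]) -> bool:
    """Check client→gateway and gateway→model CVM; any unbound hop fails the request."""
    return len(hops) == 2 and all(hop_is_bound(spki, rd) for spki, rd in hops)
```

Without this binding, a valid quote could be replayed alongside an unrelated TLS key. Checking both hops is what rules out a plaintext intermediary between the gateway and the model.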

response · /v1/chat/completions · 200

x-receipt-id: rcpt-e0eefe…
x-aci-identity: sha256:3def…
x-aci-keyset-digest: sha256:3eff…
session: upstream.verified

verify receipt · matches attestation

Signed receipt + attested session, every response

Every response carries x-receipt-id plus the gateway identity headers. Fetch the receipt, match it to a fresh gateway attestation, then follow upstream.verified.session_id when you need deeper audit evidence.
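The audit flow described here is a chain of equality checks. A sketch, assuming the receipt and gateway attestation are already fetched and their signatures verified. The JSON field names `id`, `keyset_digest`, and `issued_at` and the five-minute freshness window are hypothetical; only the `upstream.verified.session_id` path comes from the text.

```python
import time
from typing import Mapping, Optional


def audit_session(
    headers: Mapping[str, str],
    receipt: dict,
    gateway_attestation: dict,
    max_age_s: float = 300.0,
    now: Optional[float] = None,
) -> Optional[str]:
    """Tie one response (lowercase header names) to its receipt and a fresh gateway attestation.

    Returns upstream.verified.session_id when present, else None."""
    now = time.time() if now is None else now
    # 1. The fetched receipt must be the one the response pointed to.
    if receipt.get("id") != headers.get("x-receipt-id"):
        raise ValueError("receipt does not match x-receipt-id")
    # 2. The gateway attestation must be fresh ...
    if now - gateway_attestation.get("issued_at", float("-inf")) > max_age_s:
        raise ValueError("gateway attestation is stale")
    # ... and attest the same keyset the response headers advertised.
    if gateway_attestation.get("keyset_digest") != headers.get("x-aci-keyset-digest"):
        raise ValueError("keyset digest does not match the gateway attestation")
    # 3. Follow upstream.verified.session_id for deeper audit evidence, if present.
    return receipt.get("upstream", {}).get("verified", {}).get("session_id")
```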

in production today · 3 live partners

Confidential inference, in production.

OpenRouter routes its enterprise tier through Phala. NEAR AI ships verifiable agent inference. OODA AI runs decentralized GPU TEE.

01 · enterprise · live

OpenRouter

enterprise tier · drop-in

“Drop-in OpenAI-compatible endpoint with verifiable, no-log routing. The receipt is the audit trail.”

18B+ tokens

no-log · verified routing

02 · web3 · live

NEAR AI

verifiable agent inference

“Verifiable agent inference for autonomous, on-chain workflows. Every model call lands on-chain with proof.”

100% receipts

on-chain verified · zk inference

03 · public-co · live

OODA AI

NASDAQ-listed · decentralized GPUs

“Decentralized GPUs with hardware attestation guarantees. No host root, no off-band access, no policy promises.”

12M tokens / day

TDX + H100 · hardware-attested

OpenAI-compatible

drop-in /v1 surface

TDX + H100/H200/B300

CPU + GPU TEE

5–15% overhead

vs bare-metal

No host root

compose-hash IS the policy

AI solutions

Use private models when AI handles secrets.

The private model endpoint is the first entry point. The same privacy primitive extends to agents, data workflows, and training.

Agents

Private AI agents

Run agents with keys, tools, memory, and actions in a verified runtime instead of a visible automation cloud.


Training

Private model training

Fine-tune models on proprietary data while datasets, gradients, checkpoints, and evaluation traces stay inside the boundary.


Private training run on H100 CC: observe without exposing weights.

dataset sealed · fine-tune running · eval private · checkpoint verified · loss curve with proof attached (attestation.json)

Data

Private AI data

Bring models to sensitive datasets and return approved outputs without exposing raw data to the model operator.


Sources (EHR data, customer records, internal docs) → TEE clean room, queried without raw access → approved output, aggregate only

no row export · proof linked

Deploy private inference

OpenAI-compatible. Attested. Receipt-backed.

Drop in with the OpenAI SDK you already use. Point at inference.phala.com and capture x-receipt-id for per-response proof.


01 OpenAI-compatible base URL
02 TDX + H100 / H200 / Blackwell
03 x-receipt-id per response
04 Gateway attestation + attested sessions
05 5–15% TEE overhead vs bare-metal

Private execution. Verifiable results.
